All posts by Arthur Smith

Open access: fringe or mainstream?

October 23, 2020 | Uncategorized | gold open access, green open access, hybrid, open access, Plan S, policy, REF, repository | Arthur Smith

When I was just settling in to the world of open access and scholarly communication, I wrote about the need for open access to stop being a fringe activity and enter the mainstream of researcher behaviour:

“Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.”

While much has changed in the five years since I (somewhat naïvely) wrote those concluding thoughts, there are still significant barriers to the complete opening of scholarly discourse. However, should open access be an afterthought for researchers? I’ve changed my mind. Open access should be something researchers don’t even need to think about, and I think that future is already here, though I fear it will ultimately sideline institutional repositories.

According to the 2020 Leiden Ranking, the median rate at which UK institutions make their research outputs open access is over 80%, far higher than in any other nation (Figure 1). Indeed, the UK is the only country that has ‘levelled up’ over the last five years, while the rest of the world’s institutions have plodded along, making slow but steady progress.

Figure 1. The median institutional open access percentage for each country according to the Leiden Ranking. Note, these figures are medians of all institutions within a country. This does not mean that 80% of the UK’s publications are open access, but that the median rate of open access at UK institutions is 80%.
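The difference between a median institutional rate and a country's overall rate can be shown with a small worked example. The three institutions and their output counts below are invented purely for illustration; they are not Leiden Ranking data.

```python
from statistics import median

# Hypothetical institutions as (open access outputs, total outputs).
# These figures are made up for illustration only.
institutions = [
    (900, 1000),    # small institution, 90% open access
    (850, 1000),    # small institution, 85% open access
    (6000, 10000),  # large institution, 60% open access
]


def median_institutional_rate(data):
    """Median of the per-institution open access rates."""
    return median(oa / total for oa, total in data)


def pooled_rate(data):
    """Share of all outputs, pooled across institutions, that are open access."""
    total_oa = sum(oa for oa, _ in data)
    total_outputs = sum(total for _, total in data)
    return total_oa / total_outputs


print(f"Median institutional rate: {median_institutional_rate(institutions):.1%}")
print(f"Pooled rate: {pooled_rate(institutions):.1%}")
```

In this made-up case the median institution sits at 85% while only about 65% of all outputs are open access, because the largest institution has the lowest rate.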

The main driver of this increase in open access content in the UK is green open access (Figure 2), due in large part to the REF 2021 open access policy (announced in 2014 and effective from 2016). This is a dramatic demonstration of the influence that policy can have on researcher behaviour, and it has made open access a mainstream activity in the UK.

Figure 2. The median institutional green open access percentage for each country according to the Leiden Ranking.

Cambridge has followed the rest of the UK, with similar trends across all forms of open access (Figure 3): rising use of green open access and steadily increasing adoption of gold and hybrid. Yet despite all the money poured into gold and (more controversially) hybrid open access, the net effect of all this other activity is a measly 3 percentage points of additional open access content (82% vs 79%). Which raises the question: was it worth it? If open access can be so successfully achieved through green routes, what is the inherent benefit of gold/hybrid open access?

Figure 3. Open access trends in Cambridge according to the Leiden Ranking. In the 2020 ranking, 79% was delivered through green open access. This means that despite all the work to facilitate other forms of open access, this activity only contributed an additional 3% to the total (82%).

Of course, Plan S has now emerged as the most significant attempt to coordinate a clear and coherent international strategy for open access. While it is not without its detractors, I am nonetheless supportive of cOAlition S’s overall aims. However, as the UK scholarly communication community has experienced, policy implementation is messy and can lead to unintended consequences. While Plan S provides options for complying through green open access routes, the discussions that institutions and publishers (both traditional and fully open access alike) have engaged in are almost entirely focussed on gold open access through transformative deals. This is not because we, as institutions, want to spend more on publishing, but rather because it is the pragmatic approach to create open access content at the source and provide authors with easy and palatable routes to open access. It is also a recognition that flipping journals requires give and take from institutions and publishers alike.

We are now very close to reaching a point where open access can be an afterthought for researchers, particularly in the UK. In large part, it will be done for them through direct agreements between institutions and publishers. Cambridge already has open access publishing arrangements with over 5000 journals, and this figure will continue to grow as we sign more transformative agreements. However, this will ultimately be to the detriment of green open access. Rather than being the only open access source for a journal article, institutional repositories will become secondary storehouses of content that is already gold open access. The heyday of institutional repositories, if one ever existed, is now over.

For me, that is a sad thought. We have poured enormous resource and effort into maintaining Apollo, but we must recognise the burden that green open access places on researchers. They have better things to do. I expect that the next five years will see a dramatic increase in gold and hybrid open access content produced in the UK. Green open access won’t go away, but we will have entered a time where open access is no longer fringe, nor indeed mainstream, but rather de facto for all research.

Published 23 October 2020

Written by Dr Arthur Smith

Clearing the final hurdle – automating embargo setting

April 3, 2020 | Uncategorized | open access, Research Excellence Framework | Arthur Smith

One of the biggest issues facing the Open Access Team has been keeping up with the constant stream of accepted manuscripts that need to be processed. In many cases we receive notification of an accepted manuscript well before formal publication. This has presented a significant challenge over the last five years because although we know there is a publication forthcoming (or at least we trust that there is), we have no idea when an article may actually be published.

This means that we have many thousands of publication records in Apollo which have ‘placeholder’ embargoes because we simply did not know the publication date at the point of archiving and therefore could not set an accurate embargo. After archiving, many of the records in Apollo may have been supplemented with a publication date thanks to metadata supplied via Symplectic Elements, but we still need to set an accurate embargo.

In other cases we might be waiting for an article to be published gold open access so that we can update Apollo with the published version of record.

While we are now very adept at archiving manuscripts in Apollo (thanks in large part to Fast Track and Orpheus) it remains a challenge to properly and accurately update Apollo records with either correct embargoes for accepted manuscripts, or the open access version of record. It is a futile task to be constantly checking whether a manuscript has been published. While the Open Access Team keeps a list of every publication that requires updating, this is a thankless job that should be highly automatable.

To that end, we have recently leveraged Orpheus to do a lot of the heavy lifting for us. By interrogating every journal article in Apollo and comparing its metadata against Orpheus we can now quickly determine which items can be updated and take the necessary next steps, changing embargoes where appropriate or identifying opportunities to archive the published version of record.

To do this we created a DSpace curation task to check every “Article” type in Apollo that had at least one file that was currently under embargo. We then compared the publication metadata against the information held in Orpheus to determine what steps needed to be taken. In total we found 9,164 items in need of some attention. The results are displayed below in a Tableau Public visual and summarised in Table 1.

Of these items, 3,864 had a published open access version archived alongside the embargoed manuscript, so we skipped any further updating of these records. This is actually a very good sign, and indicates that the Open Access Team has been going back to records and supplementing them with the open access version of record.

Amongst the remaining items, 2,794 were successfully matched against Orpheus and had their embargoes verified: 1,862 records were updated with shorter embargoes and 412 had longer embargoes applied, leaving 520 items which were unchanged because they already had the correct embargo period.

The final 2,506 items were primarily composed of records with no publication date (1,132 items), publications that could potentially be supplemented by the open access version of record (537 items) and records with no embargo information in Orpheus (434 items). The remaining 403 items had other outcomes.

Table 1. Summary of outcomes after comparing Apollo records against Orpheus.

Date archived in Apollo	2014	2015	2016	2017	2018	2019	2020	Total
The item has an open VoR version	7	105	1200	1019	1300	226	7	3864
Accepted version – embargo updated		2	145	76	132	2305	134	2794
No publication date available			10	159	32	714	217	1132
Orpheus VoR embargo: 0	1	4	51	18	5	451	7	537
No AAM embargo information available	3	6	64	39	33	264	25	434
Other outcome	8	37	114	47	23	162	12	403
Total	19	154	1584	1358	1525	4122	402	9164
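The outcome categories in Table 1 can be read as a simple decision function. The sketch below is only an illustration of that idea: the `Item` and `JournalPolicy` classes, their field names, and the order of the checks are all assumptions, and it does not reproduce the actual DSpace curation task or the Orpheus API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical view of an Apollo record with an embargoed file."""
    has_open_vor: bool               # an open version of record is already archived
    publication_date: Optional[str]  # ISO date string, or None if unknown


@dataclass
class JournalPolicy:
    """Hypothetical view of the embargo data held for a journal."""
    vor_embargo_months: Optional[int]  # embargo on the version of record
    aam_embargo_months: Optional[int]  # embargo on the accepted manuscript


def classify(item: Item, policy: Optional[JournalPolicy]) -> str:
    """Assign an item to one of the Table 1 outcome categories."""
    if item.has_open_vor:
        return "The item has an open VoR version"
    if item.publication_date is None:
        return "No publication date available"
    if policy is None:
        return "Other outcome"
    if policy.vor_embargo_months == 0:
        # The published version could be archived in place of the manuscript.
        return "Orpheus VoR embargo: 0"
    if policy.aam_embargo_months is None:
        return "No AAM embargo information available"
    return "Accepted version – embargo updated"


example = classify(Item(False, "2019-05-01"), JournalPolicy(None, 12))
print(example)
```

Only the last category leads to an embargo being recalculated; every other branch leaves the record untouched for a later run or for manual review.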

We plan to run this curation task on a regular basis and periodically check the outcomes. Any items that continually fail to update will be processed manually by the Open Access Team, but our intention and desire is to move away from manual processing wherever possible.

Published 3 April 2020

Written by Dr Arthur Smith

A Fast-Track Route to Open Access

April 23, 2019 | Uncategorized | open access, repository | Arthur Smith

In the last two years, since the REF 2021 open access policy came into force, the Open Access Team has received an ever-increasing number of manuscript submissions for archiving in Apollo, Cambridge’s institutional open access repository.

We have been thinking long and hard about ways to cope with the workload, by scrutinising existing practices and streamlining workflows, because we want to provide the best possible service to our researchers, commensurate with the University’s world-leading research.

This blog introduces what is perhaps the greatest overhaul of our workflows since the service began: a new ‘Fast Track’ deposit system.

Work it harder

Before the start of the REF OA policy (2014-2016), the Open Access Team would process and manually curate every manuscript submission we received. Authors could expect an initial response within 1-2 working days, after which (usually within a month) we would archive their manuscript in Apollo.

A simplified workflow for a typical manuscript was:

1. Manuscript uploaded by submitter in Symplectic Elements.
2. Item created in Apollo (DSpace) workflow.
3. Helpdesk ticket created (Zendesk).
4. Open Access Team reviews manuscript, advises submitter and makes a decision.
5. Open Access Team archives the manuscript in Apollo and informs submitter.

Both the decision (4) and archive (5) steps take time. For each manuscript we would need to decide whether the files we received could be archived, what funder open access policies were at play and the open access options available from the publisher. We could then advise authors about their open access choices.

To archive a manuscript the process was broadly the following:

1. Review the helpdesk ticket (Zendesk) for the open access decision.
2. Enter as many publication details as possible in Symplectic Elements.
3. Retrieve the submission from the Apollo (DSpace) deposit workflow.
4. Add licence and metadata to the record.
5. Review the submission and approve for archiving.
6. Move the item to the relevant departmental collection and apply an appropriate embargo (if required).
7. Finally, update the helpdesk ticket and send the original submitter a link to their Apollo record.

Each manuscript took on average 18 minutes to archive, which, besides being manually tedious and prone to error, was extremely time-consuming. Add to this the time required to make the initial decision and each manuscript submission could easily take 30 minutes for the Open Access Team to fully process from start to finish, especially if an open access fee had to be paid.

Fast-forward two years and with the rate of new manuscript submissions now peaking at over 1,300 per month, simply processing manuscripts for the REF would require more than four full-time staff members. Whilst these manual processes were viable for a handful of submissions a day, they became unwieldy at scale.
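The staffing estimate follows from simple arithmetic. The sketch below uses the submission rate and the 30-minute processing time quoted above; the figure of 150 working hours per staff member per month is an assumption made for illustration, not a number from the Open Access Team.

```python
def staff_needed(submissions_per_month, minutes_per_submission, hours_per_fte_month):
    """Full-time staff needed to keep up with a monthly submission rate."""
    total_hours = submissions_per_month * minutes_per_submission / 60
    return total_hours / hours_per_fte_month


# 1,300 submissions a month at 30 minutes each are figures from the post.
# 150 working hours per person per month is an assumed value.
fte = staff_needed(1300, 30, 150)
print(f"{fte:.1f} full-time staff")
```

Under that assumption, 650 hours of processing a month works out at a little over four full-time staff, in line with the estimate above.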

Make it better

Our first attempt at speeding up our open access system began in August 2017. To start we made a number of operational changes to reduce the time spent processing manuscript submissions:

- We would rely entirely on the metadata present in Symplectic Elements to populate the Apollo records (i.e. we would not curate manual records).
- The Open Access Team would no longer update the helpdesk records; instead, internal record keeping would be automated as much as possible.

Unfortunately, the number of steps in the Apollo workflow was still roughly the same as the previous process, but with one key difference: a new field to record what we call the ‘Fast Track’ decision. There were seven Fast Track options:

Submitted
Proof
Published (not open access)
Published (open access)
Accepted (published)
Accepted (not published)
Other

The first six options represent the vast bulk of all manuscripts received by the Open Access Team, and the ‘Other’ option simply acts as a catch-all for anything else. By simply knowing what sort of manuscript has been uploaded, much of the decision and archiving process can be automated. However, the agent still needed to retrieve the item from the Apollo workflow, check the version of the file and publication status of the paper, add some metadata fields, approve the item, and move it to an appropriate collection.

Figure 1. The Apollo workflow page of a typical manuscript submission, with the addition of the new ‘Fast Track’ field.

The choice of Fast Track decision leads to four possible outcomes which would ‘trigger’ actions in our Zendesk helpdesk:

Submitted, proof, published (not open access)
- Email submitter, ask for accepted manuscript
Published (open access)
- Archive in Apollo (no embargo) ⇒ Email submitter Apollo link
Accepted (published), accepted (not published)
- Archive in Apollo (embargoed) ⇒ Email submitter Apollo link
Other
- Refer to Open Access Team
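The mapping from the seven Fast Track decisions to the four outcomes can be expressed as a lookup table. This is a minimal sketch: the enum and function names are hypothetical, and the real triggers live in Zendesk rather than in Python.

```python
from enum import Enum


class Decision(Enum):
    SUBMITTED = "Submitted"
    PROOF = "Proof"
    PUBLISHED_NOT_OA = "Published (not open access)"
    PUBLISHED_OA = "Published (open access)"
    ACCEPTED_PUBLISHED = "Accepted (published)"
    ACCEPTED_NOT_PUBLISHED = "Accepted (not published)"
    OTHER = "Other"


class Outcome(Enum):
    REQUEST_ACCEPTED_MANUSCRIPT = "Email submitter, ask for accepted manuscript"
    ARCHIVE_NO_EMBARGO = "Archive in Apollo (no embargo), email submitter Apollo link"
    ARCHIVE_EMBARGOED = "Archive in Apollo (embargoed), email submitter Apollo link"
    REFER_TO_TEAM = "Refer to Open Access Team"


# Each of the seven decisions leads to exactly one of the four outcomes.
OUTCOMES = {
    Decision.SUBMITTED: Outcome.REQUEST_ACCEPTED_MANUSCRIPT,
    Decision.PROOF: Outcome.REQUEST_ACCEPTED_MANUSCRIPT,
    Decision.PUBLISHED_NOT_OA: Outcome.REQUEST_ACCEPTED_MANUSCRIPT,
    Decision.PUBLISHED_OA: Outcome.ARCHIVE_NO_EMBARGO,
    Decision.ACCEPTED_PUBLISHED: Outcome.ARCHIVE_EMBARGOED,
    Decision.ACCEPTED_NOT_PUBLISHED: Outcome.ARCHIVE_EMBARGOED,
    Decision.OTHER: Outcome.REFER_TO_TEAM,
}


def outcome_for(decision: Decision) -> Outcome:
    """Look up the helpdesk action triggered by a Fast Track decision."""
    return OUTCOMES[decision]


print(outcome_for(Decision.PROOF).value)
```

Keeping the mapping in one table means the agent only has to choose a decision; the follow-up action is never chosen by hand.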

Despite being a much faster process, it was still manually tedious. It could also require up to 33 actions from agents (29 mouse clicks) and 14 web pages to be loaded, which was still not very user friendly. However, the time to archive had decreased from 18 to 9 minutes – a 50% reduction from the previous fully manual system.

Do it faster

So what if all the steps involved in processing a manuscript submission could be reduced to the absolute minimum, and be actionable within a single webpage? After a short development sprint, the Open Access Team launched the ‘Fast Track Deposits’ interface last September. A snapshot of the user interface is shown below.

Figure 2. The Fast Track interface. Choosing one of the options in blue is enough to fully archive a manuscript, or process it for further action by the submitter or the Open Access Team.

At the top of the page, the agent can see a ‘publication summary’ including the item title, the journal title, and publisher DOI if available. Both the item title and publisher DOI are hyperlinked, so that the agent can Google-search the item or land on the publisher’s webpage with a single mouse click.

The agent must first inspect the file and check that it is a suitable version (i.e. either the accepted version or the open access published version). If wrongly labelled, they must relabel the file via a dropdown menu, and add/delete files as appropriate. The agent then ‘describes’ the manuscript (i.e. decides whether it is the accepted, published, submitted or proof version) and submits their decision. The decision determines the trigger behaviour in the automatically populated helpdesk ticket. The agent is then free to move on to the next item.

If the decision is ‘accepted’ or ‘published open access’, the item is deposited and the submitter is automatically notified via email. For submitted, proof, and non-OA published versions, the author receives an automatic email asking for the accepted manuscript. Items are archived in the repository under a generic collection, and any forthcoming publication details are added to the record via external source information in Elements.

To see just how efficient Fast Track is we’ve prepared a short demonstration video which captures some of the key features:

Video 1. Real-time demonstration of the Fast Track system.

Makes us stronger

Agents therefore need only make one decision: identify the file version. But the real ingenuity of the Fast Track system is that embargoes can be set automatically by:

Taking into account the decision made by the agent (e.g. no embargo if published open access);
Detecting publication status and publication dates from Elements; and
Retrieving journals’ embargo policies via Orpheus (you can learn more about Orpheus in our previous blog post).

In some cases, usually because we don’t know the publication date, we can’t determine the embargo length of an accepted manuscript. In such cases we apply a 36-month embargo from the date of the Fast Track decision. We know that this embargo won’t always be correct; however, we routinely check manuscripts in Apollo and update embargoes accordingly.
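The embargo logic described above can be sketched as a single function. This is an illustration under stated assumptions rather than the Fast Track code itself: the function names, the decision strings, and the idea that a journal's embargo runs from the publication date are all assumptions, and the Elements and Orpheus lookups are replaced by plain arguments.

```python
import calendar
from datetime import date
from typing import Optional

FALLBACK_EMBARGO_MONTHS = 36  # placeholder embargo length given in the post


def add_months(start: date, months: int) -> date:
    """Return the date `months` after `start`, clamping to the month's last day."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(
    decision: str,
    decision_date: date,
    publication_date: Optional[date] = None,
    journal_embargo_months: Optional[int] = None,
) -> Optional[date]:
    """Return the embargo end date, or None when no embargo applies."""
    if decision == "published_open_access":
        return None  # open access version of record: no embargo
    if publication_date is not None and journal_embargo_months is not None:
        # Assumed rule: the journal's embargo runs from the publication date.
        return add_months(publication_date, journal_embargo_months)
    # Publication date or journal policy unknown: apply the placeholder embargo.
    return add_months(decision_date, FALLBACK_EMBARGO_MONTHS)


print(embargo_end("accepted", date(2019, 4, 23)))
```

The placeholder branch is what makes later checking necessary: once a publication date becomes available, the embargo can be recalculated from it.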

Figure 3. Simplified overview of the Fast Track process. The key decision is to determine the type of manuscript that has been submitted. Everything else is handled automatically.

Since launching Fast Track the average time to process a manuscript is 1-2 minutes. More than 8,000 items have been processed since launching the phase two Fast-Track interface. If items processed under the phase one effort are included, the number goes up to just over 14,000. And since a picture speaks a thousand words, Figure 4 below shows the effect produced by the new interface launched in September on our backlog of unprocessed submissions.

Figure 4. Historical change in the number of unprocessed open access manuscript submissions. The total number of outstanding manuscript submissions peaked at nearly 2,400 in September 2018. Immediately after launching the Fast Track website the backlog dropped dramatically and was completely eliminated by March 2019.

We will continue to develop Fast Track to further streamline our processing of manuscripts. We have already started to partner with librarians and administrators across the University to leverage the collective knowledge about open access which now exists within the University’s professional academic services.

Get in contact: If you are running a DSpace repository and would like to implement Fast Track to work alongside your existing workflows email us at support@repository.cam.ac.uk

Published 23 April 2019
Written by Dr Mélodie Garnier and Dr Arthur Smith

Unlocking Research

Open Research at Cambridge