By Dr Debbie Hansen, Senior Open Access Adviser, Office of Scholarly Communication
Through much of this REF period, there has been a focus on encouraging Cambridge authors to deposit their accepted manuscripts into our institutional repository. The Open Access Team has tackled the sometimes tricky tasks of making sure the right version has been deposited with the correct embargo, advising on funders’ open access requirements and managing the payments for gold open access from the UKRI and COAF block grants.
With the REF period ending, the University is now finalising lists of research outputs to be submitted to REF2021. Alongside this activity, some members of the Open Access Team have been focussing on compliance indicators for the REF open access policy. In Symplectic Elements, the University’s research information management system, all journal or conference articles which fall within the period of the REF open access policy are labelled as either Compliant or Non-compliant.
From an administrative point of view, this is unfortunately not as straightforward as it may seem (though the complications tend to work in favour of compliance). The compliance indicator is set automatically from calculations using the acceptance, first publication and deposit dates, as well as the repository embargo lift date. It is, if you like, a first-pass indicator. Articles labelled ‘Non-compliant’ may turn out to be compliant or REF eligible because they may, for example:
- have gaps in their metadata such as missing acceptance or publication dates;
- have incorrect publication dates in the external metadata records. One article can have around 10 separate metadata records (e.g. from Scopus, Crossref, Europe PMC, etc.), and Elements takes the earliest publication date from all the metadata records associated with an article. 01/01/YYYY is a common red herring: only the year of publication (YYYY) has been recorded, and the month and day fields have been automatically filled with a default value;
- have embargo lift dates more than 12 months after first publication (Panel C and D articles can have embargo lengths of up to 24 months but the system does not recognise this);
- be compliantly deposited in a different non-commercial open access repository;
- be eligible for one of the REF exceptions to the policy;
- be published gold open access and so not need to be deposited in a repository to fulfil the REF open access criteria.
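To make the first-pass nature of the indicator concrete, here is a minimal sketch of this kind of date calculation. It is not the Elements implementation: the function names, the three-month deposit window counted from acceptance, and the exact comparisons are assumptions made for illustration. Only the 12 and 24 month embargo limits and the "earliest publication date wins" behaviour come from the description above.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole months, clamping the day if needed."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def first_pass_indicator(acceptance, publication_dates, deposit, embargo_lift,
                         max_embargo_months=12, deposit_window_months=3):
    """Return (label, notes) for one article. All thresholds are assumptions."""
    if acceptance is None or not publication_dates:
        return "Non-compliant", ["missing acceptance or publication date"]

    # Take the earliest publication date across all metadata records.
    first_publication = min(publication_dates)
    notes = []
    if (first_publication.month, first_publication.day) == (1, 1):
        notes.append("earliest publication date is 1 January: "
                     "possible year-only default")

    problems = []
    if deposit is None:
        problems.append("no deposit")
    elif deposit > add_months(acceptance, deposit_window_months):
        problems.append("deposited late")
    if (embargo_lift is not None
            and embargo_lift > add_months(first_publication, max_embargo_months)):
        problems.append("embargo too long")

    label = "Non-compliant" if problems else "Compliant"
    return label, problems + notes


# A made-up article: one metadata record carries a year-only 01/01 date.
article = dict(
    acceptance=date(2019, 3, 1),
    publication_dates=[date(2019, 6, 15), date(2019, 1, 1)],
    deposit=date(2019, 4, 1),
    embargo_lift=date(2020, 6, 15),
)
default_result = first_pass_indicator(**article)
panel_cd_result = first_pass_indicator(**article, max_embargo_months=24)
print(default_result)
print(panel_cd_result)
```

In this invented example the article is flagged under the default 12-month limit but passes once the 24-month limit for Panels C and D is applied, and the suspicious 1 January date is noted in both cases so that a team member can check it.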
If an article is showing as non-compliant, it generally requires individual investigation by a team member. However, as has been raised in previous blogs, we try to develop processes to balance staff resources against the sheer numbers of articles. Here I will mention two tools we have been using to address, in bulk, three common scenarios: articles with missing acceptance or publication dates, articles deposited in another repository and articles published gold open access.
Missing acceptance or publication dates
Acceptance dates are not always openly provided by a journal or conference and some publication dates can be hard to find (e.g. for some humanities, arts and social science journals) or have been missed for some other reason. In these instances, the author may be able to help. For example, they may be able to check past correspondence with the journal or with co-authors.
Our colleagues in the University Research Information Office, Agustina Martínez-García and Owen Roberson, developed an internal, simple-to-use tool, aptly named LastMinute.CAM¹. This tool uses an article’s Elements identifier to create an article-customised form that can be sent to an author to request missing information. The form is pre-populated with the article title and other information already held about the article (e.g. its digital object identifier (DOI)), and the author can fill in missing acceptance or publication dates. Once the form is submitted, the data populates a new record for the article in Symplectic Elements and is used, alongside all the other data for that article, in the compliance calculations. We have tried to use LastMinute.CAM for this purpose on a considered basis, as we do not wish to contact authors unnecessarily. Using mail merge lists, we have attempted in this way to resolve missing dates, and to gather links to articles in other repositories (next section), for hundreds of papers.
Article deposited in another repository
Some authors have been contacted with the LastMinute.CAM form because their article was deposited late in Apollo, or not deposited at all, but may be compliant in another repository (e.g. deposited by a co-author at another university). LastMinute.CAM is integrated with Unpaywall: the application searches Unpaywall data via its Application Programming Interface (API) and records in the form the link to the preferred open access location, together with the article version if available. A recipient of the form can accept this link, remove it (they may know the article was not compliantly deposited), or replace the pre-populated repository link and version with an alternative.
Having a link to an article in another repository is of course only a first step. A team member will need to check the link (we have found URLs to non-repository web pages) and investigate whether the article is compliantly deposited in the other repository. However, when we do find a compliant deposit, this source is already recorded for us, removing some of the legwork we would otherwise need to do to complete our records.
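For readers curious about what such an Unpaywall lookup involves, the sketch below queries the public Unpaywall REST API for a single DOI and pulls out the preferred open access location and its version. It is an illustration only and not the LastMinute.CAM code: the function names are hypothetical, and the DOI and email address in the usage lines are placeholders. The endpoint and the `best_oa_location`, `url`, `version` and `host_type` fields are those of the public Unpaywall API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the lookup URL; Unpaywall asks for a contact email parameter."""
    return f"{UNPAYWALL_API}{quote(doi)}?email={quote(email)}"


def fetch_record(doi, email):
    """Fetch the Unpaywall record for one DOI (makes a network request)."""
    with urlopen(unpaywall_url(doi, email), timeout=30) as response:
        return json.load(response)


def preferred_location(record):
    """Return the preferred open access link and version, or None."""
    best = record.get("best_oa_location") or {}
    if not best.get("url"):
        return None
    return {
        "url": best["url"],
        "version": best.get("version"),      # e.g. "acceptedVersion"
        "host_type": best.get("host_type"),  # "repository" or "publisher"
    }


# A hand-written record in the shape Unpaywall returns (values are made up).
sample_record = {
    "doi": "10.1234/example",
    "best_oa_location": {
        "url": "https://repository.example.org/item/42",
        "version": "acceptedVersion",
        "host_type": "repository",
    },
}
suggestion = preferred_location(sample_record)
print(suggestion)
```

For a live lookup, `preferred_location(fetch_record("10.1234/example", "you@example.org"))` would do the same with real data. As described above, the result is only a suggestion: the link and version still need to be checked by a person.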
Article published gold open access
Unpaywall has also been a great tool for identifying articles that have been published open access through the gold route. The Unpaywall Simple Query Tool accepts a list of up to 1000 DOIs and returns a report of the open access status of the article associated with each DOI. We do need to analyse the results carefully and discard, for example, articles made open only as accepted manuscripts through the green route, published versions without an open licence (bronze open access) and articles published with an open licence but only after a defined time delay. Once we are happy with the cleaned list, it can be used as input to an Elements API script (also developed by Agustina Martínez-García) to bulk annotate articles that have been identified as being published gold open access. To date we have identified over 1000 articles in this way.
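The cleaning step can be partly expressed as a filter over the Unpaywall records. The sketch below is an assumption about how such a filter might look, not the process the team runs: the function names are hypothetical, and treating any Creative Commons licence value (one starting with "cc") as an open licence is a simplification. The field names follow the Unpaywall data format.

```python
def is_gold_candidate(record):
    """True if the best location is a published version, hosted by the
    publisher, with a Creative Commons licence recorded."""
    location = record.get("best_oa_location") or {}
    licence = location.get("license") or ""
    return (
        location.get("host_type") == "publisher"
        and location.get("version") == "publishedVersion"
        and licence.startswith("cc")
    )


def gold_candidates(records):
    """Return the DOIs that pass the filter, in their original order."""
    return [record["doi"] for record in records if is_gold_candidate(record)]


# Hand-written sample records (DOIs and values are made up).
records = [
    {"doi": "10.1234/gold", "best_oa_location": {
        "host_type": "publisher", "version": "publishedVersion",
        "license": "cc-by"}},
    {"doi": "10.1234/green", "best_oa_location": {
        "host_type": "repository", "version": "acceptedVersion",
        "license": None}},
    {"doi": "10.1234/bronze", "best_oa_location": {
        "host_type": "publisher", "version": "publishedVersion",
        "license": None}},
    {"doi": "10.1234/closed", "best_oa_location": None},
]
shortlist = gold_candidates(records)
print(shortlist)
```

A filter like this only removes the green and bronze cases. Articles that became open after a delay cannot be spotted from these fields alone, so the shortlist would still need the careful manual analysis described above before any bulk annotation.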
From now on we plan to run the gold OA bulk ‘exception’ process monthly, and we have in the background the option to use LastMinute.CAM further to gather missing information via targeted mail shots to authors. We will also be addressing, in an automated way, those articles that were compliantly deposited with the correct embargo applied but are not recognised as compliant by the system because the embargo is ‘perceived’ as too long. These activities will leave a far more manageable set of articles showing as non-compliant, for which we can investigate in more detail why each article is labelled non-compliant and take action (such as the application of eligible REF exceptions) as appropriate.
One final comment: once the submission to REF has been made, there will be a period of reflection. Effective tools, like those mentioned here, that help to make our processes more efficient will feature in this review. This review will help to define our future activities in this space.
1 This tool is only available internally to University of Cambridge researchers, and is not indexed in Google or any other search engine.