All posts by Alexia Sutton

Research Data at Cambridge – highlights of the year so far

By Dr Sacha Jones, Research Data Coordinator

This year we have continued, as always, to provide support and services that help researchers with their research data management and open data practices. So far in 2020, we have approved more than 230 datasets into our institutional repository, Apollo. These include Apollo’s 2000th dataset, on the impact of health warning labels on snack selection, which is a shining example of reproducible research covering the full gamut: preregistration and the sharing of consent forms, code, protocols and data. The data behind other studies that have sparked media interest are also openly available in Apollo, such as the data supporting research on the development of a wireless device that can convert sunlight, carbon dioxide and water into a carbon-neutral fuel, and the data supporting a study that used computational modelling to explain why blues and greens are the brightest colours in nature. And, in the year of COVID, a dataset on the ability of common fabrics to filter ultrafine particles, associated with an article in BMJ Open, was published in April.

Sharing data associated with publications is critical for the integrity of many disciplines and is best practice in the majority of studies, but science communication also has an important responsibility to bring research datasets to the forefront. Itamar Shatz, a researcher and Cambridge Data Champion, discussed this point eloquently this summer in a guest post on Unlocking Research. Making datasets open permits their reuse, and if you have wondered how research data are reused, then read this comprehensive data sharing and reuse case study written by Dominic Dixon of the Research Data team. It centres on the use and value of the Mammographic Image Society database, published in Apollo five years ago.

This year has seen the necessary move from our usual face-to-face Research Data Management (RDM) training to online provision. As part of this, we have produced an online training session in RDM, covering topics such as data organisation, storage, backup and sharing, as well as data management plans. This forms one component of a broader Research Skills Guide – an online course for Cambridge researchers on publishing, managing data, and finding and disseminating research – developed by Dr Bea Gini, training coordinator at the Office of Scholarly Communication (OSC). We have also contributed to a ‘Managing your study resources’ CamGuide for Master’s students, providing guidance on how to work reproducibly. Last month, in collaboration with several University stakeholders, we released new guidance on the use of electronic research notebooks (ERNs), describing their features and helping researchers to select one that suits their needs.

At the start of this year we invited members of the University to apply to become Data Champions, joining the existing community of 72 Data Champions. The 2020 call was very successful, and we welcomed 56 new Data Champions to the programme. The community has expanded this year not only in numbers but also in disciplinary reach: there are now Data Champions in several areas of the arts, humanities and social sciences where previously there were none. During the year we have held forums, first in person and then online, covering themes such as how to curate manual research records, ideas for RDM guidance materials, data management in the time of coronavirus, and data practices in the arts and humanities and how these can best be supported. We look forward to continuing to support and advocate for the fantastic work of the Cambridge Data Champions in the months and years to come.

Open Access and REF 2021: “Is This Article Non-Compliant?”

By Dr Debbie Hansen, Senior Open Access Adviser, Office of Scholarly Communication

Through much of this REF period, there has been a focus on encouraging Cambridge authors to deposit their accepted manuscripts into our institutional repository.  The Open Access Team has tackled the sometimes tricky tasks of making sure the right version has been deposited with the correct embargo, advising on funders’ open access requirements and managing the payments for gold open access from the UKRI and COAF block grants. 

With the REF period ending, the University is now finalising the lists of research outputs to be submitted to REF 2021. Alongside this activity, some members of the Open Access Team have been focussing on compliance indicators for the REF open access policy. In Symplectic Elements, the University’s research information management system, every journal or conference article that falls within the period of the REF open access policy is labelled either Compliant or Non-compliant.

From an administrative point of view, this is unfortunately not as straightforward as it may seem, although the news is better for compliance. The compliance indicator is set automatically from calculations using the acceptance, first publication and deposit dates, together with the repository embargo lift date. It is, if you like, a first-pass indicator, and articles labelled ‘Non-compliant’ may turn out to be compliant or REF-eligible because they may, for example:

  1. have gaps in their metadata such as missing acceptance or publication dates; 
  2. have incorrect publication dates in their external metadata records. One article can have around 10 separate metadata records (e.g. from Scopus, Crossref and Europe PMC), and Elements takes the earliest publication date across all of them. 01/01/YYYY is a common red herring: only the year of publication (YYYY) was recorded, and the month and day fields were filled automatically with a default value;
  3. have embargo lift dates more than 12 months after first publication (articles in Panels C and D can have embargoes of up to 24 months, but the system does not recognise this);
  4. have been compliantly deposited in a different non-commercial open access repository;
  5. be eligible for one of the REF exceptions to the policy;
  6. have been published gold open access, and so not need to be deposited in a repository to meet the REF open access criteria.
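
To illustrate why such an indicator can only be a first pass, here is a minimal Python sketch of a date-based check. It is not the Symplectic Elements calculation, which is not described here. The deposit window (assumed to be three months from acceptance or, under the policy's flexibility, from first publication) and the blanket 12-month embargo limit are simplified assumptions, and every function name is hypothetical. The sketch takes the earliest of several publication dates, as described in item 2, and shows how a year-only 01/01 date and a 12-month limit applied to a Panel C or D article can together produce a ‘Non-compliant’ label.

```python
from __future__ import annotations

import calendar
from datetime import date
from typing import Optional

# Hypothetical, simplified parameter: check it against the REF open access policy itself.
DEPOSIT_WINDOW_MONTHS = 3


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day (31 Jan + 1 month -> end of Feb)."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def earliest_publication(candidates: list[Optional[date]]) -> Optional[date]:
    """Take the earliest publication date across all metadata records, ignoring gaps."""
    known = [d for d in candidates if d is not None]
    return min(known) if known else None


def first_pass_check(
    acceptance: Optional[date],
    publication_candidates: list[Optional[date]],
    deposit: Optional[date],
    embargo_lift: Optional[date],
    max_embargo_months: int = 12,  # blanket limit; Panel C/D outputs may allow 24
) -> tuple[str, list[str], list[str]]:
    """Return (status, problems, warnings) for one article."""
    problems: list[str] = []
    warnings: list[str] = []
    published = earliest_publication(publication_candidates)

    if acceptance is None:
        problems.append("missing acceptance date")
    if published is None:
        problems.append("missing publication date")
    else:
        if (published.month, published.day) == (1, 1):
            warnings.append(f"publication date {published} is 01/01: possibly a year-only default")
        if acceptance is not None and published < acceptance:
            warnings.append("earliest publication date precedes acceptance")

    if deposit is None:
        problems.append("no deposit found")
    else:
        # On time if within the window after acceptance OR after publication,
        # i.e. no later than the latest of the two deadlines.
        starts = [d for d in (acceptance, published) if d is not None]
        if starts and deposit > max(add_months(d, DEPOSIT_WINDOW_MONTHS) for d in starts):
            problems.append("deposit later than the assumed deposit window")

    if published is not None and embargo_lift is not None:
        if embargo_lift > add_months(published, max_embargo_months):
            problems.append(f"embargo longer than {max_embargo_months} months")

    return ("Compliant" if not problems else "Non-compliant"), problems, warnings


if __name__ == "__main__":
    # A hypothetical Panel C article with a year-only 01/01 record among its dates.
    args = dict(
        acceptance=date(2019, 3, 1),
        publication_candidates=[date(2019, 1, 1), date(2019, 6, 15), None],
        deposit=date(2019, 3, 20),
        embargo_lift=date(2020, 12, 15),
    )
    print(first_pass_check(**args))  # Non-compliant: embargo longer than 12 months
    print(first_pass_check(**args, max_embargo_months=24))  # Compliant, with two warnings
```

Keeping warnings separate from problems reflects the point above: a suspicious date does not by itself change the label, but it tells a team member where to look first.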

If an article is showing as non-compliant, it generally requires individual investigation by a team member. However, as discussed in previous blog posts, we try to develop processes that balance staff resources against the sheer number of articles. Here I describe two tools we have been using to deal in bulk with three common scenarios: missing acceptance or publication dates, articles deposited in another repository, and articles published gold open access.

Missing acceptance or publication dates 

Acceptance dates are not always made openly available by a journal or conference, and some publication dates can be hard to find (e.g. for some arts, humanities and social science journals) or are missing for some other reason. In these cases, the author may be able to help, for example by checking past correspondence with the journal or with co-authors.

Our colleagues in the University Research Information Office, Agustina Martínez-García and Owen Roberson, developed a simple-to-use internal tool, aptly named LastMinute.CAM[1]. This tool uses an article’s Elements identifier to create a customised form that can be sent to an author to request missing information. The form is pre-populated with the article title and other information already held about the article (e.g. its digital object identifier (DOI)), and the author can fill in missing acceptance or publication dates. Once the form is submitted, the data populate a new record for the article in Symplectic Elements and are used, alongside all the other data for that article, in the compliance calculations. We have used LastMinute.CAM for this purpose selectively (we do not wish to contact authors unnecessarily) and, via mail merge lists, have tried in this way to resolve missing dates, and to collect links to articles in other repositories (see next section), for hundreds of papers.

Article deposited in another repository 

Some authors have been sent the LastMinute.CAM form because their article was deposited late in Apollo, or not deposited at all, but it may be compliantly deposited in another repository (e.g. deposited by a co-author at another university). LastMinute.CAM is integrated with Unpaywall: the application queries Unpaywall data through its Application Programming Interface (API) and records in the form the link to the preferred open access location, together with the article version if available. A recipient of the form can accept this link, remove it (they may know that the article was not compliantly deposited) or replace the pre-populated repository link and version with an alternative.
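
LastMinute.CAM’s own code is internal, so the following is not its implementation. It is a minimal Python sketch of the same kind of lookup against Unpaywall’s public v2 REST API (`https://api.unpaywall.org/v2/{doi}?email=...`), reading the `best_oa_location` object and its `url`, `version` and `host_type` fields. The function names and the shape of the returned dictionary are hypothetical. Keeping `host_type` alongside the link is a small help for the manual check that still has to follow, since a `publisher` location is not a repository deposit.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build an Unpaywall v2 request URL; the API asks callers to identify themselves by email."""
    return API_BASE + urllib.parse.quote(doi, safe="/") + "?" + urllib.parse.urlencode({"email": email})


def fetch_record(doi: str, email: str, timeout: float = 30.0) -> dict:
    """Fetch the JSON record for one DOI (this performs a network request)."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=timeout) as response:
        return json.load(response)


def prefill_from_record(record: dict) -> Optional[dict]:
    """Extract the fields a form might pre-fill from Unpaywall's best OA location."""
    best = record.get("best_oa_location")
    if not best:
        return None  # Unpaywall knows of no open copy for this DOI
    return {
        "url": best.get("url"),
        "version": best.get("version"),  # e.g. "acceptedVersion"; may be missing
        "host_type": best.get("host_type"),  # "repository" or "publisher"
    }
```

In use, `prefill_from_record(fetch_record(doi, "you@example.org"))` would return the link and version to place in the form, or `None` when no open copy is known; the email address shown is a placeholder.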

Having a link to an article in another repository is, of course, only a first step. A team member needs to check the link (we have found URLs pointing to non-repository web pages) and investigate whether the article is compliantly deposited in the other repository. However, when we do find a compliant deposit, its source is already recorded for us, saving some of the legwork we would otherwise need to do to complete our records.

Article published gold open access 

Unpaywall has also been a great tool for identifying articles that have been published open access through the gold route. The Unpaywall Simple Query Tool accepts a list of up to 1000 DOIs and returns a report of the open access status of the article associated with each DOI. We do need to analyse the results carefully and discard, for example, articles made open only through the green route (via the accepted manuscript), published versions without an open licence (bronze open access) and those published with an open licence only after a defined time delay. Once we are happy with the cleaned list, it can be used as input to an Elements API script (also developed by Agustina Martínez-García) that bulk-annotates the articles identified as published gold open access. To date we have identified over 1000 articles in this way.
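
The first cleaning step might be sketched in Python as below. This assumes the results have been parsed into dictionaries shaped like Unpaywall v2 API records (`doi`, `oa_status`, and `best_oa_location` with `host_type`, `version` and `license`); no particular export format from the Simple Query Tool is assumed. It also assumes that both Unpaywall’s `gold` (fully open access journal) and `hybrid` (openly licensed article in a subscription journal) statuses count as the gold route. The function names are hypothetical, the check for licences applied only after a time delay is not attempted and remains a manual step, and the Elements API annotation step is not shown.

```python
from __future__ import annotations

from typing import Iterable

# Assumption: Unpaywall "gold" and "hybrid" both count as the gold route here.
GOLD_ROUTE_STATUSES = {"gold", "hybrid"}


def triage(record: dict) -> str:
    """Return 'candidate' or a reason for discarding one Unpaywall-style record."""
    status = record.get("oa_status")
    if status not in GOLD_ROUTE_STATUSES:
        return f"discard: oa_status is {status}"  # e.g. green, bronze, closed
    best = record.get("best_oa_location") or {}
    if best.get("host_type") != "publisher" or best.get("version") != "publishedVersion":
        return "discard: best copy is not the publisher's version"
    if not best.get("license"):
        return "discard: no open licence recorded"
    return "candidate"


def split_results(records: Iterable[dict]) -> tuple[list[str], dict[str, str]]:
    """Separate candidate DOIs from discarded ones, keeping the reason for each discard."""
    candidates: list[str] = []
    discarded: dict[str, str] = {}
    for record in records:
        verdict = triage(record)
        if verdict == "candidate":
            candidates.append(record["doi"])
        else:
            discarded[record["doi"]] = verdict
    return candidates, discarded
```

Recording a reason for every discarded DOI makes it easier to spot-check the filter before the cleaned list is used for bulk annotation.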

Summary 

From now on we plan to run the gold OA bulk ‘exception’ process monthly, keeping in reserve the option of using LastMinute.CAM to gather missing information through targeted mailings to authors. We will also be using an automated process to address articles that were compliantly deposited with the correct embargo applied but are not recognised as compliant by the system because of a ‘perceived’ too-long embargo. These activities will leave a far more manageable set of articles showing as non-compliant, which we can investigate in more detail to find out why they are labelled non-compliant, and then take appropriate action (such as applying eligible REF exceptions).

One final comment: once the submission to REF has been made, there will be a period of reflection. Effective tools like those mentioned here, which make our processes more efficient, will feature in this review, and the review will help to define our future activities in this space.

[1] This tool is only available internally to University of Cambridge researchers and is not indexed by Google or any other search engine.

Open Access for Librarians: Putting Together the Puzzle

Claire Sewell, Research Support Librarian, Betty & Gordon Moore Library

This Open Access Week I’ve been reflecting on my time training library staff in research support. As anyone working in this area will know, an understanding of the principles of open access is key to getting to grips with many of the issues covered by the scholarly communications remit, so it’s important that librarians have a good grasp of the basics. Open access is a topic rich in terminology and interconnected concepts, which can make teaching it a little like putting together a jigsaw puzzle with no finished image to guide you. Many introductory sessions begin with an overview of what open access actually means: making the outputs of funded research available online for anyone to read. So far, so simple, but even this assumes some knowledge of the current academic publishing system. I often need to spend longer explaining this than I had planned before we can move on to the rest of the session, and the pauses don’t stop there. Outlining the importance of open access involves explaining the REF; describing the practicalities means defining what we mean by a repository; and describing the different types of OA can be hard when your audience doesn’t understand the concept of an embargo.

No two audiences are ever the same, as everyone has a different view of the finished picture, and I need to be able to give them the pieces they need to complete their own OA puzzle. As a result, every session has to adapt to the needs of the people in the room. Whilst I still have an overall plan for any open access session, I find it’s a good idea to have a few short pre-prepared slides or activities covering key concepts that I can include if needed. I’ve also come to realise that it doesn’t matter which order you place your slides in, as you will have to shuffle through them at random as your audience asks questions! This is not always a bad thing, as it keeps me on my toes and improves my practice.

The most common questions I get are detailed below: 

  1. Definitions of various terms – audiences need to know what embargoes, repositories, author accepted manuscripts and APCs (article processing charges) are, but it can be hard to explain one without an understanding of another. Having some type of primer on hand can really help people to understand the language you’re using.
  2. Manuscript versions – something a lot of people struggle with is which version of a manuscript is which and how this impacts sharing via OA. I find that a visual representation offers the best explanation and often rely on this graphic from our OA FAQs – something I’ve been told makes all the difference. 
  3. Practicalities of OA – this will vary between institutions but a common question is how you actually go through the process of making outputs open. If you can, building in time for a demonstration and/or some hands-on experience can really help learners to understand the process and find all sorts of tricky problems for you to explain! 

So, the message is: no matter who your audience is, it pays to be flexible. Much like the rest of the open access landscape, one size definitely does not fit all!