
What questions reveal about researchers’ attitudes to Open Access

By Dr Bea Gini, Training Coordinator

‘Right, that concludes this part of the training session, are there any questions?’ 

I’ve asked this scores of times over the last academic year, and it’s always fascinating to hear what questions emerge. Some have come up often enough to earn themselves a new slide in the training session. Others can be really niche, or reveal something about a specific field that sets it apart from other disciplines. Sometimes a question beautifully cuts through the frills to challenge a key aspect of what has been discussed. In all cases, the questions have shown thoughtfulness and a real wish to engage with Open Research.

Over the last academic year, we trained over 300 researchers in Open Research. In this post, I tease out a few of the most interesting or common questions they have asked about Open Access (OA) to explore what these may reveal about how researchers relate to the idea of OA. This is not an FAQ page, nor a comprehensive resource about OA at Cambridge. I will resist the urge to answer the questions and instead focus on the themes they raise.


Naturally, many of the questions reflect the incentives in research careers. When speaking to Arts & Humanities groups, the aim to turn a PhD thesis into a monograph is common, so questions are raised over publishers’ attitudes to OA theses and possible access levels for theses in Apollo. With ‘publish or perish’ still a common mantra, we have carefully considered how PhD graduates can deposit their theses in the repository without compromising future publishing deals. Many publishers now realise that an OA thesis is not necessarily a problem, but this is still a debated issue and more conversations between publishers, students, supervisors and libraries are needed.  

With STEMM groups, Registered Reports often come up, prompting discussions of their benefits in securing a publication avenue early and improving reporting practices. And yet the bias against negative results is profoundly embedded and hard to shake. More than once, I was asked ‘but if I do the experiment and get negative results, can I still go back and change the method to see if I can get positive ones?’. The first time I was a little baffled, worrying that I had not properly explained the problems with under-reporting negative results. Yet with further discussion it became clear that the researchers agreed with the principle, but felt that publishing positive results was more likely to earn them citations and prestige. In such a competitive environment, who can blame them for trying to give themselves the best chance?  

At other times, it’s heartening to see that incentives are better aligned between researchers, the academic community, and the public at large. I’ve received growing numbers of questions about how to disseminate findings to colleagues, the general public, and the research subjects themselves. In a few cases, researchers were grappling with dissemination strategies in rural areas of the developing world, where the usual solutions like blogs and podcasts would not work. It prompted me to think more broadly about dissemination strategies, making sure that biases for particular parts of the world or audience types do not come to dominate our suggestions.  

Barriers to Open Access 

By far the most common question I hear is ‘where can I find the money?’, usually asked with some frustration at the gap between what seems to be a great idea (Open Access) and the seemingly insurmountable barrier of Article or Book Processing Charges. This frustration is more common in the Arts, Humanities and Social Sciences (AHSS), whereas in Science, Technology, Engineering and Maths grants often cover publication costs, as long as the applicant remembers to factor them in. Exorbitant costs, as well as concerns about licence types and about handling privacy and qualitative data, can contribute to a disillusionment with the OA movement that I fear is growing among AHSS researchers. There is no easy solution, especially for researchers who are not funded through Research Councils, and for monographs that can cost close to, or even over, £10,000. But some progress has been made: Read and Publish deals may bridge that gap in some cases, and alternative business models for monographs are emerging.

Another common question, when I speak to enthusiastic PhD students, is ‘how can I convince my supervisor to publish OA?’. First of all, it’s heartening that these discussions are happening between students and supervisors: a fine example of supervision as a high-value exchange of ideas. The deeper question concerns the decision-making dynamics within the student-supervisor relationship. I have seen extreme cases where supervisors delegated virtually all decisions to the student, trusting their judgement and the pedagogic value of making mistakes, as well as the opposite, where students were expected to follow instructions to the letter in almost every aspect of their research. As is usually the case, the optimum must rest somewhere between those extremes. When it comes to OA, are reluctant supervisors helpfully schooling their students in the strategising needed for a successful research career, or are they stifling innovation in a new generation of researchers?

The last barrier to mention is lack of knowledge. A variety of questions arise on copyright, Green and Gold OA, identifying manuscript versions, funders’ policies, and more. The OA landscape is still developing as we continue to experiment with business models, agreements, workflows and policies, so there is currently a high level of complexity and things change year on year. Researchers, especially those early in their careers, have to juggle a large and diverse portfolio of skills, so they could be forgiven for shrugging OA away with an ‘I don’t need to know’. Yet their natural curiosity and belief in the power of free information lead many of them to ask probing questions about this landscape. Luckily, these questions are the easiest to answer. We constantly produce and revise training materials to boost researchers’ knowledge, and we have helpdesks and webpages that put the answers at their fingertips.

All in all 

Taken together, these questions tell us two things. First, researchers are engaging with us: they want to understand how OA works and to have the confidence to embrace it. Second, there are common barriers relating to career incentives, costs and knowledge. By listening carefully and expanding the dialogue with all disciplines, we can work together to reduce or overcome those barriers.

Research Data at Cambridge – highlights of the year so far

By Dr Sacha Jones, Research Data Coordinator

This year we have continued, as always, to provide support and services to help researchers with their research data management and open data practices. So far in 2020, we have approved more than 230 datasets into our institutional repository, Apollo. This includes Apollo’s 2000th dataset, on the impact of health warning labels on snack selection, which represents a shining example of reproducible research involving the full gamut: preregistration and the sharing of consent forms, code, protocols and data. Other studies that have sparked media interest also have their data openly available in Apollo, such as the data supporting research reporting the development of a wireless device that can convert sunlight, carbon dioxide and water into a carbon-neutral fuel, or the data supporting a study that used computational modelling to explain why blues and greens are the brightest colours in nature. Also, in the year of COVID, a dataset was published in April on the ability of common fabrics to filter ultrafine particles, associated with an article in BMJ Open.

Sharing data associated with publications is critical for the integrity of many disciplines and best practice in the majority of studies, but science communication in particular also has an important responsibility to bring research datasets to the forefront. This point was discussed eloquently this summer in a guest blog post in Unlocking Research by Itamar Shatz, a researcher and Cambridge Data Champion. Making datasets open permits their reuse; if you have ever wondered how research data are reused, read the comprehensive data sharing and reuse case study written by the Research Data team’s Dominic Dixon, which centres on the use and value of the Mammographic Image Society database, published in Apollo five years ago.

This year has seen the necessary move from our usual face-to-face Research Data Management (RDM) training to online provision. This has led us to produce an online training session in RDM, covering topics such as data organisation, storage, backup and sharing, as well as data management plans. This forms one component of a broader Research Skills Guide – an online course for Cambridge researchers on publishing, managing data, and finding and disseminating research – developed by Dr Bea Gini, the OSC’s training coordinator. We have also contributed to a ‘Managing your study resources’ CamGuide for Master’s students, providing guidance on how to work reproducibly. Last month, in collaboration with several University stakeholders, we released new guidance on the use of electronic research notebooks (ERNs), describing the features of ERNs and helping researchers select one that is suitable.

At the start of this year we invited members of the University to apply to become Data Champions, joining the pre-existing community of 72. The 2020 call was very successful: we welcomed 56 new Data Champions to the programme. The community has expanded this year not only in numbers of volunteers but also in disciplinary focus; there are now Data Champions in several areas of the arts, humanities and social sciences where previously there were none. During the year we have held forums, in person and then online, covering themes such as how to curate manual research records, ideas for RDM guidance materials, data management in the time of coronavirus, and data practices in the arts and humanities and how these can best be supported. We look forward to further supporting and advocating for the fantastic work of the Cambridge Data Champions in the months and years to come.

Open Access and REF 2021: “Is This Article Non-Compliant?”

By Dr Debbie Hansen, Senior Open Access Adviser, Office of Scholarly Communication

Through much of this REF period, there has been a focus on encouraging Cambridge authors to deposit their accepted manuscripts into our institutional repository. The Open Access Team has tackled the sometimes tricky tasks of making sure the right version has been deposited with the correct embargo, advising on funders’ open access requirements, and managing payments for gold open access from the UKRI and COAF block grants.

With the REF period ending, the University is now finalising lists of research outputs to be submitted to REF 2021. Alongside this activity, some members of the Open Access Team have been focussing on compliance indicators for the REF open access policy. In Symplectic Elements, the University’s research information management system, all journal or conference articles that fall within the period of the REF open access policy are labelled as either Compliant or Non-compliant.

From an administrative point of view, this is unfortunately not as straightforward as it may seem (though that is fortunate for compliance). The compliance indicator is set automatically from calculations using the acceptance, first publication and deposit dates, as well as the repository embargo lift date. It is, if you like, a first-pass indicator. ‘Non-compliant’ articles may turn out to be compliant or REF-eligible because they may, for example:

  1. have gaps in their metadata such as missing acceptance or publication dates; 
  2. have incorrect publication dates in the external metadata records. One article can have around 10 separate metadata records (e.g. from Scopus, Crossref, Europe PMC, etc.), and Elements takes the earliest publication date from all the records associated with an article. 01/01/YYYY is a common red herring, where only the year of publication (YYYY) has been recorded and the month and day fields have been filled automatically with a default value;
  3. have embargo lift dates greater than 12 months from first publication (Panel C and D articles can have embargo lengths of up to 24 months but the system does not recognise this); 
  4. be compliantly deposited in a different non-commercial open access repository; 
  5. be eligible for one of the REF exceptions to the policy; 
  6. be published gold open access and so do not need to be deposited in a repository to fulfil the REF open access criteria. 
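To make the first-pass idea concrete, here is a minimal sketch of what such a calculation might look like. This is purely illustrative: the real Elements logic is internal, and the thresholds assumed here are the REF policy’s three-month deposit window after acceptance and a flat 12-month embargo limit from first publication (which, as point 3 notes, wrongly flags compliant Panel C and D articles with longer permitted embargoes):

```python
from datetime import date
from typing import Optional

def months_between(earlier: date, later: date) -> int:
    """Whole months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months

def first_pass_compliant(accepted: date,
                         published: date,
                         deposited: Optional[date],
                         embargo_lift: Optional[date]) -> bool:
    """Hypothetical first-pass REF OA check (illustration only)."""
    if deposited is None:
        return False  # no repository deposit recorded at all
    if months_between(accepted, deposited) > 3:
        return False  # deposited more than 3 months after acceptance
    if embargo_lift is not None and months_between(published, embargo_lift) > 12:
        return False  # embargo appears longer than 12 months from publication
    return True
```

Note how a default 01/01 publication date (the red herring in point 2) can shift `published` earlier and make a perfectly compliant embargo look too long, which is why the flag needs human checking.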

If an article is showing as non-compliant, it generally requires individual investigation by a team member. However, as has been raised in previous blog posts, we try to develop processes that balance staff resources against the sheer number of articles. Here I will mention two tools we have been using to address three common article scenarios in bulk: missing acceptance or publication dates, deposit in another repository, and publication as gold open access.

Missing acceptance or publication dates 

Acceptance dates are not always openly provided by a journal or conference and some publication dates can be hard to find (e.g. for some humanities, arts and social science journals) or have been missed for some other reason.  In these instances, the author may be able to help.  For example, they may be able to check past correspondence with the journal or with co-authors.  

Our colleagues in the University Research Information Office, Agustina Martínez-García and Owen Roberson, developed an internal, simple-to-use tool, aptly named LastMinute.CAM1. This tool uses an article’s Elements identifier to create an article-customised form that can be sent to an author to request missing information. The form is pre-populated with the article title and other information already held about the article (e.g. its digital object identifier (DOI)), and the author can fill in missing acceptance or publication dates. Once the form is submitted, the data populates a new record for the article in Symplectic Elements and is used, alongside all the other data for that article, in the compliance calculations. We have tried to use LastMinute.CAM on a considered basis (we do not wish to contact authors unnecessarily), and via mail merge lists we have attempted to resolve the issue of missing dates, and of links to articles in other repositories (next section), for hundreds of papers.

Article deposited in another repository 

Some authors have been contacted with the LastMinute.CAM form because their article was deposited late in Apollo, or not deposited at all, but may be compliant in another repository (e.g. deposited by a co-author at another university). LastMinute.CAM is integrated with Unpaywall: the application searches Unpaywall data via its Application Programming Interface (API) and records in the form the link to the preferred open access location, together with the article version if available. A recipient of the form can accept this link, remove it (they may know it was not compliantly deposited), or replace the pre-populated repository link and version with an alternative.

Having a link to an article in another repository is of course only a first step. A team member still needs to check the link (we have found URLs to non-repository web pages) and investigate whether the article is compliantly deposited in the other repository. However, when we do find a compliant deposit, the source is already recorded for us, removing some of the legwork we would otherwise need to do to complete our records.
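For the curious, the kind of Unpaywall lookup described above can be sketched in a few lines. The endpoint and field names (`best_oa_location`, `host_type`, `url`, `version`) come from Unpaywall’s public v2 API; the helper functions themselves are hypothetical and not LastMinute.CAM’s actual code:

```python
import json
from urllib.request import urlopen

# Unpaywall's public v2 endpoint; registration is just an email address.
UNPAYWALL = "https://api.unpaywall.org/v2/{doi}?email={email}"

def fetch_record(doi: str, email: str) -> dict:
    """Fetch the Unpaywall record for one DOI (requires network access)."""
    with urlopen(UNPAYWALL.format(doi=doi, email=email)) as resp:
        return json.load(resp)

def preferred_repository_location(record: dict):
    """Return (url, version) of the best open access copy if it is hosted
    in a repository rather than on a publisher site, else None."""
    loc = record.get("best_oa_location") or {}
    if loc.get("host_type") == "repository":
        # `version` is e.g. 'acceptedVersion' or 'publishedVersion'
        return loc.get("url"), loc.get("version")
    return None
```

As the text notes, a non-None result is only a lead: the link may point at a non-repository page, and compliance in the other repository still needs human verification.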

Article published gold open access 

Unpaywall has also been a great tool for identifying articles published open access through the gold route. The Unpaywall Simple Query Tool accepts a list of up to 1,000 DOIs and returns a report of the open access status of the article associated with each DOI. We do need to analyse the results carefully and discard, for example, articles made open only through the accepted manuscript and the green route, published versions without an open licence (bronze open access), and those published with an open licence but only after a defined time delay. Once we are happy with the cleaned list, it can be used as input to an Elements API script (also developed by Agustina Martínez-García) to bulk-annotate articles identified as published gold open access. To date we have identified over 1,000 articles in this way.
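The post-query triage can be sketched as a simple filter. Unpaywall’s documented `oa_status` field classifies each DOI as gold, hybrid, green, bronze or closed; the function below is illustrative only and mirrors the manual checks described above rather than our actual script:

```python
def gold_oa_dois(records):
    """From a list of Unpaywall result dicts, keep only the DOIs whose best
    open access route is gold: the version of record in a fully open access
    journal, with an open licence."""
    gold = []
    for rec in records:
        # green (repository copies) and bronze (open but unlicensed) are
        # discarded, as are closed articles
        if rec.get("oa_status") == "gold":
            gold.append(rec["doi"])
    return gold
```

The surviving list of DOIs is what would then feed the bulk-annotation step in Elements.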


Going forward, we plan to run the gold OA bulk ‘exception’ process monthly, with the option in the background to use LastMinute.CAM further to gather missing information via targeted mail shots to authors. We will also be addressing, in an automated way, those articles that were compliantly deposited with the correct embargo applied but not recognised as compliant by the system due to a ‘perceived’ too-long embargo. These activities will leave a far more manageable set of articles showing as non-compliant, for which more detailed investigation of the non-compliant label can be carried out and action taken (such as the application of eligible REF exceptions) as appropriate.

One final comment: once the submission to REF has been made, there will be a period of reflection. Effective tools like those mentioned here, which help make our processes more efficient, will feature in this review, and the review will in turn help to define our future activities in this space.

1 This tool is only available internally to University of Cambridge researchers, and is not indexed in Google or any other search engine.