
Cambridge Data Week 2020 day 5: How do we peer review data? New sustainable and effective models

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog post is part of a series summarising each event.

The other posts in this series are:
Cambridge Data Week day 1 blog
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog

Introduction  

Cambridge Data Week 2020 concluded on 27 November with a discussion between Dr Lauren Cadwallader (PLOS), Professor Stephen Eglen (University of Cambridge) and Kiera McNeice (Cambridge University Press) on models of data peer review. Although data sharing is increasingly common, peer review of data is still an emerging practice. This session explored how peer review of data could be approached from both a publishing and a research perspective.

The discussion focused on three main questions; here are a few snippets of what was said. If you’d like to explore the speakers’ answers in full, see the recording and transcript below.

Why is it important to peer review datasets?

Are we in a post-truth world where claims can be made without needing to back them up? What if data could replace articles as the main output of research? What key criteria should peer review adopt?

Figure 1: Word cloud created by the audience in response to “Why is it important to peer review datasets?” The four most prominent words are: integrity, quality, trust, reproducibility.

How should data review be done?

Can we drive the spread of Open Data by initially setting an incredibly low bar, encouraging everyone to share data even in its messy state? Are we reviewing to ensure reusability, or do we want to go further and check quality and reproducibility? Is data review a one-off event, or a continuous process involving everyone who reuses the data?


Who should be doing the work?

Are journals exclusively responsible for data review, or should authors, repository managers and other organisations be involved? Where will the money come from? What’s in it for researchers who volunteer as data reviewers? How do we introduce the peer review of data in a fair and equitable way?

Watch the session 

The video recording of the webinar can be found below, and the transcript is available in Apollo, the University of Cambridge repository.

Bonus material 

After the end of the session, Lauren, Kiera and Stephen continued the discussion, prompted by a question from the audience about whether there should be some form of template or checklist for peer reviewing code. Here is what they said. 

Lauren Cadwallader  That’s an interesting idea, though of course code is written for different purposes (software, analysis, figures, and so on), so inevitably there will be different ways of reviewing it. Stephen, can you tell us more about your experience with CODECHECK?

Stephen Eglen At CODECHECK we have a process to help codecheckers run research code and award a “certificate of executable computation”, like this example of a report. If you do nothing else, then copying whatever files you’ve got onto some repository, dirty and unstructured as that might seem, is still gold dust to the next researcher who comes along. Initially we can set the standards low, and from there we can come up with a whole range of more advanced quality checks. One question is ‘what are researchers willing to accept?’ I know of a couple of pilots that tried requiring more work from researchers in preparing and checking their files and code, such as the Code Ocean pilot that Kiera mentioned. I think that we have a community that understands the importance of this and is willing to put in some effort.

Kiera McNeice There’s value in having checklists that are not extremely specialised, but tailored somewhat towards different subject areas. For instance, the American Journal of Political Science has two separate checklists, one for quantitative data and one for qualitative data. Certainly, some of our HSS editors have been saying that some policies developed for quantitative data do not work for their authors.  

Lauren Cadwallader  It might be easiest to start in places where communities are already engaged and have a framework for data sharing, so the peer review system would check against that. What do you think?

Kiera McNeice I guess there is a ‘chicken and egg’ issue: does this have to be driven from the top down, from publishers and funders, or does it come from the bottom up, with research communities initiating it? As journals, there is a concern that if we try to enforce very strict standards, then people will take their publications elsewhere. If there is no desire from the community for these changes, publisher enforcement can only go so far.  

Stephen Eglen Funders have an important role to play too. If they lead on this, researchers will follow, because ultimately researchers are focused on their careers. Unless there is recognition that doing this is a valuable part of one’s work, it will be hard to convince the majority of researchers to spend time on it.

Take a pilot I was involved in with Nature Neuroscience. Originally this was meant to be a mandatory peer review of code after acceptance in principle, but in the end fears about driving away authors meant it was made optional. During a six-month trial, I was only aware of two papers that went through code review. I can see the barriers for both journals and authors, but if researchers received credit for doing it, this sort of thing would come from the bottom up.

Lauren Cadwallader  In our biology-based model review pilot we ran a survey and found that many people opted in because they believe in open science, reproducibility, and so on, but two people opted in because they feared PLOS would think they had something to hide if they didn’t. That’s not at all what it was about. Although I suppose if it gets people sharing data… 
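As a concrete footnote to Stephen’s description of CODECHECK, here is a minimal sketch, in Python, of the kind of check a codechecker performs: re-run the authors’ analysis and confirm it regenerates the declared outputs. This is an illustration only, not CODECHECK’s actual tooling; the script name, manifest entries and checksums are all hypothetical placeholders.

```python
import hashlib
import subprocess
from pathlib import Path

# Hypothetical manifest: the outputs the authors declare, with checksums
# recorded from their original run. (Illustrative placeholders only.)
MANIFEST = {
    "results/figure1.png": "9f86d081884c7d65...",
    "results/table2.csv": "60303ae22b998861...",
}

def sha256(path: Path) -> str:
    """Checksum a file so outputs can be compared across runs."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def codecheck(analysis_script: str = "run_analysis.py") -> None:
    # Step 1: execute the authors' code exactly as shipped.
    subprocess.run(["python", analysis_script], check=True)

    # Step 2: confirm each declared output was regenerated, and note
    # whether it matches the recorded checksum.
    for name, expected in MANIFEST.items():
        path = Path(name)
        if not path.exists():
            print(f"FAIL: {name} was not produced")
        elif sha256(path) != expected:
            print(f"WARN: {name} produced but differs from the recorded checksum")
        else:
            print(f"OK:   {name} reproduced exactly")

if __name__ == "__main__":
    codecheck()
```

Even a check this shallow answers a basic question that readers of most papers currently cannot: does the code run at all?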

Conclusion 

We were intrigued by many of the ideas put forward by the speakers, particularly the areas of tension that will need to be resolved. For instance, as we try to move from a world where most data remains on people’s laptops and in desk drawers to a FAIR data world, even sharing simple, messy, unstructured data is ‘gold dust’. Yet ultimately, we want data to be shared with extensive metadata and in an easily accessible form. What should the initial standards be, and how should they be raised over time? And how about the idea of asking Early Career Researchers to take on reviewer roles? Certainly they (and their research communities) would benefit in many ways from such involvement, but will they be able to fit this into their packed schedules?

The audience engaged in lively discussion throughout the session, especially around the use of repositories, the need for training, and disciplinary differences. At the end of the session, they surprised us all with their responses to our poll, “Which peer review model would work best for data?”. The most common response was “Incorporate it into the existing review of the article”, an option that had hardly been mentioned in the session. Perhaps we’ll need another webinar exploring this avenue next year!

Figure 2: Audience responses to the end-of-session poll, “Which peer review model would work best for data?”

Resources 

Alexandra Freeman’s Octopus project aims to change the way we report research. Read the Octopus blog and an interview with Alex to find out more.  

Publish your computer code: it is good enough, a column by Nick Barnes in Nature in 2010 arguing that sharing code, whatever the quality, is more helpful than keeping it in a drawer.  

The Center for Reproducible Biomedical Modelling has been working with PLOS on a pilot about reviewing models.  

PLOS guidelines on peer-reviewing data were produced in collaboration with the Cambridge Data Champions.

CODECHECK, led by Stephen Eglen, runs code to offer a “certificate of reproducible computation” to document that core research outputs could be recreated outside of the authors’ lab. 

Code Ocean is a platform for computational research that creates web-based capsules to help enable reproducibility.  

An editorial on the pilot for peer reviewing biology-based models in PLOS Computational Biology.

Published on 25 January 2021

Written by Beatrice Gini

This blog is licensed under a CC BY licence.

Open access: fringe or mainstream?

When I was just settling into the world of open access and scholarly communication, I wrote about the need for open access to stop being a fringe activity and enter the mainstream of researcher behaviour:

“Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.”

While much has changed in the five years since I (somewhat naïvely) wrote those concluding thoughts, there are still significant barriers to the complete opening of scholarly discourse. However, should open access be an afterthought for researchers? I’ve changed my mind. Open access should be something researchers don’t even need to think about, and I think that future is already here, though I fear it will ultimately sideline institutional repositories.

According to the 2020 Leiden Ranking, the median rate at which UK institutions make their research outputs open access is over 80%, far higher than in any other country (Figure 1). Indeed, the UK is the only country that has ‘levelled up’ over the last five years, while institutions in the rest of the world have plodded along, making slow but steady progress.

Figure 1. The median institutional open access percentage for each country, according to the Leiden Ranking. Note that these figures are medians across all institutions within a country: this does not mean that 80% of the UK’s publications are open access, but that the median open access rate among UK institutions is 80%.
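To make that caveat concrete, here is a minimal sketch in Python, using entirely made-up institutional figures, of how the median of per-institution rates (the Leiden Ranking statistic) differs from a pooled national rate:

```python
from statistics import median

# Hypothetical institutions: (total publications, open access publications).
# Made-up numbers for illustration, not real Leiden Ranking data.
institutions = [
    (12000, 10200),  # 85% OA
    (8000, 6400),    # 80% OA
    (3000, 2250),    # 75% OA
    (2000, 1200),    # 60% OA
    (500, 150),      # 30% OA
]

# Median of per-institution OA rates: what the Leiden Ranking reports.
rates = [oa / total for total, oa in institutions]
print(f"Median institutional OA rate: {median(rates):.0%}")  # 75%

# Pooled national OA rate: weights large institutions more heavily.
pooled = sum(oa for _, oa in institutions) / sum(total for total, _ in institutions)
print(f"Pooled national OA rate: {pooled:.0%}")  # 79%
```

The two statistics diverge whenever open access uptake is uneven across institutions, which is why the caption above is careful to describe the 80% figure as a median.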

The main driver of this increase in open access content in the UK is green open access (Figure 2), due in large part to the REF 2021 open access policy (announced in 2014 and effective from 2016). This is a dramatic demonstration of the influence that policy can have on researcher behaviour, and it has made open access a mainstream activity in the UK.

Figure 2. The median institutional green open access percentage for each country according to the Leiden Ranking.

Like the rest of the UK, Cambridge has seen similar trends across all forms of open access (Figure 3), with rising use of green open access and steadily increasing adoption of gold and hybrid. Yet despite all the money poured into gold and (more controversially) hybrid open access, the net effect of all this other activity is a measly three percentage points of additional open access content (82% vs 79%). Which raises the question: was it worth it? If open access can be achieved so successfully through green routes, what is the inherent benefit of gold/hybrid open access?

Figure 3. Open access trends at Cambridge according to the Leiden Ranking. In the 2020 ranking, 79% was delivered through green open access, meaning that despite all the work to facilitate other forms of open access, that activity contributed only an additional three percentage points to the total (82%).

Of course, Plan S has now emerged as the most significant attempt to coordinate a clear and coherent international strategy for open access. While it is not without its detractors, I am nonetheless supportive of cOAlition S’s overall aims. However, as the UK scholarly communication community has experienced, policy implementation is messy and can lead to unintended consequences. While Plan S provides options for complying through green open access routes, the discussions that institutions and publishers (both traditional and fully open access alike) have engaged in are almost entirely focused on gold open access through transformative deals. This is not because we, as institutions, want to spend more on publishing, but because it is the pragmatic approach: creating open access content at the source and providing authors with easy and palatable routes to open access. It is also a recognition that flipping journals requires give and take from institutions and publishers alike.

We are now very close to reaching a point where open access can be an afterthought for researchers, particularly in the UK. In large part, it will be done for them through direct agreements between institutions and publishers. Cambridge already has open access publishing arrangements with over 5,000 journals, and this figure will continue to grow as we sign more transformative agreements. However, this will ultimately be to the detriment of green open access. Rather than being the only open access source for a journal article, institutional repositories will become secondary storehouses of content that is already gold open access. The heyday of institutional repositories, if one ever existed, is now over.

For me, that is a sad thought. We have poured enormous resources and effort into maintaining Apollo, but we must recognise the burden that green open access places on researchers. They have better things to do. I expect that the next five years will see a dramatic increase in gold and hybrid open access content produced in the UK. Green open access won’t go away, but we will have entered a time where open access is no longer fringe, nor indeed mainstream, but rather de facto for all research.

Published 23 October 2020

Written by Dr Arthur Smith

This blog is licensed under CC BY 4.0.

What questions reveal about researchers’ attitudes to Open Access

By Dr Bea Gini, Training Coordinator

‘Right, that concludes this part of the training session, are there any questions?’ 

I’ve asked this scores of times in the last academic year, and it’s always fascinating to hear what questions emerge. Some have come up often enough that they have earned themselves a new slide in the training session. Others can be really niche, or reveal something about a specific field that sets it apart from other disciplines. Sometimes a question beautifully cuts through all the frills to challenge a key aspect of what has been discussed. In all cases, they have shown thoughtfulness and a real wish to engage with Open Research.

Over the last academic year, we trained over 300 researchers in Open Research. In this post, I tease out a few of the most interesting or common questions they have asked about Open Access (OA), to explore what these questions reveal about how researchers relate to the idea of OA. This is not an FAQ page, nor is it a comprehensive resource about OA at Cambridge. I will resist the urge to answer any of the questions, focusing instead on the themes they raise.

Incentives 

Naturally, many of the questions reflect the incentives in research careers. When speaking to Arts & Humanities groups, the aim to turn a PhD thesis into a monograph is common, so questions are raised over publishers’ attitudes to OA theses and possible access levels for theses in Apollo. With ‘publish or perish’ still a common mantra, we have carefully considered how PhD graduates can deposit their theses in the repository without compromising future publishing deals. Many publishers now realise that an OA thesis is not necessarily a problem, but this is still a debated issue and more conversations between publishers, students, supervisors and libraries are needed.  

With STEMM groups, Registered Reports often come up, prompting discussions of their benefits in securing a publication avenue early and improving reporting practices. And yet the bias against negative results is profoundly embedded and hard to shake. More than once, I was asked ‘but if I do the experiment and get negative results, can I still go back and change the method to see if I can get positive ones?’. The first time I was a little baffled, worrying that I had not properly explained the problems with under-reporting negative results. Yet with further discussion it became clear that the researchers agreed with the principle, but felt that publishing positive results was more likely to earn them citations and prestige. In such a competitive environment, who can blame them for trying to give themselves the best chance?  

At other times, it’s heartening to see that incentives are better aligned between researchers, the academic community, and the public at large. I’ve received growing numbers of questions about how to disseminate findings to colleagues, the general public, and the research subjects themselves. In a few cases, researchers were grappling with dissemination strategies in rural areas of the developing world, where the usual solutions like blogs and podcasts would not work. It prompted me to think more broadly about dissemination strategies, making sure that biases for particular parts of the world or audience types do not come to dominate our suggestions.  

Barriers to Open Access 

By far the most common question I hear is ‘where can I find the money?’, usually asked with some frustration at the gap between what seems to be a great idea (Open Access) and the seemingly insurmountable barrier of Article or Book Processing Charges. This frustration is more common in the Arts, Humanities and Social Sciences, whereas in Science, Technology, Engineering and Maths, grants often cover publication costs, as long as the applicant remembers to factor those in. Exorbitant costs, as well as concerns about the type of licence and about dealing with privacy and qualitative data, can contribute to disillusionment with the OA movement, which I fear is growing among AHSS researchers. There is no easy solution to this, especially for researchers who are not funded through Research Councils, and for monographs that can cost close to, or even over, £10,000. But some progress has been made: Read and Publish deals may bridge that gap in some cases, and some alternative business models for monographs are emerging.

Another common question when I speak to enthusiastic PhD students is ‘how can I convince my supervisor to publish OA?’. First of all, it’s great that these discussions are happening between students and supervisors; it is a fine example of supervision as a high-value exchange of ideas. The deeper question concerns the decision-making dynamics within the student-supervisor relationship. I have seen extreme cases where supervisors delegated virtually all decisions to the student, trusting in their judgement and the pedagogic value of making mistakes, as well as the opposite, where students were expected to follow instructions to the letter in almost every aspect of their research. As is usually the case, the optimum must rest somewhere between those extremes. When it comes to OA, are reluctant supervisors helpfully schooling their students in the strategising needed for a successful research career, or are they stifling innovation in a new generation of researchers?

The last barrier to mention is lack of knowledge. A variety of questions arise on issues of copyright, Green and Gold OA, identifying manuscript versions, funders’ policies, and more. The OA landscape is still developing as we continue to experiment with business models, agreements, workflows and policies. This means that there is currently a high level of complexity, and things change year on year. Researchers, especially those early in their careers, have to juggle a large and diverse portfolio of skills, so they could be forgiven for shrugging OA away with an ‘I don’t need to know’. Yet their natural curiosity and belief in the power of free information lead many of them to ask probing questions to understand this landscape. Luckily, these questions are the easiest to answer. We constantly produce and revise training materials to boost researchers’ knowledge, and we have helpdesks and webpages where the answer can be at their fingertips.

All in all 

Taken together, these questions tell us two things. First, researchers are engaging with us: they want to understand how OA works and to have the confidence to embrace it. Second, there are common barriers relating to career incentives, costs and knowledge. By listening carefully and expanding the dialogue with all disciplines, we can work together to reduce or overcome those barriers.