Tag Archives: peer review

Cambridge Data Week 2020 day 5: How do we peer review data? New sustainable and effective models

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog

Introduction  

Cambridge Data Week 2020 concluded on 27 November with a discussion between Dr Lauren Cadwallader (PLOS), Professor Stephen Eglen (University of Cambridge) and Kiera McNeice (Cambridge University Press) on models of data peer review. The peer review process around data is still emerging despite the increase in data sharing. This session explored how peer review of data could be approached from both a publishing and a research perspective. 

The discussion focused on three main questions; here are a few snippets of what was said. If you’d like to explore the speakers’ answers in full, see the recording and transcript below.

Why is it important to peer review datasets?

Are we in a post-truth world where claims can be made without needing to back them up? What if data could replace articles as the main output of research? What key criteria should peer review adopt?

Figure 1: Word cloud created by the audience in response to “Why is it important to peer review datasets?” The four most prominent words were integrity, quality, trust and reproducibility.

How should data review be done?

Can we drive the spread of Open Data by initially setting an incredibly low bar, encouraging everyone to share data even in its messy state? Are we reviewing to ensure reusability, or do we want to go further and check quality and reproducibility? Is data review a one-off event, or a continuous process involving everyone who reuses the data?

Who should be doing the work?

Are journals exclusively responsible for data review, or should authors, repository managers and other organisations be involved? Where will the money come from? What’s in it for researchers who volunteer as data reviewers? How do we introduce the peer review of data in a fair and equitable way?

Watch the session 

The video recording of the webinar can be found below, and the transcript is available in Apollo, the University of Cambridge repository.

Bonus material 

After the end of the session, Lauren, Kiera and Stephen continued the discussion, prompted by a question from the audience about whether there should be some form of template or checklist for peer reviewing code. Here is what they said. 

Lauren Cadwallader  That’s an interesting idea, though of course code is written for different purposes: software, analysis, figures, and so on. Inevitably there will be different ways of reviewing it. Stephen, can you tell us more about your experience with CODECHECK?

Stephen Eglen At CODECHECK we have a process to help codecheckers run research code and award a “certificate of executable computation”, like this example of a report. Even if you do nothing else, copying whatever files you’ve got onto some repository, dirty and unstructured as that might seem, is still gold dust to the next researcher who comes along. Initially we can set the standards low, and from there we can come up with a whole range of more advanced quality checks. One question is ‘what are researchers willing to accept?’ I know of a couple of pilots that tried requiring more work from researchers in preparing and checking their files and code, such as the Code Ocean pilot that Kiera mentioned. I think that we have a community that understands the importance of this and is willing to put in some effort.

Kiera McNeice There’s value in having checklists that are not extremely specialised, but tailored somewhat towards different subject areas. For instance, the American Journal of Political Science has two separate checklists, one for quantitative data and one for qualitative data. Certainly, some of our HSS editors have been saying that some policies developed for quantitative data do not work for their authors.  

Lauren Cadwallader  It might be easy to start with places where there are communities that are already engaged and have a framework for data sharing, so the peer review system would check that. What do you think? 

Kiera McNeice I guess there is a ‘chicken and egg’ issue: does this have to be driven from the top down, from publishers and funders, or does it come from the bottom up, with research communities initiating it? As journals, there is a concern that if we try to enforce very strict standards, then people will take their publications elsewhere. If there is no desire from the community for these changes, publisher enforcement can only go so far.  

Stephen Eglen Funders have an important role to play too. If they lead on this, researchers will follow, because ultimately researchers are focused on their careers. Unless there is recognition that doing this is a valuable part of one’s work, it will be hard to convince the majority of researchers to spend time on it.

Take a pilot I was involved in with Nature Neuroscience. Originally this was meant to be a mandatory peer review of code after acceptance in principle, but in the end fears about driving away authors meant it was only made optional. Throughout a six-month trial, I was only aware of two papers that went through code review. I can see the barriers for both journal and authors, but if researchers received credit for doing it, this sort of thing would come from the bottom up.

Lauren Cadwallader  In our biology-based model review pilot we ran a survey and found that many people opted in because they believe in open science, reproducibility, and so on, but two people opted in because they feared PLOS would think they had something to hide if they didn’t. That’s not at all what it was about. Although I suppose if it gets people sharing data… 

Conclusion 

We were intrigued by many of the ideas put forward by the speakers, particularly the areas of tension that will need to be resolved. For instance, as we try to move from a world where most data remains in people’s laptops and drawers to a FAIR data world, even sharing simple, messy, unstructured data is ‘gold dust’. Yet ultimately, we want data to be shared with extensive metadata and in an easily accessible form. What should the initial standards be, and how should they be raised over time? And how about the idea of asking Early Career Researchers to take on reviewer roles? Certainly they (and their research communities) would benefit in many ways from such involvement, but will they be able to fit this in their packed schedules?  

The audience engaged in lively discussion throughout the session, especially around the use of repositories, the need for training, and disciplinary differences. At the end of the session, they surprised us all with their responses to our poll: “Which peer review model would work best for data?”. The most common response was “Incorporate it into the existing review of the article”, an option that had hardly been mentioned in the session. Perhaps we’ll need another webinar exploring this avenue next year!

Figure 2: Audience responses to the poll held at the end of the event, “Which peer review model would work best for data?”

Resources 

Alexandra Freeman’s Octopus project aims to change the way we report research. Read the Octopus blog and an interview with Alex to find out more.  

Publish your computer code: it is good enough, a column by Nick Barnes in Nature in 2010 arguing that sharing code, whatever the quality, is more helpful than keeping it in a drawer.  

The Center for Reproducible Biomedical Modelling has been working with PLOS on a pilot about reviewing models.  

PLOS guidelines on peer-reviewing data were produced in collaboration with the Cambridge Data Champions.

CODECHECK, led by Stephen Eglen, runs code to offer a “certificate of reproducible computation” to document that core research outputs could be recreated outside of the authors’ lab. 

Code Ocean is a platform for computational research that creates web-based capsules to help enable reproducibility.  

Editorial on the pilot for peer reviewing biology-based models in PLOS Computational Biology.

Published on 25 January 2021

Written by Beatrice Gini

Creative Commons License

‘No free labor’ – we agree.

[NOTE: The introductory sentence to this blog was changed on 27 June to provide clarification]

Last week members of the University of California* released a Call to Action to ‘Champion change in journal negotiations’ which references the April 2018 Declaration of Rights and Principles to Transform Scholarly Communication.  This states as one of the 18 principles:

“No free labor. Publishers shall provide our Institution with data on peer review and editorial contributions by our authors in support of journals, and such contributions shall be taken into account when determining the cost of our subscriptions or OA fees for our authors.”

Well, this is interesting. At Cambridge we have been trying to look at this specific issue since late last year.

The project

Our goal was to gain a better understanding of the interaction between publisher and researcher. The (not very imaginatively named) Data Gathering Project supports the decision making of the Journal Coordination Scheme in relation to subscription to, and use of, academic journal literature across Cambridge.

What we have found so far is that the data is remarkably difficult to put together. Cambridge University does not use bibliometrics as a means of measuring our researchers, so we do not subscribe to SciVal, but we do have access to Scopus. However, Scopus does not pick up Arts and Humanities publications particularly well, so any figures drawn from it will always cover only a subset of the whole.

Some information that we thought would be helpful simply isn’t. We do have an institutional Altmetric account, so we were able to pull a report from Altmetric of every paper with a Cambridge author held in that database. But Altmetric does not give a publisher view – we would have to extract this using DOI prefixes or some other system.
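
To illustrate what a DOI-prefix approach might look like, here is a minimal sketch in Python. The file name, the ‘DOI’ column and the prefix-to-publisher lookup are all assumptions for illustration rather than part of our actual workflow; in practice the lookup would need to be curated, or taken from a registry such as Crossref.

```python
import csv
from collections import Counter

# Hypothetical mapping from DOI registrant prefix to publisher.
# A real version would need curating (or looking up in a registry).
PREFIX_TO_PUBLISHER = {
    "10.1371": "PLOS",
    "10.1002": "Wiley",
    "10.1038": "Nature Research",
}

def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, e.g. '10.1371' for '10.1371/journal.pone.0000001'."""
    return doi.strip().split("/", 1)[0]

def publishers_from_export(path: str) -> Counter:
    """Count papers per publisher from a CSV export with a 'DOI' column (assumed name)."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = row.get("DOI", "")
            if not doi:
                continue
            prefix = doi_prefix(doi)
            counts[PREFIX_TO_PUBLISHER.get(prefix, f"unknown ({prefix})")] += 1
    return counts

if __name__ == "__main__":
    # 'altmetric_export.csv' is a placeholder file name.
    for publisher, n in publishers_from_export("altmetric_export.csv").most_common():
        print(f"{publisher}: {n}")
```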

Cambridge uses Symplectic Elements to record publications but, for very complicated reasons, we are unable to obtain from it a list of the publishers with whom we publish. As part of the subscription we have access to the new analytics product, Dimensions. However, as far as we have managed to see, Dimensions does not break down by publisher (it works at the more granular level of journal), and seems to consider anything that is in the open domain (regardless of licence) to be ‘open access’. So figures generated here come with a heavy caveat.

We are also able to access the COUNTER usage statistics for our journals with the help of the Library e-resources team. However, these include downloads of backfiles and of open access articles, so the numbers are slightly inflated, making any ‘cost per download’ analysis of value against subscription cost inaccurate.

We know how much we spend on subscriptions (spoiler alert: a lot). We need to take into consideration our offsetting arrangements with some publishers – something we are taking an active look at currently anyway.

Reaching out to the publishing community

So, to supplement the aggregated information we have to hand, we have reached out to those publishers our researchers publish with in significant numbers and asked them for the following data on Cambridge authors: peer reviewing, publishing, citing, editing and downloading.

This is exactly what the University of California is demanding. One of the reasons we need to ask publishers for peer review information is that it is basically hidden work. Aggregating systems like Publons do help a bit, although the Cambridge count of reviewers in the system is only 492, a small percentage of the whole. Publons was bought out by Clarivate Analytics (which was Thomson Reuters before this and ISI before that) a year ago. We did approach Clarivate Analytics for some data about our peer reviewing, but declined to pay the eye-watering quoted fee.

What have we received?

Contrary to our assumptions, many of the publishers responded saying that this information is difficult to compile because it is held on different systems and multiple people would need to be contacted. Sometimes this is because publishers are responsible for publishing learned society journals, so information is not stored centrally. They also fed back that much of the data is not readily available in a digestible format.

Some publishers have responded with data on Cambridge peer reviewers and editors, usage statistics, and citation information. A big thank you to Emerald, SAGE, Wiley, the Royal Society and eLife. We are in active correspondence with Hindawi and PLOS. [STOP PRESS: SpringerNature provided their data 30 minutes after this blog went live, so thanks to them as well].

However, a number of publishers have not responded to our requests and one in particular would like to have a meeting with us before releasing any information.

Findings so far

The brief for the project was to ‘understand how our researchers interact with the literature’.  While we wrote the brief ourselves, we have come to realise it is actually very vague. We have tried to gather any data we can to start answering this question.

What the data we have so far is helping us understand is how much is being spent on APCs outside the central management of the Office of Scholarly Communication (OSC). The OSC manages the block grants from the RCUK (now UKRI) and the Charities Open Access Fund, but does not look after payments for open access for research funded by, say, the Bill and Melinda Gates Foundation or the NIH. This means that there is a not insignificant amount of extra expenditure on top of that coordinated by the OSC. These amounts are extremely difficult to ascertain, as we observed in 2014.

We already collect and report on how much the Office of Scholarly Communication has spent on APCs since 2013. However, some prepayment deals make the data difficult to analyse because of the way the information is presented to us. For example, Cambridge began using the Wiley Dashboard in the middle of the year, with the first claim against it on 6 July 2016, so information after that date is fuzzy.

The other issue with comparing how much a publisher has received in APCs and how much the OSC has paid (to determine the difference) is dates. We have already talked at length about date problems in this space. But here the issue is that publisher-provided numbers are based on calendar years. Our reporting years differ: RCUK reports from April to March and COAF from October to September, so pulling this information together is difficult.
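
To make the date problem concrete, here is a minimal sketch showing how a single payment date maps into the two different reporting years. The ‘2016/17’ label format is an illustrative choice of ours, not an official convention.

```python
from datetime import date

def rcuk_reporting_year(d: date) -> str:
    """RCUK reporting runs April to March: April 2016 - March 2017 is '2016/17'."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{str(start + 1)[-2:]}"

def coaf_reporting_year(d: date) -> str:
    """COAF reporting runs October to September: October 2016 - September 2017 is '2016/17'."""
    start = d.year if d.month >= 10 else d.year - 1
    return f"{start}/{str(start + 1)[-2:]}"

# The same APC payment (dated 6 July 2016) falls into different reporting years,
# and into yet another bucket if the publisher reports by calendar year (2016).
payment = date(2016, 7, 6)
print(rcuk_reporting_year(payment))  # 2016/17
print(coaf_reporting_year(payment))  # 2015/16
```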

Our current approach to understanding the complete expenditure on APCs, apart from analysing the data provided by (some) publishers, is to establish all of the suppliers to whom the OSC has paid an APC and obtain their supplier numbers. This list of supplier numbers can then be run against payments across the whole University to identify APC payments made outside the OSC, as sketched below.
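
As an illustration of that matching step, here is a minimal sketch. The file names and column names (‘osc_apc_payments.csv’, ‘university_payments.csv’, ‘supplier_number’, ‘amount’, ‘department’) are hypothetical; the real University finance data will of course look different.

```python
import csv

def load_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Supplier numbers for every supplier the OSC has paid an APC to (assumed input file).
osc_suppliers = {row["supplier_number"] for row in load_rows("osc_apc_payments.csv")}

# University-wide payments to those same suppliers that were not made by the OSC
# (the 'department' filter is our assumption of how OSC payments would be flagged).
external_apc_spend = [
    row for row in load_rows("university_payments.csv")
    if row["supplier_number"] in osc_suppliers and row["department"] != "OSC"
]

total = sum(float(row["amount"]) for row in external_apc_spend)
print(f"Payments to APC suppliers made outside the OSC: {len(external_apc_spend)} "
      f"totalling £{total:,.2f}")
```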

This project is far from straightforward. Every dataset we have will require some enhancement. We have published a short sister post on what we have learned so far about organising data for analysis. But we are hoping over the next couple of months to start getting a much clearer idea of what Cambridge is contributing into the system – in terms of papers, peer review and editorial work in addition to our subscriptions and APCs. We need more evidence based decision making for negotiation.

Footnote

* There has been some discussion in listservs about who is behind the Call to Action and the Declaration. Thanks to Jeff MacKie-Mason, University Librarian and Professor, School of Information and Professor of Economics at UC Berkeley, we are happy to clarify:

  • The Declaration is by the faculty senate’s library committee – University Committee on Library and Scholarly Communication (UCOLASC)
  • The Call to Action is by the University of California’s Systemwide Library and Scholarly Information Advisory Committee, UCOLASC, and the UC Council of University Librarians, who: “seek to engage the entire UC academic community, and indeed all stakeholders in the scholarly communication enterprise, in this journey of transformation”.

Published 26 June 2018 (amended 27 June 2018)
Written by Dr Danny Kingsley & Katie Hughes
Creative Commons License

Reflections on Open Research – a PI’s perspective

As part of the Open Research Pilot Project, Marta Teperek met with Dr David Savage and asked him several questions about his own views and motivations for Open Research. This led to a very inspiring conversation and great reflections on Open Research from the Principal Investigator’s perspective. The main points that came out of the discussion were:

  • Lack of reproducibility raises questions about scientific rigour, integrity and relevance of work in general
  • Being open is to work in a team and be collaborative
  • Open Research will benefit science as a whole, and not the careers of individuals
  • Peer review remains a critical aspect of the scientific process
  • Nowadays, global collaboration and information exchange is possible, making the data really robust
  • Funders should emphasise the importance of research integrity and scientific rigour

This conversation is reported below in the original interview format.

Motivations for doing Open Research

Marta: To start, could you tell me why you are keen on Open Research and why you decided to get involved in the Open Research Pilot Project?

David: Sure, but before we start I wanted to stress that when I make comments about science, these are very general comments and they don’t apply to anyone in particular.

So my general feeling is that I am very concerned and disappointed about the lack of research reproducibility in science. Lack of reproducibility raises questions about scientific rigour, integrity and relevance of work in general. Therefore, I am really keen on exploring ways of addressing these failings of science and I want to make a contribution to solving these problems. Additionally, I am aware that I am not perfect either and I want to learn how I can improve my own practice.

Were there any particular experiences which made you realise the importance of Open Research?

This is just the general experience of reading and also reviewing far too many papers where I thought that the quality of underlying data was poor, or authors were exaggerating their claims without supporting evidence. There is too much hype around, and the general awareness about the number of papers published in high impact journals which cannot be reproduced makes the move to more transparent and open approaches necessary.

Do we need additional rewards for working openly?

How do you think Open Research could benefit academic careers?

I am not sure if Open Research could or should benefit academic careers – this should not be the goal of Open Research. The goal is to improve the quality of science and therefore the benefit of science to the public. Open Research will benefit science as a whole, and not the careers of individuals. Science has become very egotistical and badge-accumulating. We should be investigating things which we find interesting. We should not be motivated by the prize. We should be motivated by the questions.

In science we have far too many people who behave like bankers. Publishing seems to be the currency for them and thus they are sloppy and lack the necessary rigour just because they want to publish as fast as they can.

In my opinion it is the responsibility of every researcher to the profession to try to produce data which is robust. It is fine to make honest mistakes. But it is not acceptable to be sloppy or fraudulent, or not to read enough literature. These are simply not good enough excuses. I’m not claiming to be perfect. But I want to constantly improve myself and my research practice.

Barriers to greater openness in research

What obstacles may be preventing researchers from making their research openly available?

The obvious one is competition for funding, which creates the need to publish in high impact factor journals and consequently leads to the fear of being scooped. And that’s a difficult one to work around. That’s the reason why I do not make everything we do in my research group openly available. However, looking at this from society’s perspective, everything should be made openly available, and as soon as possible, for the sake of greater benefit to mankind. So a balance needs to be found.

Do you think that some researchers might want to make their research open, but might not know how to do it, or might not have the appropriate skills to do it?

Definitely. Researchers need to know about the best ways of making their research open. I am currently trying to work out how to make my own project’s website more open and accessible to others and what are the best ways of achieving this. So yes, awareness of tools and awareness of resources available is necessary, as well as training about working reproducibly and openly. In my opinion, Cambridge has a responsibility to be transparent and open about its processes.

Role of peer-review in improving the quality of research

What frustrates you most about the current scholarly communication systems?

Some people get frustrated with the business model of some of the major publishers. I do not have a problem with it, although I do support the idea of pre-print services, such as bioRxiv. Some researchers get frustrated about the long peer-review process. I am used to the fact that peer review is long, and I accept it because I do not want fraudulent papers to be published. However, flawed peer review, such as biased peer review or a lack of rigorous peer review, is not acceptable and it is a problem.

So how to improve the peer-review process?

I think that peer reviewers need to have greater awareness of the need for greater rigour. I was recently asked to peer review an article. The journal had dedicated guidance for peer reviewers. However, the guidance did not contain any information about the reviewer’s suitability to undertake the work. Peer-reviewer guidance documents need to address questions like: Do you really know what the paper is about? Do you know the discipline well enough? Are there any conflicts of interest? Would you have the time to properly peer review the work? Peer review needs to be done properly.

What do you think about the idea of journals employing professional peer-reviewers, who could be experts in their respective fields and could perform unbiased, high quality peer-review?

This sounds very reasonable, as long as professional peer-reviewers stay up to date with science. Though this would of course cost money!

I suppose publishers have enough money to pay for this. Have you heard of open peer-review and what do you think about it?

I think it is fine, but it might be subject to cronyism. I suspect that most people will be more likely to agree to their reviews being made open as long as they are recommending that the paper be accepted.

I recently reviewed a paper of a senior person and I rejected it. But if I made my review open, it would pose a risk to me – what if the author of the paper I rejected was the reviewer of my future grant application? Would they still assess my grant application objectively? What if people start reviewing each other’s papers and start treating peer-review as a mechanism to exchange favours?

The future of Open Research is in your hands

Who or what inspires you and makes you optimistic about the future of Open Research?

In Cambridge and at the Wellcome Trust there are many researchers who care about the quality of science. These researchers inspire me. These are very clever people, who work hard and make important discoveries.

I am also inspired by teamwork and collaboration. In Big Data and in human genetics in particular, people are working collectively. Human genetics and epidemiology are excellent examples of disciplines where 10-20 years ago studies were too small to allow researchers to make significant and reproducible conclusions. Nowadays, global collaboration and information exchange is possible, making the data really robust. As a result, human genetics is delivering really important observations.

To me, part of being open is to work in a team and be collaborative.

If you had a magic wand and could get one thing changed to get more people to share and open up their research, what would it be?

Not sure… I suppose I am still looking for it! Maybe I will find one during the Open Research Pilot Project. Seriously speaking, I do not believe that a single thing could make a difference. It is the little things that matter. For example, on my side I am trying to make my own lab and institute more aware of reproducibility issues and ensure that I can make a difference in my own environment.

So as a Group Leader, how do you ensure that researchers in your own group are rigorous in their approach?

First, I really make them aware of the importance of reproducible research and of scientific rigour. I am also making a lot of effort to ensure that my colleagues are up to date with literature. I ask them if they read important literature and if they are unable to answer I ask them to do their homework. I am also imposing rigorous standards for experiments. In my lab people repeat the key experiments, or those which are particularly surprising, in a blind fashion. It takes a lot of time and extra resources, but it is important not to be too quick and to validate findings before making claims.

I am also ensuring that my people are motivated. For example, even though everyone helps each other in my group, all PhD students have direct access to me and we have regular discussions about their work. It is important that your group is of a manageable size; otherwise, as a group leader, you will not know all your people and you will not be able to have regular discussions about their work.

How do you identify people who care about reproducible research when making hiring decisions?

I ask all prospective applicants to make a short presentation about their previous work. During their presentation I ask them to tell me exactly what their research question was and how confident they were about their discovery. I am looking for evidence of rigorous methodology, but also for honesty and for people who are not overselling their findings.

In addition, I ask about their career goals. If they tell me that their career goal is to publish in Nature, or have two papers in Science, I count this against them. Instead, I favour applicants who are question-driven, who want to make progress in understanding how things work.

Role of funding bodies in promoting Open Research

Do you think that funders could play a role in promoting Open Research?

Funders could definitely contribute to this. The Wellcome Trust is a particularly notable example of a funding body keen on Open Research. The Trust is currently looking into the best ways to make Open Research the norm. Through various projects such as the Open Research Pilot, the Trust helps researchers like myself to learn best practice on reproducible research, and also to understand the benefits of sharing expertise to improve skills across the research community.

Do you think funder policies to mandate more openness could help?

Potentially. However, policies on Open Access to publications are easy to mandate and relatively easy to interpret and implement. It is much more difficult for Open Research. What does Open Research mean exactly? The right scope and definitions would be key. What should be made open? How? The Wellcome Trust is already doing a lot of work on making important research results available, and human genomic data in particular. But making your proteomic and genomic data publicly available is slightly different from ensuring that your experiments are rigorous and your results honest. So in my opinion, funders should emphasise the importance of research integrity and scientific rigour.

To close our discussion, what do you hope to achieve through your participation in the Open Research Pilot Project?

I want to improve my own lab’s transparency. I want to make sure that we are rigorous and that our research is reproducible. So I want to learn. At the same time I wish to contribute to increased research integrity in science overall.

Acknowledgements

Marta Teperek would like to thank SPARC EUROPE and Dr Joyce Heckman for interviewing her for the Open Data Champions programme – many of the questions asked by Marta in the interview with Dr David Savage originate from inspiring, open questions prepared by SPARC EUROPE.

Published 22 June 2017
Written by Dr Marta Teperek

Creative Commons License