All posts by Maria Angelaki

Cambridge Data Week 2020 day 1: Who are the winners and losers of good data practices?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.  

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog
Cambridge Data Week day 5 blog

Introduction

The first day of Cambridge Data Week 2020 kicked off with a tantalisingly open question: who are the winners and losers of good data practices? This question was addressed from two different perspectives: those of a funder, provided by Dr Georgie Humphreys (Wellcome), and of a publisher, provided by Dr Catriona MacCallum (Hindawi). Discussion of this topic during presentations and the Q&A session looked through various (but not mutually exclusive) lenses, including those of data sharing, quality, ethics, and research culture. Funder mandates for data sharing and what these have achieved (e.g. saving research funds related to data reuse) were reflected upon, as were disciplinary differences between STEMM, social sciences, arts and humanities. There was also a discussion of evidence relating to shifts in research culture and whether this points to better data practices. As a whole, the webinar explored a broader view of good data practices, the consequences of these, and the progress being made in embedding good data management in research.

Topical for this year, both speakers discussed data sharing related to Covid-19 research. Catriona stated that Covid has exposed systemic flaws in the existing system (in relation to data sharing), and Georgie highlighted some surprising results regarding data availability statements in Covid-related articles. The CARE Principles for Indigenous Data Governance were also brought to the fore by Catriona, who argued for attention to be placed on potential power issues surrounding data sharing. These are a set of principles, complementary to the FAIR principles, which encourage the open research movement to fully engage with Indigenous Peoples’ rights and interests. A pervasive undercurrent ran throughout the webinar – research culture and some problems therein. These were addressed explicitly by both speakers, with both stating that more needs to be done by institutions to implement DORA and reward researchers for their achievements and good research practices and not just according to where (i.e. in what journals) their research is published. Catriona highlighted results from a 2019 EUA report that shows that institutions have some way to go in this regard, that the value of data is not fully recognised, and that responsible research assessment is at the heart of cultural change in the right direction.

We had some great questions from the audience that were answered in the Q&A session, such as “In countries without the REF, is data sharing better?”, and “How do you get qualitative researchers on board with this?”, and “What is the role of universities in the so-called data-driven economy?”. Our audience also responded to the poll we held at the end of the webinar, where we asked participants to select one from seven given options that they regard as most likely to prevent good data practices among researchers. Resource indicators (knowledge, time, money for RDM) amounted to 46% of responses (blue in the chart below) and cultural indicators amounted to 53% (orange in the chart). Overall, the results were rather surprising but optimistic, revealing that a dominant perception among the participants is that a shift in cultural practices is one of the leading factors necessary to drive forward good data practices in research.

Figure 1. Results of the poll held during the webinar, where participants were asked to choose one of seven factors that they consider most likely to prevent or inhibit good data practices.

Audience composition

We had 274 registrations for this webinar, with just over 70% originating from the Higher Education sector. Researchers and PhD students accounted for 40% of registrations and research support staff for an additional 30%. On the day, we were thrilled to see that 164 people attended the webinar, participating from a wide range of countries.

Recording, transcript and presentations

The video recording of the webinar can be found below and the recording, transcript and presentations are present in Apollo, the University of Cambridge repository.

Bonus material

There were a few questions we did not have time to address during the live session, so we put them to the speakers afterwards. Here are their answers:

What are the ethics of using secondary data, particularly in relation to primary versus secondary researchers’ objectives, meaning of data/methods, consent of participants, and in the case of qualitative data, the personal relationships built between researcher and participants?

Georgie Humphreys This question seems to allude to informed consent, which is still a topic of active discussion in terms of what one tries to build into the original informed consent to allow subsequent secondary use down the line. There is this idea of broad consent now, where a participant would consent to that particular project but they’re also consenting to their data being kept and maybe reused for other purposes related to different scientific questions, but maybe with clauses such as ‘not for commercial benefits’. There are potential concerns about re-identification but there are mechanisms for dealing with that – mechanisms which reduce risk whilst retaining value, such as anonymisation or synthetic data creation. But there are other datasets where that’s just not going to be possible, where you lose all value of the original dataset. The UKDS have a nice page on informed consent, providing information on what you put in your consent forms to enable secondary use. This needs to be thought about at the very start of the study prior to collection of the primary data.

Catriona MacCallum This question is really focusing on data privacy issues. The primary researcher collects the data, the secondary researcher reuses the data. There are ways that researchers can be given access to the data while maintaining privacy. The primary researcher is creating the relationships with participants in order to obtain data, so what does this mean ethically for those wishing to reuse the data? Safety nets do need to be put into place. Here, it’s important to raise the CARE principles again. These were the result of a working group that came about as a result of concerns about how data from indigenous people are being treated. The slogan is now ‘Be FAIR and CARE’. The CARE principles are emerging in the UN’s agenda, and UNESCO’s, and I’m sure they will come up with the Research Councils too.

What are the best practices to ensure data quality? 

Catriona MacCallum It depends what is meant by ‘quality’ as there are various ways of looking at this. The European Commission came up with the economic loss of not publishing failed experiments; in other words, the publication bias that results. We need to redefine what we mean by quality and integrity, and again this speaks to the research culture, as no one gets rewarded for publishing a failed result and in fact the researchers end up feeling embarrassed and tend not to do it. Publication bias is huge! It also applies to the humanities and social sciences, but potentially in a different way, and there are huge biases in terms of what gets published and what is allowed to get published.

Georgie Humphreys This issue is probably a plug for the open peer review model where the filter is not at the beginning but later on. [In open peer review, authors and reviewers are aware of each other’s identity and encouraged to engage in open discussion. This makes the process more transparent, removing bias or conflicts of interest. Manuscripts are made publicly available pre-review, and reviews are published alongside the article].

Conclusion

So, who are the winners and losers of good data practices? Georgie believes that everyone, in the long term, will be a winner. If time is spent ensuring data is well-documented, well-organised, has dictionaries, is stored somewhere for the long term, then it will benefit the data creators just as much as anyone else. In the short term, she acknowledges that there may be people that find being a champion in this field a challenge for them individually, but it’s just about continuing along this journey to get to the point where everything is in place to truly reward and recognise those that have good open practices and good data management practices. Catriona says that there are so many winners: the economy, society, and science, the social sciences and humanities – all will benefit from data sharing. Taking society as an example, sharing data and sharing it well (through good research data management) will increase public trust in science, benefit public health and even help toward achieving multiple sustainable development goals.

Resources

A Covid-19 press release by Wellcome in January 2020 called on researchers, publishers and funders to share or facilitate the sharing of interim and final data as rapidly as possible. Wellcome have been exploring the impact of this statement on data sharing.

‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’ by Wilkinson et al. in Scientific Data (March 2016).

CARE Principles of Indigenous Data Governance. The full CARE principles are outlined here.

UKDS information on informed consent, including a downloadable model consent form with suggested wording to allow secondary data reuse.

An April 2020 publication by Colavizza et al. on ‘The citation advantage of linking publications to research data’ showing that article citations are greater when they have data availability statements that include a link (e.g. DOI) to data archived in a repository.

A European University Association (EUA) report published in October 2019 by Saenen et al. on ‘Research assessment in the transition to Open Science: 2019 EUA Open Science and Access Survey Results’.

Published 25 January 2021

Written by Dr Sacha Jones with contributions from Dr Georgie Humphreys, Dr Catriona MacCallum and Maria Angelaki.  

CCBY icon

Cambridge Data Week 2020 day 2: Who is reusing data? Successes and future trends?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.  

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog
Cambridge Data Week day 5 blog

Introduction

Reuse of data is the final element of the FAIR principles and has long been argued to be a central benefit of data sharing, allowing others access to a wealth of research and making research funding more efficient by removing the need to duplicate work. Yet we are still in the process of evaluating success in this area. This webinar brought together speakers to discuss what we know about the current state of play around data reuse, what researchers can do to increase the reuse potential of their data, and possible future developments in data reuse.

Our speakers – Louise Corti (UK Data Archive) and Tiberius Ignat (Scientific Knowledge Services) – looked at data reuse from two different perspectives. Louise focused on the reuse of UK Data Service collections, sharing some examples of their most widely used data sets, discussing what makes them popular and sharing some principles that can be used both to make data more reusable and to promote it for reuse. Tiberius discussed the prevalence of data reuse by machines and the possibility of granting machines data reuse rights.

Louise’s presentation gave an overview of the portfolio of data sets hosted by the UK Data Service, looked at their top 20 most downloaded datasets and discussed the underlying principles that have led to them being widely reused. As well as demonstrating some commonalities between these datasets, Louise also outlined the principles used by the UK Data Service to promote their collections for reuse.

Tiberius’ presentation looked at data reuse from a different perspective, serving as a call to action to share research data responsibly and protect it against reuse by machines designed to persuade humans. One of Tiberius’ main arguments was that no research data from public projects should be made available to feed and develop persuasive algorithms.

The presentations motivated an interesting discussion covering a broad range of topics. These included the reuse of qualitative data, how we can implement ethical safeguards for data reuse, the idea of data ethics as a continuum, whether we can accept positive cases of algorithmic persuasion such as to promote equality and diversity, and the possibility of creating specific licences prohibiting data reuse by persuasive algorithms. See below for a video and transcript of the session.

Audience composition

We had 341 registrations with just over 65% originating from the Higher Education sector. Researchers and PhD students accounted for nearly 37% of the registrations whilst research support staff accounted for an additional 33%. We also had registrations from at least 30 countries outside of the UK including significant attendance from Denmark, Holland, Germany and Canada. We were thrilled to see that on the actual day 187 people attended the webinar.

We held five online webinars during Cambridge Data Week and were pleased to see that nearly 25% of the participants attended more than one webinar. A total of 1364 people registered and more than 700 attended altogether, with the rest possibly watching the recordings at a later date. Most of all we were pleased to welcome participants from all over the world and see how important research data management topics are globally.

Where data was available, we identified the following countries apart from the UK:  Australia, Austria, Bangladesh, Brazil, Canada, Colombia, Croatia, Czech Republic, Denmark, France, Germany, Greece, Holland, Hungary, Iran, Luxembourg, Moldova, Norway, Poland, Romania, Singapore, Spain, Sweden, Switzerland, Turkey, Ukraine and the USA.

Recording, transcript and presentations

The video recording of the webinar can be found below and the recording, transcript and presentations are present in Apollo, the University of Cambridge repository.

Bonus material

After the session ended, we continued the discussion with Louise and Tiberius looking in particular at one question posed by an audience member:

AI can always be used either for good or bad. Instead of locking-in, how can we enhance technology through data and regulation? 

Tiberius Ignat I think at this point we need regulation. I’m not a big fan of using regulations, to be honest. I think it’s much better to motivate people but, in this case, it’s quite a bit of control that has been lost, so I think we should have a regulation on how research data can be reused by others. This is how the internet has been made profitable during the last decade — through non-human persuasion. All these companies that are giving so much away for free are making billions of dollars when you look at the stock market. We were not clear how they were making this profit until recently when we realised that they are doing it by changing our behaviour and I think the rest of society – including research organisations – are behind them, so we need some regulation.

A good example is GDPR. It was introduced to protect our data, our digital footprint. On ResearchGate or Eurosport, or any other website, we used to be asked to agree to cookies or not. Recently, a new option called “Legitimate interest” has been slipped in and our digital data is again collected – less noticeably – by invoking questionable legitimate rights. The organisations whose model is based on persuading need cookie data, so they have moved the discussion away from remaining GDPR compliant to defending their legitimate interests. They are fighting to take data away from us. We can tackle this faster with regulation, but in the long term we need to educate people to be more aware. We do have licences such as Creative Commons but I’m not sure we have the right ones to protect us.

Louise Corti There are a variety of licences, but they are abused and it’s very hard to track along the way what has gone wrong. I quite like the UK Government’s approach with some of their statistical data that has to go through a legal gateway. Some data can be made available for research, but it has to be done for the public good. We also have the Ethics Self-Assessment Tool, which is a grid you go through provided by the Statistics Authority and it asks you to think along lots of different dimensions of ethics. This helps researchers get a better sense of what they are trying to do, but whether the people we are talking about would care about it is a very different matter. Having been in research ethics for a very long time, that is by far the best tool I’ve seen and I recommend everyone uses it. The UK Data Archive uses it to evaluate some of the projects we deal with, because you often find that university ethics approvals are not good enough for the Statistics Authority, because often they don’t understand quantitative secondary analysis, so the ethics scrutiny is not good enough. The Self-Assessment involves much more nuanced thinking about the different dimensions of ethics and it helps researchers to be a bit more reflective about what’s good and what’s not.

Conclusion

Overall, the session provided a compelling blend of both the practical and conceptual elements of data reuse, each raising questions which could have easily been entire sessions in themselves. Louise’s presentation gave an excellent overview of the UK Data Service’s approach to making their datasets more reusable and promoting them to maximise their chances of being reused. Tiberius’ session raised some interesting questions surrounding data reuse and the ethics of using algorithms to persuade humans, as well as looking at some practical options for protecting research data from reuse for nefarious ends. At the end of the session, the audience were asked to participate in a poll on “What future developments are needed to increase the prevalence of data reuse?”.

Audience responses to poll held at the end of the event

The results were unsurprising to either speaker, with each touching on the idea that a change in research culture is necessary to ensure data reuse projects are seen as equal to data-generating projects. The need for cultural change is a theme that ran throughout each of the sessions in Data Week and is perhaps one of the current major challenges in scholarly communication.

Resources

Data Access and Research Transparency (DA-RT): A Joint Statement by Political Science Journal Editors

Robots appear more persuasive when pretending to be human

Behavioural evidence for a transparency–efficiency tradeoff in human–machine cooperation

The next-generation bots interfering with the US election

IBM’s AI Machine Makes A Convincing Case That It’s Mastering The Human Art Of Persuasion

AI Learns the Art of Debate

CSI-COP

Published on 25 January 2021

Written by Dominic Dixon

CCBY icon

Cambridge Data Week 2020 day 3: Is data management just a footnote to reproducibility?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 2 blog
Cambridge Data Week day 4 blog
Cambridge Data Week day 5 blog

Introduction

The third day of Cambridge Data Week consisted of a panel discussion about the relationship between reproducibility and Research Data Management (RDM), looking for ways to advocate effectively to reach positive outcomes in both areas. Alexia Cardona (University of Cambridge), Lennart Stoy (European University Association), Florian Markowetz (University of Cambridge & UK Reproducibility Network), and René Schneider (Geneva School of Business Administration) offered their perspectives on whether RDM really is just a ‘footnote’ to the more popular concept of reproducibility.

The speakers agreed that we are still in need of cultural change towards better data management and reproducibility. The word ‘reproducibility’ is more likely to excite researchers and it is important to craft messages that work for each group, hence the emphasis on this term. In contrast to the Cambridge Data Week event on data peer review, the discussion here focused on engaging senior researchers, from PIs to Heads of Institutions, motivating them to be not just good data managers, but great data leaders.

Among the key elements needed to drive best practice in this area, two stood out. The first is communities. Whether these are reproducibility circles of peers, or networks like the Cambridge Data Champions, communities are key to creating and implementing guidelines for data management. The second element is a solid technological infrastructure. For instance, blockchains could be used to enable reproducibility in citations in the humanities, or Persistent Identifiers, used at a very granular level, could lead to better data reuse.

Recording, transcript and presentations

The video recording of the webinar can be found below and the recording, transcript and presentations are present in Apollo, the University of Cambridge repository.

Bonus material

There were a few questions we did not have time to address during the live session, so we put them to the speakers afterwards. Here are their answers:

What are good practices regarding data deletion?

Florian Markowetz It very much depends on what kind of data you have, it’s hard to give general directions. However, drives and other hardware are becoming cheaper and cheaper, so I would say ‘save everything’.

René Schneider I would agree. I have spoken to researchers who keep all their data, because it would create too much work to sort what to keep and what to delete.

Alexia Cardona We tend to talk more about data archiving than data deletion. I often hear about data deletion where it has created problems, for example an account has been deleted in bulk when a researcher left an institution, so unpublished data and scripts are lost due to lack of communication. There are also cases on the internet of PhD students losing their entire thesis when the laptop crashed, so this issue goes hand in hand with data storage and backup. Let’s focus on good practices and archiving of data; deletion is the very last thing to worry about.

Lennart Stoy It’s worth mentioning that there is often a compulsory period that data should be kept for, perhaps 3 years or 5 years according to funders’ mandates, so data should be stored for some time. I suppose the expense could become an issue in the coming years; some Universities are already concerned about the cost of having to buy large amounts of cloud storage space. There are also discussions in the Open Science Cloud teams about what to preserve in the long term. We want to make sure we preserve the higher value datasets, but of course it’s hard to define which ones those are.

Couldn’t scholarly communities of practice or learned societies create guidelines for reproducibility and good data management?

Lennart Stoy Absolutely, they must be involved as they are the ones with the specific knowledge. This is the idea behind the Research Data Alliance (RDA) and the National Research Data Infrastructure (NFDI) in Germany. In those cases, you have to prove a link to the community in that field to establish a consortium. It is great when communities structure their areas of infrastructure from the bottom up.

What roles could Early Career Researchers (ECRs) have? Could they act as code-checkers to assist reproducibility, or are we asking too much of them given their busy schedules? Would they receive credit for this?

Florian Markowetz Senior academics have no excuses for not getting more involved in this once they have stable positions. It’s easy for people in my position to point to students, or to funders, saying they are not doing enough, but we should not be pointing away from ourselves, we should do the work. It could be coupled to pay rises: if you hold any role above grade 12 it’s your job now to sort this all out.

René Schneider I have been thinking about the role of data custodians or similar. If we ask researchers to spend a lot of time just checking data, like ‘warehouse workers’, we could be undervaluing their role. I don’t think it’s necessarily the researchers who should do the work, especially not ECRs, there should be other roles dedicated to this.

Alexia Cardona I second that, researchers are supposed to focus on the research, not necessarily the data checking and curation. But the unfortunate truth is that with short contracts and lack of resources the work is left to them. Another problem is the lack of rewards. For instance in my area, training, there’s no reward for people who take the time to make their training FAIR. We should embrace more openness and fairness, including rewarding those who do the work.

Lennart Stoy This is something we’ve been working on but it’s a challenging system to change because there are so many elements to disentangle. It relates to intense competition for jobs, the culture in different disciplines, and the pressure to publish in certain journals. Some Universities are very serious about implementing DORA and I hope that in a few years these will be able to show high levels of satisfaction among PhD students and ECRs. A lot depends on the leadership at the institutional level to initiate change, for instance the rector at Ghent University in Belgium has been driving DORA-inspired reward mechanisms and the Netherlands is also moving ahead and moving away from journal-based factors. The University of Bath is an example in the UK that I’ve heard mentioned a lot. We’re following progress in all these examples and will write up DORA good practice case studies to inspire other organisations. But it is a hard problem, ECRs have a lot on the line, it’s important not to jeopardise their careers.

Conclusion

This compelling discussion left us feeling that it does not matter too much which words we emphasise: reproducibility, data management, data leadership, or something else entirely. What matters is that we spark interest and commitment in the right groups of researchers to drive progress. Creating a culture where great research practices are routine will take effective advocacy, but also rewards that align with our aims and the right technical infrastructure to underpin them.

Resources

The UK Data Service is a data repository funded by the Economic and Social Research Council (ESRC), which also provides extensive resources on data practices.

The journal PLOS Computational Biology introduced a pilot in 2019 where all papers are checked for the reproducibility of models.

Is there a reproducibility crisis? Baker’s 2016 paper in Nature reporting the results of a survey that exposed the extent of the reproducibility crisis.

San Francisco Declaration on Research Assessment (DORA), a set of recommendations for institutions, funders, publishers, metrics companies and researchers, aiming for a fairer and more varied system of research quality assessment.

Published on 25 January 2021

Written by Beatrice Gini

CCBY icon

Cambridge Data Week 2020 day 4: Supporting researchers on data management – do we need a fairy godmother?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 5 blog

Introduction 

How should researchers’ data management activities and skills be supported? What are the data management responsibilities of the funder, the institution, the research group and the individual researcher? Should we focus on training researchers so they can carry out good data management themselves or should we be funding specialist teams who can work with research groups, allowing the researchers to concentrate on research instead of data management? These were the questions addressed on day 4 of Cambridge Data Week 2020. This session benefitted from the perspectives of three speakers deriving from three different components of the research ecosystem: national funder, institutional research support and department/institute. Respectively, these were provided by Tao-Tao Chang (Arts and Humanities Research Council [AHRC]), Marta Teperek (TU Delft) and Alastair Downie (The Gurdon Institute, Cambridge). 

From a funder’s perspective, and following UKRI community consultation, Tao-Tao specifies that digital research infrastructure is recognised as an area for urgent investment, particularly in the arts and humanities, where both software and data loss are acute. Going forwards, AHRC’s key priorities will be to prevent further data loss, invest in skills, build capability, and work with the community to effect a sustained change in research culture. At an institutional level, Marta argues that it is unfair for researchers to be left unsupported to manage their data. The TU Delft model addresses this via three methods: central data support, disciplinary support by data stewards as permanent faculty staff, and hands-on support for research groups via data managers and research software engineers. Regarding the latter, an important take-home message for all researchers, regardless of institutional affiliation, is to build data management costs into grant proposals. Alastair takes up the discussion at the level of the department, research group and even individual, highlighting how researchers are locked into infrastructure silos, and locked into an unhelpful, competitive culture where altruism is a risky proposition and the career benefits of sharing seem intangible or insufficient. Alastair proposes that the climate is right and the community is ready for change, and goes on to discuss some positive changes afoot in the School of Biological Sciences to counteract these.  

Audience composition  

We had 291 registrations for the webinar, with just over 70% originating from the Higher Education sector. Researchers and PhD students accounted for 30% of the registrations whilst research support staff from various organisations accounted for an impressive 46%. On the day, we were thrilled to see that 136 people attended the webinar, participating from a wide range of countries. 

Recording, transcript and presentations 

The video recording of the webinar can be found below and the recording, transcript and presentations are present in Apollo, the University of Cambridge repository.

Bonus material 

There were a few questions we did not have time to address during the live session, so we put them to the speakers afterwards. Here are their answers: 

Talking about the technical side have you yet come across anyone using a machine implementable DMP? Setting up a data management infrastructure for a large project it’s become apparent that checking compliance with a DMP is a huge job and of course there is minimal resource for doing this.

Marta Teperek Work is being done in this area by Research Data Alliance where there are several groups working on machine actionable DMPs. Basically, the idea is that instead of asking researchers to write long essays about how they are planning to manage their data, they are asked to provide answers that are structured. These can be multiple choice options, for example, where the researcher specifies that they will be depositing large amounts of data in the repository and the repository will be notified of data coming their way. In other words, actions are made depending on what the researcher says they will do. University of Queensland is doing a lot on this already [see link to blog post here and in Resources further below].
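To illustrate the idea Marta describes, here is a minimal sketch of how structured DMP answers could trigger actions. It is not the Research Data Alliance's model or the University of Queensland's system: the field names, the 500 GB threshold and the personal-data rule are all hypothetical, chosen only to show answers driving actions.

```python
from dataclasses import dataclass


@dataclass
class DmpAnswers:
    """Structured DMP answers (hypothetical fields, for illustration only)."""
    project: str
    repository: str
    expected_volume_gb: float
    contains_personal_data: bool


# Hypothetical threshold above which a repository wants advance notice.
LARGE_DEPOSIT_GB = 500.0


def planned_actions(answers: DmpAnswers) -> list[str]:
    """Derive follow-up actions from what the researcher says they will do."""
    actions = []
    if answers.expected_volume_gb >= LARGE_DEPOSIT_GB:
        actions.append(
            f"notify {answers.repository}: expect about "
            f"{answers.expected_volume_gb:.0f} GB from {answers.project}"
        )
    if answers.contains_personal_data:
        # Assumed extra rule, not taken from the webinar.
        actions.append("refer project to data protection support")
    return actions


example = DmpAnswers(
    project="Project A",
    repository="institutional repository",
    expected_volume_gb=2000,
    contains_personal_data=False,
)
for action in planned_actions(example):
    print(action)
```

Because the answers are structured rather than free text, the same record could also be checked automatically later on, which speaks to the compliance concern raised in the question.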

What are the best cross-platform, mobile and desktop tools for data management?

Alastair Downie RDM encompasses far too broad a range of activities – it’s a concept rather than a single activity that you can build into a neat little app. In the context of electronic lab notebooks, for example, there are hundreds of apps that serve that function and some of them cross over into lab management as well. Those products that try to do too much become very bloated and complex, which makes them unattractive and so we don’t see uptake of those kinds of products. I think a suite approach is better than a single solution.

Institutions audit spending on research grants; they should do the same for research data, and this should be a requirement of holding a grant.

Alastair Downie Wellcome Trust are now challenging researchers to demonstrate that they have complied with their DMPs. It’s not particularly empirical but the fact that they are demonstrating their determination to make sure that everyone’s doing things properly is very helpful. 

Are there any specific infrastructure projects that the AHRC is sponsoring? I’m curious about what infrastructure/services would be useful for Arts and Humanities researchers

Tao-Tao Chang Not at this juncture. But we are hoping that this will change. AHRC recognises the importance of good data management practice and the need to support it. We also recognise that there is a skills gap and that all researchers at every level need support.

Is there a 2020 edition of the State of Open Data report?

Yes, this was published five days after this webinar! See the Digital Science website and further below under ‘Resources’.

Conclusion 

There are two outcomes of the webinar to draw upon here. The first raises again the question: do researchers need, or even want, a fairy godmother to support their research data management?  We held a poll at the end of the webinar, asking participants to choose which one of the following statements they believe most strongly: (1) ‘Individual researchers should learn how to manage their own data well’ or (2) ‘Researchers’ data should be managed by funded RDM specialists so that researchers can focus on research’. Of the 78 respondents, 67% chose the first option and 33% chose the second. There was not an intermediate option to incorporate both, simply because we wanted to force a choice in the direction of strongest belief when the two options are considered relative to one another. 

The results of the poll and the discussions during the webinar (between the speakers and within the chat) indicate that while individual researchers are responsible for managing their research data, support does need to be made available and promoted actively (we provide in the ‘Resources’ section some links to University of Cambridge research data management support). A second outcome reveals that support needs to be provided under several different guises. On the one hand, there is support that comes via the provision of funding, research data services and individually tailored expertise. Yet, on the other hand, there is support that will derive, albeit in a less tangible sense, from positive changes in research culture, specifically in terms of how the research of individual researchers is assessed and rewarded.  

Resources  

Some links to University of Cambridge research data management support include: the Research Data Management Policy Framework that outlines, for example, the data management responsibilities of research students and staff; our data management guide; a list of Cambridge Data Champions, searchable by areas of expertise. 

A recent Postdoc Academy podcast on ‘How can we improve the research culture at Cambridge?’ 

A description of the different data management support roles at TU Delft, by Alastair Dunning and Marta Teperek: data steward, data manager, research software engineer, data scientist and data champion.

A Gurdon Computing blog post by Alastair Downie on ‘Research data management as a national service’, which proposes a national service as an alternative to duplicating infrastructure and services across the research landscape.

An article by Florian Markowetz, discussed in the webinar, on ‘Five selfish reasons to work reproducibly’ (in Genome Biology)

TU Delft Open Working blog post by Marta Teperek on machine actionable Data Management Plans (DMPs) at the University of Queensland. For more information, see this article by Miksa and colleagues on the ‘Ten principles for machine-actionable data management plans’ (in PLOS Computational Biology).  

The State of Open Data 2020 report, published on 1 December 2020. 

Published on 25 January 2021

Written by Dr Sacha Jones with contributions from Tao-Tao Chang, Dr Marta Teperek, Alastair Downie and Maria Angelaki. 

CCBY icon

Cambridge Data Week 2020 day 5: How do we peer review data? New sustainable and effective models

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog

Introduction  

Cambridge Data Week 2020 concluded on 27 November with a discussion between Dr Lauren Cadwallader (PLOS), Professor Stephen Eglen (University of Cambridge) and Kiera McNeice (Cambridge University Press) on models of data peer review. The peer review process around data is still emerging despite the increase in data sharing. This session explored how peer review of data could be approached from both a publishing and a research perspective. 

The discussion focused on three main questions and here are a few snippets of what was said. If you’d like to explore the speakers’ answers in full, see the recording and transcript below.  

Why is it important to peer review datasets?

Are we in a post-truth world where claims can be made without needing to back them up? What if data could replace articles as the main output of research? What key criteria should peer review adopt?

Figure 1: Word cloud created by the audience in response to “Why is it important to peer review datasets?” The four most prominent words are integrity, quality, trust and reproducibility.

How should data review be done?

Can we drive the spread of Open Data by initially setting an incredibly low bar, encouraging everyone to share data even in its messy state? Are we reviewing to ensure reusability, or do we want to go further and check quality and reproducibility? Is data review a one-off event, or a continuous process involving everyone who reuses the data?

Who should be doing the work?

Are journals exclusively responsible for data review, or should authors, repository managers and other organisations be involved? Where will the money come from? What’s in it for researchers who volunteer as data reviewers? How do we introduce the peer review of data in a fair and equitable way?

Watch the session 

The video recording of the webinar can be found below and the transcript is available in Apollo, the University of Cambridge repository.

Bonus material 

After the end of the session, Lauren, Kiera and Stephen continued the discussion, prompted by a question from the audience about whether there should be some form of template or checklist for peer reviewing code. Here is what they said. 

Lauren Cadwallader  That’s an interesting idea, though of course code is written for different reasons: software, analysis, figures, and so on. Inevitably there will be different ways of reviewing it. Stephen, can you tell us more about your experience with CODECHECK?

Stephen Eglen At CODECHECK we have a process to help codecheckers run research code and award a “certificate of executable computation”, like this example of a report. If doing nothing else, then copying whatever files you’ve got onto some repository, dirty and unstructured as that might seem, is still gold dust to the next researcher that comes along. Initially we can set the standards low, and from there we can come up with a whole range of more advanced quality checks. One question is ‘what are researchers willing to accept?’ I know of a couple of pilots that tried requiring more work from researchers in preparing and checking their files and code, such as the Code Ocean pilot that Kiera mentioned. I think that we have a community that understands the importance of this and is willing to put in some effort.

Kiera McNeice There’s value in having checklists that are not extremely specialised, but tailored somewhat towards different subject areas. For instance, the American Journal of Political Science has two separate checklists, one for quantitative data and one for qualitative data. Certainly, some of our HSS editors have been saying that some policies developed for quantitative data do not work for their authors.  

Lauren Cadwallader  It might be easy to start with places where there are communities that are already engaged and have a framework for data sharing, so the peer review system would check that. What do you think? 

Kiera McNeice I guess there is a ‘chicken and egg’ issue: does this have to be driven from the top down, from publishers and funders, or does it come from the bottom up, with research communities initiating it? As journals, there is a concern that if we try to enforce very strict standards, then people will take their publications elsewhere. If there is no desire from the community for these changes, publisher enforcement can only go so far.  

Stephen Eglen Funders have an important role to play too. If they lead on this, researchers will follow because ultimately researchers are focused on their career. Unless there is recognition that doing this is a valuable part of one’s work, it will be hard to convince the majority of researchers to spend time on it.

Take a pilot I was involved in with Nature Neuroscience. Originally this was meant to be a mandatory peer review of code after acceptance in principle, but in the end fears about driving away authors meant it was only made optional. Throughout a six-month trial, I was only aware of two papers that went through code review. I can see the barriers for both journal and authors, but if researchers received credit for doing it, this sort of thing would come from the bottom up.

Lauren Cadwallader  In our biology-based model review pilot we ran a survey and found that many people opted in because they believe in open science, reproducibility, and so on, but two people opted in because they feared PLOS would think they had something to hide if they didn’t. That’s not at all what it was about. Although I suppose if it gets people sharing data… 

Conclusion 

We were intrigued by many of the ideas put forward by the speakers, particularly the areas of tension that will need to be resolved. For instance, as we try to move from a world where most data remains in people’s laptops and drawers to a FAIR data world, even sharing simple, messy, unstructured data is ‘gold dust’. Yet ultimately, we want data to be shared with extensive metadata and in an easily accessible form. What should the initial standards be, and how should they be raised over time? And how about the idea of asking Early Career Researchers to take on reviewer roles? Certainly they (and their research communities) would benefit in many ways from such involvement, but will they be able to fit this in their packed schedules?  

The audience engaged in lively discussion throughout the session, especially around the use of repositories, the need for training, and disciplinary differences. At the end of the session, they surprised us all with their responses to our poll: “Which peer review model would work best for data?”. The most common response was ‘Incorporate it into the existing review of the article’, an option that had hardly been mentioned in the session. Perhaps we’ll need another webinar exploring this avenue next year!

Figure 2: Audience responses to the poll “Which peer review model would work best for data?”, held at the end of the event

Resources 

Alexandra Freeman’s Octopus project aims to change the way we report research. Read the Octopus blog and an interview with Alex to find out more.  

Publish your computer code: it is good enough, a column by Nick Barnes in Nature in 2010 arguing that sharing code, whatever the quality, is more helpful than keeping it in a drawer.  

The Center for Reproducible Biomedical Modelling has been working with PLOS on a pilot about reviewing models.  

PLOS guidelines on peer-reviewing data were produced in collaboration with the Cambridge Data Champions 

CODECHECK, led by Stephen Eglen, runs code to offer a “certificate of reproducible computation” to document that core research outputs could be recreated outside of the authors’ lab. 

Code Ocean is a platform for computational research that creates web-based capsules to help enable reproducibility.  

Editorial on the pilot for peer reviewing biology-based models in PLOS Computational Biology

Published on 25 January 2021

Written by Beatrice Gini

CCBY icon

The Role of Open Data in Science Communication

Itamar Shatz has written a guest blog post for the Office of Scholarly Communication about how public trust in the scientific community increases when researchers make their data openly available to all. He also emphasizes that science communicators (e.g. press offices, journalists, publishers) have a responsibility to point attention directly at the primary source of the data. Itamar is a PhD candidate in the Department of Theoretical and Applied Linguistics at the University of Cambridge. He is also a member of the Cambridge Data Champion programme, having joined at the start of this year. He writes about science and philosophy that have practical applications at Effectiviology.com.

It’s no secret that the public’s view of the scientific community is far from ideal.

For example, a global survey published by the Wellcome Trust in 2019 showed that, on average, only 18% of people indicate that they have a high level of trust in scientists. Furthermore, the survey showed that there are stark differences between people living in different areas of the world; for instance, this rate was more than twice as high in Northern Europe (33%) and Central Asia (32%) as in Eastern Europe (15%), South America (13%), and Central Africa (12%).

Things do appear to be improving, to some degree, especially in light of the recent pandemic. For example, a recent survey in the UK, conducted by the Open Knowledge Foundation, has found that, following the COVID-19 pandemic, 64% of people are now “more likely to listen to expert advice from qualified scientists and researchers”. Similar increases in public confidence have been found in other countries, such as Germany and the USA. However, despite these recent increases, there is still much room for improvement.

Open data can help increase the public’s confidence in scientists

The public’s lack of confidence in scientists is a complex, multifaceted issue that is unlikely to be resolved by a single, neat solution. Nevertheless, one thing that can help alleviate this issue to some degree is open data, which is the practice of making data from scientific studies publicly accessible.

Research on the topic shows just how powerful this tool can be. For example, the recent survey by the Open Knowledge Foundation, conducted in the UK in response to the COVID-19 pandemic, found that 97% of those polled believed that it’s important for COVID-19 data to be openly available for people to check, and 67% believed that all COVID-19 related research and data should be openly available for anyone to use freely. Similarly, a 2019 US survey conducted before the pandemic found that 57% of Americans say that they trust the outcomes of scientific studies more if the data from the studies is openly available to the public.

Overall, such surveys strongly suggest that open data can help increase the public’s trust in scientists. However, it’s not enough for studies to just have open data for it to increase the public’s trust; if people don’t know about the open data, or if they don’t fully understand what it means, then open data is unlikely to be as beneficial as it could be. As such, in the following section we will see some guidelines on how to properly incorporate open data into science communication, in order to utilize this tool as effectively as possible.

How to incorporate open data into science communication

To properly incorporate open data into science communication, there are several key things that people who engage in science communication—such as journalists and scientists—should generally do:

  • Say that the study has open data. That is, you should explicitly mention that the researchers have made the data from their research openly available. Do not assume that people will go to the original study and then learn there about the data being open.
  • Explain what open data is. That is, you should briefly explain what it means for the data to be openly available, and potentially also mention the benefits of making the data available, for example in terms of making research more transparent, and in terms of helping other researchers reproduce the results.
  • Describe what sort of data has been made openly available. For example, you can include descriptions of the type of data involved (surveys, clinical reports, brain scans, etc.), together with some concrete examples that help the audience understand the data.
  • Explain where the data can be found. For example, this can be in the article’s “supplementary information” section, though data should preferably be available in a repository where the dataset has its own persistent identifier, such as a DOI. This ensures that the audience can find and access the data, which may otherwise be hidden behind a paywall, and offers other benefits, such as allowing researchers to directly access and cite the dataset, without navigating through the article.

These practices can help people better understand the concept of open data, particularly as it pertains to the study in question, and can help increase their trust in the openness of the data, especially if it is placed somewhere that they can access themselves.

For one example of how open data might be communicated effectively in a press release, consider the following:

“The researchers have made all the data from this study openly available; this means that all the results from their experiments can be freely accessed by anyone through a repository available at: https://www.doi.org/10.xxxxx/xxxxxxx. This can help other scientists verify and reproduce their results, and will aid future research on the topic.”

Open data in different types of scientific communications

It’s important to note that there’s no single right way to incorporate open data into scientific communications. This can be attributed to various factors, such as:

  • Differences between fields (e.g. biology, economics, or psychology)
  • Differences between types of studies (e.g. computational or experimental)
  • Differences between media (e.g. press release or social media post).

Nevertheless, the guidelines outlined earlier can be beneficial as initial considerations to take into account when deciding how to incorporate open data into science communication. It is up to communicators to make the final modifications, in order to use open data as effectively as possible in their particular situation.

Summarizing what we’ve learned

Though the public’s trust in science is currently growing, there is much room for improvement. One powerful tool that can aid the academic community is open data—the practice of making data from research studies openly available. However, to benefit as much as possible from the presence of open data, it’s not sufficient for a study to merely make its data open. Rather, the accessibility of the data needs to be promoted and explained in scientific communication, and the dataset needs to be cited appropriately (see the Joint Declaration of Data Citation Principles for guidelines regarding this latter point).

What is currently being done

It is important to note that much work is already being done to promote the concept of open data. For example, organizations such as the Research Data Alliance promote discussion of the topic and publish relevant material, as in the case of their recent guidelines and recommendations regarding COVID-19 data.

In addition, at the University of Cambridge, in particular, we can already see a substantial push for open data practices, where appropriate, and from many angles as outlined in the University’s Open Research position statement. Many funding bodies mandate that data be made available, and the University facilitates the process of sharing the data via Apollo, the institutional repository. Furthermore, there are the various training courses and publications—including this very blog—led by bodies such as the Office of Scholarly Communication (OSC), which help to promote Open Research practices at the University. Most notably, there is the OSC’s Data Champion programme, which deals, among other things, with supporting researchers with open data practices.

Moving forward

Promoting the use of open data in scientific communication is something that different stakeholders can do in different ways.

For example, those engaging in science communication—such as journalists and universities’ communication offices—can mention and explain open data when covering studies. Similarly, scientists can ask relevant communicators to cite their open data, and can also mention this information themselves when they engage in science communication directly. In addition, consumers of scientific communication and other relevant stakeholders—such as the general public, politicians, regulators, and funding bodies—can ask, whenever they hear about new research findings, whether the data was made openly available, and if not, then why.

Overall, such actions will lead to increased and more effective use of open data over time, which will help increase the trust people have in scientists. Furthermore, this will help promote the adoption of open data practices in the scientific community, by making more scientists aware of the concept, and by increasing their incentives for engaging in it.

Published 19 June 2020

Written by Itamar Shatz

CCBY icon

2019 That Was The Year That Was 

This is our traditional yearly blog about what we have been doing at the OSC in Cambridge. We are publishing it a little later than intended, but this is an indication of how busy the beginning of 2020 has been here in the Office of Scholarly Communication.

2019 saw us more in a ‘business as usual’ phase as we knuckled down and got on with supporting researchers in Cambridge. That aside, we still had some major developments in Open Research and this work will continue into 2020 and beyond.  

Policy changes 

2019 saw a number of happenings in the policy space at Cambridge. Most excitingly, the University’s Position Statement on Open Research was announced in February, making it one of the first UK universities to have such a statement. This demonstrates the University’s commitment to making open research a reality at Cambridge. 

Following on from this, in July 2019, the University together with Cambridge University Press announced that they had signed up to the San Francisco Declaration on Research Assessment (DORA). The newly created Open Research Steering Committee, headed by the University’s Pro-Vice Chancellor for Research, will have oversight over the open research direction and the implementation of DORA. The Steering Committee and their working groups are currently looking into open research training, open research infrastructure (such as electronic research notebooks), Plan S and DORA.

In December, an updated version of the Research Data Management Policy Framework was released. This update brings the policy framework in alignment with funder requirements and acknowledges the important roles that Principal Investigators, research staff and students, and University support staff play in good data management practices. It sits beneath the Position Statement on Open Research, with the documents being closely aligned. 

Open access news 

The Open Access Service made great strides towards automating many of its processes this year, headlined by the introduction of Orpheus and Fast Track. Orpheus is a custom database of publisher open access policies, and when combined with Fast Track for manuscript processing, it allows the Open Access Service to reduce the number of steps required to archive a manuscript in Apollo. In 2019, 8,325 manuscript submissions were processed through Fast Track. In total, the Open Access Service responded to 13,609 submissions or enquiries in 2019, equal to 37 requests per day.

Our Request a Copy service received 7,626 requests in 2019. One of the most requested items was “HIV-1 remission following CCR5Δ32/Δ32 haematopoietic stem-cell transplantation” (DOI: 10.1038/s41586-019-1027-4), which received 77 requests. The authors of the paper responded to and fulfilled each request, enabling the readers to obtain free access to the publication well ahead of the expiry of Nature’s six-month embargo. Since the accepted manuscript came out of embargo, it has received a further 326 downloads to date in Apollo. The success of the Request a Copy service once again demonstrates the need for access to scholarly research at the earliest opportunity. Embargoes, even ‘short’ six-month embargoes, are a needless barrier to the University’s research outputs.

Data news 

Aside from the update to the Research Data Management Policy Framework (see above), the most significant development from 2019 has been the continued evolution of the Data Champion Programme.

We welcomed 40 new Data Champions (DCs) from across several Schools, increasing the size of our network to 86. With such a large cohort of Champions, we introduced departmental hubs to increase collaboration and the sharing of practices among Data Champions from the same areas. This has proved really successful in both Chemistry and Engineering, where a more coordinated approach has made the Champions more productive in engaging others with data management.

In 2019, the Data Champions also tried out a mentoring scheme for the first time whereby established Champions support new Champions in finding their feet and give them ideas about how to provide support to their own community. This has proved to be a great success and the scheme is being run for a second year for the new cohort of Champions joining in early 2020. 

Finally, a new paper on the Data Champion community was published, Establishing, Developing and Sustaining a Community of Data Champions, by DC alumnus James Savage and our colleague Lauren Cadwallader in Data Science Journal. 

Thesis news 

The requirement to deposit an electronic copy of a PhD thesis in order to graduate has become normal business now. In 2019, 1,197 theses were deposited, with 47% being made fully open access. In addition, around 100 requests to digitise historical theses were received from their authors and 1,015 requests for scans of historical theses were received from requesters.

Training 

In 2019 we took a broad perspective and examined how training was contributing to promoting and supporting Open Research at Cambridge. The Task Group on Open Research Training, comprised of representatives of several libraries and colleagues from other areas of the University, conducted a number of projects to understand where we are at the moment and plan a strategy for the future. The details of that work will be presented at the RLUK 2020 conference in March but, as a ‘sneak peek’, here are some of the conclusions we drew:

  • We’re stronger together: researchers will benefit if we build stronger communication between training providers. 
  • Open Research training should not be seen in isolation from the rest of research; rather, it should be a key component of the way students learn to do research.
  • Postdocs and senior researchers want to learn independently; we can support them with better-presented information online and by facilitating events and dialogue.
  • We want to be able to constantly improve our training and demonstrate impact by exploring ways to evaluate ourselves, while also being aware of the lurking danger of irresponsible metrics in our own evaluation.  

Alongside the strategy work, we continued to expand the training we offer on Open Access, Research Data Management, publishing, copyright and more. A growing number of departments have requested sessions and we have partnered with PLOS and the Office for Postdoctoral Affairs to deliver a regular session on peer review. We delivered 56 sessions, reaching over 800 researchers and librarians. In addition, we have offered a session about complying with the REF Open Access requirements to departments; the Open Access team outdid themselves by delivering 20 sessions to individual departments in just over three months. 

Outreach activities 

In 2019 we hosted several events, from workshops to a one-day symposium dealing with open access monographs, FAIR data, preprints, reproducibility in social sciences, Plan S developments in the USA and open research in STEMM.  

Of notable interest is the Symposium on Open Monographs held in October at St Catharine’s College. This one-day event brought together researchers, funders, publishers and learned societies to discuss the benefits and challenges of an open landscape for academic books. The recordings are featured on the OSC YouTube channel and most of the presentations are available in our institutional repository, Apollo. A summary of the key themes that emerged from this symposium was later presented in Unlocking Research.

October would not have been complete without celebrating Open Access Week. During the week we shared various blogs and online resources and we were delighted to announce the launch of our popular Research Support Ambassador Programme as an open educational resource designed to give learners either an introduction to or a refresher on key elements of research support. 

Systems 

Apollo has participated in a joint pilot study with Jisc, Symplectic and Sheffield Hallam University to look at the best approaches to integrating the Jisc Publications Router and the research information system Symplectic Elements, via institutional repositories. This pilot has involved working together to look at how well Elements could capture details of articles that Router had sent to our repositories. Router currently works with EPrints and DSpace repositories, the platforms used by Sheffield Hallam and Cambridge respectively. 

Symplectic’s Repository Tools 2 (RT2) integration module was used to harvest Apollo records and de-duplicate them against any existing Elements records. We tested how well this worked for repository records deposited automatically by Router, looking in particular at the volume of duplicate publications and how soon after acceptance notifications were received from Router. The study demonstrated that Router and Elements are technically compatible when used in this way. As a result of this pilot, Jisc and Symplectic are now happy to offer this solution to institutions more widely. 

Some excellent work behind the scenes has resulted in Jisc publishing a series of blogs last November. Their third blog showcases the ORCID IDs in Research Data Management workflows at the University of Cambridge and how a workflow has been implemented in order to create seamless links between researchers and their works using identifiers and different services. Such solutions improve visibility and discoverability across systems, reduce duplication of effort in entering information and avoid identification errors.

This work was made possible by Agustina Martínez García of the Office of Scholarly Communication, Owen Roberson of the Research Office, and Dean Johnson of University Information Services (UIS) who were amongst the winners of the professional services recognition scheme two years ago for their effective collaborative work on the integration of Symplectic Elements and Apollo. 

According to the blog, as of September 2019, 25,550 articles, 1,329 conference proceedings and 1,100 datasets in Apollo have ORCID IDs. 

Saying a big thank you 

2019 saw the departure of the University’s first Head of Scholarly Communication, Dr Danny Kingsley. Many of the achievements of 2019 were due to the hard work Danny put in before her departure, and we’d like to thank her for all she contributed. 

Published 26 February 

Compiled by: Maria Angelaki 

Image showing that this blog post is under CC-BY licence.

Contributions from Agustina Martínez-García, Bea Gini, Maria Angelaki, Lauren Cadwallader, Sacha Jones and Arthur Smith.

Embarking on a career in open access

Lorraine and Olivia started working as Scholarly Communication Support in the Open Access team at the Office of Scholarly Communication (OSC) in the University Library this summer. In this interview, they share their experience of starting a new role in the field of open access, from the perspective of their respective backgrounds in academia and publishing. 

What does working in Scholarly Communication Support entail and what are your responsibilities in this role? 

For the first few months after joining the Open Access team we both started looking at “Fast Track deposits”, the simplest route of depositing authors’ manuscripts into Apollo, the University of Cambridge institutional repository. This system allows the team to process items more quickly than the manual Apollo deposit. Since its launch in September 2018, it has considerably helped to reduce the workload as manuscript submission for archiving in Apollo continues to increase in view of the upcoming REF2021. On a daily basis, we also deal with queries from tickets created on the Open Access Helpdesk, contacting authors and publishers when further information is required and manually depositing manuscripts on Apollo while also updating their records on Symplectic Elements, the University’s research information management system.

Olivia and I are now being trained to respond to researchers’ funding queries and to process invoices for journals’ open access fees from the RCUK and COAF block grants. In order to do this we have had to learn more in depth about open access requirements and Research Councils’ funder requirements.

More recently, we have been working with Units of Assessment to support them with the open access component for REF (Research Excellence Framework) compliance, attending training sessions and reviewing Unit of Assessment outputs for eligibility. This has involved researching and interpreting the REF 2021 requirements for open access to disseminate effectively to academics and administrators. It has been illuminating to gain the perspective of different faculties, the way that they have to engage with REF, and their grapples with open access compliance. 

What are your respective backgrounds and how did you decide to start working in OA? 

Lorraine: Prior to working in open access, I completed a PhD in History of Art in Cambridge, looking at specific intersections between early modern artworks, medicine, and theories of the imagination. I also worked as a postdoctoral researcher at CRASSH (Centre for Research in the Arts, Social Sciences and Humanities) for one year. 

I first became interested in OA and Scholarly Communication during my studies as a PhD representative for my peers in History of Art between 2017 and 2018, the year that electronic deposits of PhD theses via Apollo became a requirement. There were anxieties from my peers around this new requirement, especially in relation to the open access feature: what would this mean for publishing their first monographs from their PhD thesis as Early Career Researchers? Would publishers still be interested in their work after it had been made OA? And, especially, what about the hundreds of copyrighted images present in their theses? It would have taken months to obtain permission to reproduce all of those images. During this time, I liaised with the OSC, the head of the AHRC Doctoral Training Partnership programme (as part of the RCUK, the AHRC also has its own open access requirements that apply to PhDs), communicated with faculty staff during meetings, and reported the advice I had gathered to my peers. I see this new position in the OSC Open Access team as an excellent opportunity to understand better what happens behind the scenes of an institutional repository and gain more knowledge about the broader picture of open access in academic research. 

Olivia: I left academic publishing with a sense that the model was broken. Expensive paywalls restrict access to those seeking to access information and academics were becoming increasingly disenchanted with the publishing model. These issues particularly hit home following two separate instances. The first was a letter sent to the publisher by a prisoner seeking further information on a criminology text, one which was prohibitively expensive and inaccessible to such an individual. The second was a cuttingly written foreword by an academic around monograph publishing and the ivory towers in which university elites and academic publishers co-exist. 

Academic publishing very much feels like the other side of what I am doing with open access, making research as freely and widely available as possible. 

How do you think your past experiences have helped you to have the necessary skills for working in OA? 

Lorraine: As a Cambridge student, I acquired a good knowledge of Cambridge’s unique research and teaching landscape (Schools, Faculties, Departments, Colleges, Research centres, etc.). My academic background also meant that I had hands-on understanding of the process of research, publishing in a peer-reviewed journal, and even submitting my outputs through Symplectic Elements. These were really helpful when starting my new role: understanding how researchers work is crucial in scholarly communication and definitely helps me to advise and communicate with researchers better. I am, for instance, particularly interested in the relationship between open access and third-party copyright (especially images from cultural heritage institutions, i.e. galleries, libraries, archives and museums) and the challenges it brings to researchers in the Arts and Humanities. 

Olivia: I have found my previous work in publishing an asset when working in open access because of my knowledge of the editorial and production process as well as publishing revenue models. I am familiar with the time scales for journal article and book production as well as publishers’ copyright requirements, which I have found I am using on a regular basis. Having worked extensively with academics in a production role, I am aware of the competing pressures placed on them and their need for clear and accessible information on fulfilling publishing commitments or REF compliance.

Now that you have started your new roles, what are the tips you would give to someone interested in starting a career in OA? 

Picking up from last year’s blogpost, and from our own experience: keeping up to date with developments, attention to detail, supporting academics and seeking support from the open access community are four key areas when starting a career in OA.

Keeping up to date with developments and attention to detail

Publishers’ and funders’ open access policies change very quickly, as do the methods we adopt within the team to cope with the workflow and with the challenges brought by REF 2021. Anyone starting a career in OA needs to keep up to date with changes, be capable of doing in-depth research about those, and be comfortable admitting not knowing everything! The landscape is constantly changing and having an awareness of new proposals and initiatives makes the big picture much clearer. 

Supporting academics 

Give academics a break. It will take you a while to feel confident with policy and guidance and for you, it is your whole job. For the academics submitting their papers and contacting the repository, this is one small part of their role; you need to guide them through it as painlessly as possible. 

Seek support 

You cannot and do not know everything about open access. Luckily, there are plenty of wonderful expert colleagues who can help, so it is really important to know how to work within a team and keep building the necessary knowledge as a group. 

Published 25 October 2019

Written by Lorraine de la Verpilliere, Olivia Marsh

This icon displays that the content of this blog is licensed under CC BY 4.0

Image Copyright and Open Access in the Arts and Humanities

Copyright is a crucial topic in the Humanities because researchers in several disciplines (especially history of art, my field of study) rely on images for their work and because publishers usually require authors to pay copyright holders for permission to reproduce those images – failure to do so would make the author and the publisher liable for copyright infringement. 

At the OSC Symposium on 2 October 2019 (Open Access Monographs: From Policy to Reality), Dr Nicola Kozicharow’s presentation on ‘Open Research Publishing in the Humanities’ made quite an impact on the discussions of the day. This early career historian of art, specialising in 19th- and 20th-century European and Russian art, talked about the challenge of publishing when third-party image copyright is involved. She detailed the difficult and sometimes grotesque situations that she and her contributor faced when publishing her first co-edited book Open Access, tracking down image copyright holders and paying exorbitant reproduction fees (1).

Not many academics outside the Arts and Humanities know about the invisible labour and material cost involved when working with images. Researchers struggle to find images on various heritage institutions’ websites (or GLAMs, as we call them – i.e. galleries, libraries, archives and museums), and pay to obtain digital images ‘for private use’ when the original work is unavailable or located too far away. They often end up paying again in order to re-use those images when publishing their research. Even more frustrating is the lack of consistency between different institutions with regard to the amount of the fees and to the exemptions granted. If you beg the museum repeatedly and reach out to curators, you may have a small chance of having your permission fees waived (but still often in return for providing a free copy of your book/article). However, when sales departments or companies act as intermediaries between researchers and museums, this kind of trick is most likely to fail, and the chances of opening the discussion about the absurdities of the fees get even slimmer. In 2018, Bridgeman Images, one of those ‘Image companies’, obtained the exclusive right of selling and licensing all images from Italian national museums, which was catastrophic news for art history (see their statement here).

The situation feels even more unacceptable when it concerns out-of-copyright works of art. In this case, heritage institutions in fact do not own copyright over the work as it has fallen into the public domain. Most GLAMs, however, manage to keep control of these works’ images by banning photography (the famous ‘no photo’ policies in permanent collections or temporary exhibitions) and by creating copyright by making their own photograph of the work that they subsequently sell to researchers. 

An article by medieval art historian Kathryn M. Rudy published in Times Higher Education (also quoted by Kozicharow at the symposium) is a good example (2). There, Rudy detailed specific examples she encountered in her career and broke down the (shockingly high) real cost of working with images – she claims that the fees to publish images for her academic work since 2011 total £24,000 from her own pocket.* “The more successful I am, the poorer I get”, she says. The article went viral on academic Twitter networks, and the retweets and comments shine a light on the fact that many scholars face similar problems – one user ironically pointed out that it would be much cheaper to include with each book sold a packet of postcards from the museum than to pay their prohibitive reproduction fees! (@winchester_books). 

This thorny issue of image copyright permissions in research publications is sadly not new. In the last couple of years, however, historians of art in the UK have succeeded in keeping the issue at the front of the public debate. Back in 2017, an ‘End-fees-for-images’ campaign was started by Dr Bendor Grosvenor and Dr Richard Stephens. Along with 28 leading British art historians, they openly called for UK national museums to abolish image fees for out-of-copyright works of art in a letter published in The Times (3). Many other researchers in the field quickly added their names to this call through a petition on change.org. This campaign was supported in parallel by Grosvenor’s blog, Art History News – his strong presence in the media as a BBC4 presenter and on social media (@arthistorynews) also helped to promote the campaign.

This campaign revealed that there are in fact tools in the UK’s legal arsenal that art historians could use to limit fees. One example is the 2015 Re-Use of Public Sector Information Regulations (RPSI), which “prevents publicly funded bodies from commercialising public assets” including publicly owned pieces of art. These regulations “do allow image fees to be charged, but only to cover the actual costs involved, and a very small ‘profit’”(4). They remain, however, very little used and barely known – both by researchers and museums. Interestingly, during the OSC’s Open Access Monographs symposium, it was also brought up that ‘fair dealing’ exceptions to copyright by way of quotation for the purpose of ‘criticism or review’ have not often been used by researchers and applied to visual material (5). Both RPSI and ‘fair dealing’ by quotation are in the end quite complex legal tools and, understandably, neither art historians nor their publishers want to take the risk of a court case. We also have to take into account the wish of scholars to preserve good relationships with national heritage institutions in the UK – as images are their primary materials, their academic work depends on it entirely! 

During the Open Access Monograph symposium, the comment was made that this issue of high image reproduction fees as a barrier to Open Access publication was a misconception – that the real problem was instead about wider ‘digital’ and ‘online’ issues. However, the fact remains that permission fees are much higher if the image falls into the following categories (often used in image permission fees forms): ‘worldwide’, ‘online’, and ‘freely available’. How is this supposed to encourage researchers in the arts and humanities to publish their research Open Access? We could, however, also frame the issue in a more positive way – what if Open Access itself could help humanities researchers deal with images better? Dr Kozicharow acknowledged the great support she received from Open Book Publishers (OBP) in allowing her to reproduce as many colour images as she needed for her book. Kathryn M. Rudy, in her recent book also published with OBP, was able to display images in an innovative way (6). In order to contain costs, when images were already widely available, she instead added links to stable GLAM websites – even QR codes in the case of the printed edition! Perhaps art historians should see open access publishing as a good opportunity to find innovative ways to think about solutions for images. Of course, there remains the problem of how Open Access is perceived in the Humanities, open access books not being sufficiently reviewed and often not deemed legitimate enough in the process of securing permanent positions and promotions – but this is a separate issue. 

What would be needed to help with image permission costs in arts and humanities publishing?

In light of the growing requirements for open access publications, there should be better financial provisions to support researchers from universities and funding bodies. A recent report on Open Access from the Universities UK Open Access and Monographs Group, however, shows that there is a growing acknowledgement of the impossible situation faced by specific disciplines who rely on third-party material when publishing – such as history of art or archaeology. The UUK OA Monographs group notably recommended that “Given the already complex nature and expense of re-use clearance for illustrations and other third-party rights material in books, and the additional complexity and expense introduced by OA, an exception should be considered in any OA policy for books that require significant use of third-party rights materials” (7). 

Most of all, cultural heritage institutions have to do better. It does not seem unreasonable to be able to reproduce an image for free with the appropriate credit to the institution when a work of art is in the public domain. Some institutions worldwide have already started making their image collections open access or at least free of copyright fees for researchers’ publications. For example, Gallica, the Bibliothèque Nationale de France’s digital library, just changed its policy in favour of the latter. Positive changes such as these, which benefit the public and research, are being recorded and supported by the excellent Open GLAM initiative, funded by the European Commission. The new EU copyright directive (provided it can apply after Brexit?) should give the final push to get there, as it will allow free re-use of images of works of art in the public domain, even for commercial purposes.

Published 25 October 2019

Written by Dr Lorraine de la Verpillière 

This icon displays that the content of this blog is licensed under CC BY 4.0

*Correction: The £24,000 figure in fact corresponds to fees Rudy paid to obtain the high-res image files for her academic work since 2011. The figure gets even higher when including the copyright fees for said images – in the same article, she mentions for instance a £5,683 invoice from the Bodleian for the reproduction cost of her next book.

If you are a researcher at Cambridge University and need more information about third-party copyright, the following resources are for you:

LibGuides

Architecture & History of Art: Copyright and plagiarism

Copyright for Researchers 

Copyright helpdesk: email copyright-help@lib.cam.ac.uk 

Face-to-face training sessions [available to Cambridge University only]

Copyright: a survival guide (for PhD students in Humanities, Arts and Social Sciences) 

Do You Really Own Your Research? Copyright, Collaboration, and Creative Commons 

Your faculty or department may also run bespoke sessions; asking your librarian is the best way to find out.

References

(1) Louise Hardiman and Nicola Kozicharow, Modernism and the Spiritual in Russian Art: New Perspectives. Cambridge, UK: Open Book Publishers, 2017, https://doi.org/10.11647/OBP.0115

(2) Kathryn M. Rudy, ‘The true costs of research and publishing’, Times Higher Education, August 29 2019 (Url: https://www.timeshighereducation.com/features/true-costs-research-and-publishing#survey-answer)

(3) Matthew Moore, ‘Museum fees are killing art history, say academics’, The Times, November 6 2017 (Url: https://www.thetimes.co.uk/edition/news/museum-fees-are-killing-art-history-say-academics-qhfwmdws6 accessed: 10/10/2019)

(4) Bendor Grosvenor, ‘Why museums should abolish image fees (ctd.)’, Art History News blog, August 20 2018 (Url: https://www.arthistorynews.com/articles/5241_Why_museums_should_abolish_image_fees_(ctd.) accessed: 10/10/2019)

(5) Amendments to the Copyright, Designs and Patents Act 1988 in UK law since 2014, http://www.legislation.gov.uk/uksi/2014/2356/regulation/3/made

(6) Kathryn M. Rudy. Image, Knife, and Gluepot: Early Assemblage in Manuscript and Print. Cambridge, UK: Open Book Publishers, 2019, https://doi.org/10.11647/OBP.0145 

(7) Universities UK Open Access and Monographs Group, ‘Third-party rights’, in Open access and monographs evidence review, October 2019, pp. 10-12 (PDF: https://www.universitiesuk.ac.uk/policy-and-analysis/reports/Documents/2019/UUK-Open-Access-Evidence-Review.pdf accessed 13/10/2019).

Chasing cash cows in a swamp? Perspectives on Plan S from Australia and the USA

Plan S was born in Europe, yet from the very start it aspired to accelerate conversations around open access on a global scale. After all, if free access to research outputs is good in one place, it will be good everywhere, right? Well, it turns out that things may not be that simple.

In this Open Access Week, we look East and West to find out how Plan S is being received across the globe. Dr Danny Kingsley explores how reliance on foreign students has trapped Australian universities in a ‘Faustian bargain’ with publishers and reduced the scope for change. Micah Vandegrift reports on the type of conversations that Plan S has inspired in the USA, as well as the potential political barriers, sounding a note of cautious optimism.

The uptake of Plan S or equivalent principles in countries beyond Europe is crucial to the overall success of the movement. Publishers are using the fact that uptake currently has limited geographic scope to stall change, arguing that they cannot alter their model to suit the requirements of a relatively small percentage of authors. The number of supporting funders is still small and concentrated in Europe, with a few US players. China initially looked set to join in and thus change the game, but since the end of 2018 we have seen little progress on that front. Has Plan S been successful in shaping conversations around the world?

Hearing from our colleagues in other countries highlights some of the promises and challenges Plan S is facing in making an impact outside Europe. Learning about those raises a number of interesting points for how we advocate for open access at home too.

Dr Danny Kingsley: Australia

Photo of Sydney Opera House over a calm sea.
Sydney Opera House. ‘Plan S has not really caused much of a ripple Down Under’.

Rankings are a natural enemy of openness

When first approached by the Office of Scholarly Communication to write a piece about Plan S in Australia, my initial response was it would be very short. That is, Plan S has not really caused much of a ripple Down Under. Those in the know – people working in scholarly communication and some senior members of research institutions – are aware and watching closely. But as far as opening up a general discussion amongst the academic community, this simply hasn’t happened.

Over the past six months I have been trying to understand where some of the problems lie when it comes to openness in Australia. It is more fundamental than the usual concerns researchers have about Open Access, and goes to the heart of how universities work here.

Where the money flows

First, a quick run-down on how research funding to universities works in Australia. There are only two government funders – the National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC). The amount of funding these granted in 2017-2018 was about $943 million and $758 million respectively to all research organisations. As a comparison, the Wellcome Trust endowed in the range of £10m – £50m in Australia in 2017-18. For those interested there is a full breakdown of sources of research funding.

The funder policies on Open Access and Research Data Management are pretty weak overall. The NHMRC policy requires that any peer reviewed publication be available in a repository 12 months after publication and “strongly encourages researchers to consider the reuse value of their data and to take reasonable steps to share research data and associated metadata arising from NHMRC supported research”. The ARC policy requires the metadata of research outputs to be available in a repository 3 months after publication and the work to be OA 12 months after publication. But the policy specifically states: “For the purposes of this policy, Research Outputs do not include research data and research data outputs.”

Resourcing limitations mean these policies are not monitored, and there are no sanctions for non-compliance. This means they are basically ineffective, given the findings of a study last year that identified what policies need to ensure compliance.

But these policies simply reflect a lack of policy generally in Australia, partly due to the revolving door that has been the Prime Ministership over the past five years. So, on face value, the lack of engagement with discussions around Plan S just reflects this lassitude.

But I am wondering if there might be something deeper at play here.

Cash cow

Australian universities are heavily financially reliant on overseas students, with the numbers of international students several multiples greater than at any comparable university worldwide. Numbers of overseas students have doubled since 2008, with 398,563 students enrolled in 2018. In one instance, the University of Sydney, fees from Chinese students make up one fifth of its annual revenue with $500 million in 2017. Taken across the country, these figures outweigh public research funding significantly.

While this dependence has been labelled as highly risky from a financial perspective, it is also causing serious issues elsewhere in the sector including concerns about eroding educational standards. But it is also causing a perversion in the way research is managed.

The role of the ranking

University rankings are extremely important in the recruitment of overseas students. The vast majority of Australian university websites list some interpretation of their rankings. Monash University and the University of Western Australia both note they are in the “top 100 universities in the world”. Other universities are more specific, naming their place, like UNSW at 43rd in the world and University of Queensland listing no fewer than five rankings, trumped by Queensland University of Technology with six rankings listed.

Chasing rankings comes at a price. In some instances, increasing a University’s position in the rankings is a specific strategy, with the University of Canberra a recent success story.

There is incredible pressure on researchers in Australia to perform. This can take the form of reward, with many universities offering financial incentives for publication in ‘top’ journals. This is fairly widespread, with some universities having this position on the public record. For example, Griffith University’s Research and Innovation Plan 2017-2020 includes: “Maintain a Nature and Science publication incentive scheme”. Publication in these two journals comprises 20% of the score in the Academic Ranking of World Universities.

Other institutions take a more draconian position. Murdoch University’s proposed ‘academic career framework’ identifies specific numbers of articles researchers are expected to publish in top journals per year. Not surprisingly this approach has been highly criticised for its “extremely narrow view of academic career success”.

Australia’s Chief Scientist has recently been arguing the need for a different way of assessing our researchers, with concern that the current system is fuelling bad science. With the exception of some groundswell activity, this is as close as anyone is getting to using the ‘reproducibility’ word here in Australia, possibly from nervousness in the sector from government interference in the allocation of research grants in 2018. There is certainly nothing comparable to the UK or the US on this issue.

The Open Access challenge

But what has all of this to do with Open Access or Plan S? Well, everything actually.

For a start, signing up to the Declaration on Research Assessment (DORA) or the Leiden Manifesto is one of the principles of Plan S, with the Wellcome Trust stating that it will not fund research at institutions that have not signed up. Only a handful of Australian research organisations have signed DORA, none of which are universities. Given many Australian institutions are not only judging researchers on their publication record, but in some cases prescribing the journals in which they are allowed to publish, it would be extremely difficult for these institutions to become signatories to DORA or the Leiden Manifesto.

But the main problem for the open agenda is the total reliance on specific metrics that deliver ranking numbers – metrics which enfold Australian universities into a Faustian bargain with the large commercial publishers.

Australian universities are not engaging with Plan S because they cannot afford to. And while the Australian funders remain silent on the topic (literally – a search for Plan S on each website comes up empty), there is little incentive to worry about it.

If anything, this situation further underlines the need to shift the academic reward system away from the single measure of publication of novel results in high impact journals.  Given how deeply ingrained that measure is in Australia it will be interesting to see where we are at this time next year.

Micah Vandegrift: USA

An image of a river in the USA.
A meandering river in the USA. Plan S has sparked conversations in the USA, but progress is slow.

A shot heard around the world

A little more than a year ago, open access had its “shot heard around the world” moment. Plan S expanded out from Europe, encompassing angst and excitement, requiring think-pieces from thought leaders, policy briefs from the wonks, and general malaise from lots of stakeholders. The European open agenda is, by design or by accident, shaping the horizon and Plan S continues to be a marker of that progression. I had the unique opportunity to be on the ground in Europe for most of the fallout last fall, and now with the benefit of time and geographic remove, I am observing the after effects, especially in how U.S.-based research communities are responding in kind. 

Ripples and tides

The greatest surprise is that Plan S seems to be the thing that is getting people from all corners out to debate the issues. The tidal wave of Plan S seems to have crashed on our shores with something for everyone – publishers, libraries, researchers, and funders. Librarianship tends to pivot around shifts in the publishing landscape, finding crevices to leverage our expertise and chances to show off that knowledge to researchers, and I expected Plan S to offer that as well. The weird thing, though, is that the responses have been uneven, distributed, and displaced. For example, I was invited along with Rick Anderson of Scholarly Kitchen fame to debate the Plan in front of 200+ managing and technical editors as the plenary session at their conference. On the flipside, Dr. Kelvin Droegemeier, announced as Director of the White House Office of Science and Technology Policy in January 2019 (after a vacancy since something happened in November 2016), flippantly addressed Plan S in an interview, simply saying “we won’t ever tell people where to publish.” Bizarrely, a research policy affecting labs and scholars from Norway to Portugal is giving me a chance to meet and chat with publisher colleagues more than ever before, yet it has not opened any new doors for communicating the finer points of licensing with faculty on my campus.

A slow-flowing river

Following the current into the near future, I believe that there are three tributaries that will come together. Funders will continue to exert their influence, supplanting publishers as drivers of the conversation, disciplines will adopt discipline-specific means of scholarly sharing (see the rise of pre-prints [PDF]), and policy makers will attempt to legislate cautious action toward a global research marketplace. However… in the U.S. context there are two barriers that could dam the flow. First, uncertainty in our political climate, together with an America-first foreign policy agenda, is boiling up concern about “undue foreign influence,” and I fear that isolationism will compel a counter-narrative to the open and public sharing of research worldwide. Secondly, America is a god-damn huge country and developing a coherent national framework for openness seems to be a fool’s errand. However, what sometimes appears to be a bog can actually be a river barely inching along. If Plan S was a splash, Plan Open U.S. will be a steady drip, creating geologic formations of systemic change toward a more open research ecosystem.

Conclusions

We read Danny and Micah’s contributions with great interest. They raised several questions about Plan S, which we hope to discuss with Micah after today’s talk.

  1. What can we do to increase engagement of our local academic communities with the open access agenda?
  2. Is it possible to uncouple decisions about research practice from financial or political/ideological considerations?
  3. How can government funders find a balance between dictating open research mandates and respecting the academic freedom of researchers?
  4. Can institutions measure research accurately without creating perverse incentives?
  5. Is there any country in the world where the mention of politicians does not trigger an immediate eye-roll?

Published 24 October 2019

Written by Dr Danny Kingsley (Scholarly Communication Consultant) and Micah Vandegrift (Open Knowledge Librarian at NC State University Libraries).

Compiled by Dr Beatrice Gini

The content of this blog is licensed under CC BY 4.0.