Tag Archives: open data

Open Research at Cambridge Conference – Opening session

The Open Research at Cambridge conference took place between 22–26 November 2021. In a series of talks, panel discussions and interactive Q&A sessions, researchers, publishers, and other stakeholders explored how Cambridge can make the most of the opportunities offered by open research. This blog is part of a series summarising each event. 

The opening session, chaired by Dr Jessica Gardner (University Librarian and Director of Library Services), included talks by Professor Anne Ferguson-Smith (Pro-Vice-Chancellor for Research), Professor Steve Russell (Acting Head of Department of Genetics and Chair of Open Research Steering Committee), Mandy Hill (Managing Director of Academic Publishing at Cambridge University Press) and Dr Neal Spencer (Deputy Director for Collections and Research at the Fitzwilliam Museum). All four speakers foresee an increasingly open future, with benefits for both institutions and researchers. They also considered some of the challenges that still need to be worked through to avoid potential problems.

What is working well?

In recent years, we have made great progress in the proportion of publications that are open access. Over three quarters of publications with Cambridge authors last year were openly available in some form.

The trend is continuing and it is not unique to our institution. CUP have set an ambitious goal for the vast majority of research articles they publish to be open access by 2025.

Other forms of publication are becoming common, meeting different dissemination needs. Preprints have been the star of the show during the pandemic, allowing rapid dissemination while formal peer review follows down the line.

Diagram from Mandy Hill’s slide: ‘Increasingly open platforms and formal publishing will meet different dissemination needs’

In the scholarly communication arena, open access articles benefit from more downloads and citations. Museum-based projects involving artisans, schools and artists all found enthusiastic responses.

What can we look forward to?

Research culture is coming under the spotlight across the sector, and Cambridge has committed to an ambitious action plan to create a thriving environment to do research. Key principles include openness, collaboration, inclusivity, and fair recognition of all contributions.

Diagram from Prof Steve Russell: ‘Going Forward’

Implementing the San Francisco Declaration on Research Assessment (DORA) is part of this progress. We want to assess research on its own merits rather than on the basis of journal or publisher metrics. This also means recognising all research outputs and a broad range of impacts.

Reproducibility is increasingly recognised as critical in a number of disciplines. A developing UK Reproducibility Network (UKRN) group within the University aims to ‘take nobody’s word for it’ and instead support reproducible workflows that underpin confidence in the conclusions of research. By sharing and rewarding best practice we can become world leaders in this area, and in open research more widely.

In the past, museum collections have tended to be documented in limited ways, with poor accessibility and interoperability, which made it hard to discover and use materials. Several exciting projects at the Fitzwilliam Museum and more broadly have started to change that. There are opportunities for a single discovery portal, tying together different collections. The Fitzwilliam Museum is also making its collection discovery process richer, by providing opportunities for deeper dives, and more connected, by linking with other collections and resources.

Deep zoom access to an image in the Fitzwilliam collection. Adapted from Dr Neal Spencer’s slide ‘Fitzwilliam Museum Collections Search’.

What problems should we be mindful of?

There are still barriers that hinder some open research aspirations. Historical constraints on the ways we find materials, conduct research, and publish results remain. Some systems may need to be reimagined, without scrapping structures that still serve us well.

Cambridge is a large and complex institution, where change takes time. Nevertheless, there is an established governance structure and an evolving set of policies that support open research.

Most importantly, researchers should be at the centre of the move towards open research. It is important that they benefit from open practices, rather than finding themselves torn between competing priorities. Conversations continued throughout the week to explore possible approaches in different disciplines, drawing from the rich diversity of experiences to shape the future of open research at Cambridge.

Open Data Sharing and reuse

The Open Research at Cambridge conference took place between 22–26 November 2021. In a series of talks, panel discussions and interactive Q&A sessions, researchers, publishers, and other stakeholders explored how Cambridge can make the most of the opportunities offered by open research. This blog is part of a series summarising each event. 

The session described here was on ‘Open data sharing and reuse’ and is summarised by the session chairs, Dominic Dixon (Research Librarian) and Dr Sacha Jones (Research Data Manager) at the Office of Scholarly Communication, Cambridge University Libraries.

A recording of the event is available.

Have you wondered how research data is used after it has been shared publicly as open data? What are some of the impacts of sharing data and of its subsequent reuse by others? Are there ethical factors to consider? Does the researcher or research group who shared their data openly benefit in any way from its reuse? What are the essential properties of a reusable dataset? This session on ‘Open data sharing and reuse’ explored these questions and more via presentations delivered by a panel of University of Cambridge researchers from various fields. They included: Professor Richard (Rik) Henson, Deputy Director of the MRC Cognition and Brain Sciences Unit, Professor of Cognitive Neuroscience at the Department of Psychiatry and President of the British Neuroscience Association; Professor John Suckling, Director of Research in Psychiatric Neuroimaging in the Department of Psychiatry and chair of the University of Cambridge Research Ethics Committee; Dr Mihály Fazekas, Assistant Professor at the Department of Public Policy, Central European University, and scientific director of an innovative think-tank at the Government Transparency Institute; and Professor Simon Deakin, Professor of Law in the Faculty of Law and Director of the Centre for Business Research.

All speakers discussed challenges and concerns around data sharing, including how and when to share. Rik asked, “Why wait until publication?” to share research data, and suggested considering a data paper, in which a dataset is celebrated in its own right without the narrative of a traditional article. Researchers are often concerned about being scooped, but there is little evidence of this and it may be a “paper tiger”. There is an additional fear that data sharing will expose errors in work but, as Rik noted, “I think we just need to get over our egos and accept that everyone makes errors”. One particular challenge can be controlling what people (or bots) do with your data, but researchers have a choice over where to share (e.g., which repository to choose) and how to license their data. Something that was implicit in all talks, and stated explicitly by Simon, is that the benefits of sharing data openly vastly outweigh the costs.

Sharing data deriving from research involving human participants is understandably complex due to data protection regulations (e.g., GDPR), obtaining informed consent, and the challenge of anonymising datasets, particularly those containing qualitative data. Participants need to be informed about how their data will be used, so the message is that data sharing needs to be planned far in advance, even at the gestation of the project idea. It is important to be aware of the repository options; for example, if managed or controlled access to data is required, consider the set-up at the MRC CBU discussed by Rik, or the UK Data Service for sensitive qualitative data, as highlighted by Simon. John discusses the import and export of datasets from an ethical perspective, giving two examples from the biomedical and social sciences with a focus on secondary data use. He says that these examples illustrate just how far in the future you might need to think when considering how your data might be reused by others: it is “a lot better to ask for permission from all the stakeholders in these studies than it is to ask for their forgiveness”.

Data must be shared well for both researchers and society to reap the benefits. To do this, select an appropriate repository, adhere to any ethical/legal requirements, follow discipline-specific standards and make your data FAIR (Findable, Accessible, Interoperable, Reusable). A key element of the latter is data documentation, an issue raised repeatedly during this session. Sharing the data alongside any associated code and detailed information about the data will enable it to be reused effectively and guard against misuse. Mihály discusses sharing the Digiwhist project data, which has been reused by academia, policy, civil society and the media, and emphasises this: “Every time I put out bits and pieces of my data and code that was not clear, I just kept on receiving the same question over and over again. So actually, it’s in your own best interests to document your work fully because then it is a lot more efficient for you”. Providing data about data is part of being completely transparent about the research process and results, enabling others to understand exactly what was done and to build on it. In some fields, this is an essential part of research reproducibility and replicability. As another example, Simon describes sharing the CBR Leximetric datasets – currently the second most downloaded dataset in Apollo and the eighth most downloaded across all UK institutional repositories – where not only the data were shared but also the methodology and an extensive codebook.
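To make the point about documentation concrete, here is a minimal sketch of what “data about data” might look like for a small tabular dataset. Everything in it is an assumption made for illustration: the dataset, the column names, the licence string and the helper functions `undocumented_columns` and `build_metadata` are hypothetical, and none of it comes from the speakers or from any particular repository’s requirements.

```python
import csv
import io
import json

# Hypothetical example dataset; the columns and values are illustrative only.
RAW_CSV = """participant_id,age_years,reaction_time_ms
p01,34,512
p02,29,478
"""

# Hypothetical data dictionary (codebook): one entry per column in the dataset.
DATA_DICTIONARY = {
    "participant_id": {"description": "Pseudonymised participant identifier", "unit": None},
    "age_years": {"description": "Age at time of testing", "unit": "years"},
    "reaction_time_ms": {"description": "Mean reaction time", "unit": "milliseconds"},
}


def undocumented_columns(csv_text, dictionary):
    """Return the columns in the CSV header that have no data dictionary entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in dictionary]


def build_metadata(title, licence, dictionary):
    """Bundle descriptive metadata and the data dictionary into one record."""
    return {"title": title, "licence": licence, "variables": dictionary}


# Check that every column is documented before sharing.
missing = undocumented_columns(RAW_CSV, DATA_DICTIONARY)

# Assemble a metadata record that could be deposited alongside the data file.
metadata = build_metadata("Example reaction-time dataset", "CC-BY-4.0", DATA_DICTIONARY)
print(json.dumps(metadata, indent=2))
```

A check like this could be run just before deposit, so that no variable goes out without a description. The actual metadata fields to provide will depend on the repository and on discipline-specific standards.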

In both examples, being transparent in this way has led to wider reuse of these data and many citations of the data and associated publications. The benefits of FAIR data sharing and data reuse certainly do not rest solely in the number of resulting citations. Ethical and transparent research leads to credible research and researchers, enhancing reputations and quality of outputs. These are elements that all speakers highlighted in their talks. To end on a quote from Simon about the outcome of sharing data and of its subsequent reuse: “It’s been a very very positive experience for us”.  

We’re always happy to receive any questions or comments you may have about data sharing and reuse. You can contact us at info@data.cam.ac.uk and see our Research Data website for more information.

Additional resources

University of Cambridge School of Clinical Medicine guidance on secondary data use and related ethical considerations, discussed by Professor John Suckling.

The Digiwhist project website discussed by Dr Mihály Fazekas. The Digiwhist project is also one of the University’s research projects highlighted on the University of Cambridge global impact map.

Video of a previous talk by Professor Simon Deakin at OpenConCam 2016 on ‘Open Access and Knowledge Production: “Leximetric” Data Coding’.

The FAIR principles are outlined by Wilkinson et al. (2016) in Scientific Data – “The FAIR Guiding Principles for scientific data management and stewardship”. There is also a useful guide for researchers on how to make your data FAIR.

Visit the University of Cambridge Research Data website for information on research data management, data sharing and guidance on depositing data into Apollo, the institutional repository. The site also hosts the University of Cambridge Research Data Management Policy framework, which is relevant to all research staff and students.

Cambridge Data Week 2020 day 1: Who are the winners and losers of good data practices?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.  

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 2 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog
Cambridge Data Week day 5 blog

Introduction

The first day of Cambridge Data Week 2020 kicked off with a tantalisingly open question: who are the winners and losers of good data practices? This question was addressed via two different perspectives: those of a funder, provided by Dr Georgie Humphreys (Wellcome), and of a publisher, provided by Dr Catriona MacCallum (Hindawi). Discussion of this topic during presentations and the Q&A session looked through various (but not mutually exclusive) lenses, including those of data sharing, quality, ethics, and research culture. Funder mandates for data sharing and what these have achieved (e.g. saving research funds related to data reuse) were reflected upon, as were disciplinary differences between STEMM, social sciences, arts and humanities. There was also a discussion of evidence relating to shifts in research culture and if this is pointing to better data practices. As a whole, the webinar explored a broader view of good data practices, the consequences of these, and the progress being made in embedding good data management in research. 

Topical for this year, both speakers discussed data sharing related to Covid-19 research. Catriona stated that Covid has exposed systemic flaws in the existing system (in relation to data sharing), and Georgie highlighted some surprising results regarding data availability statements in Covid-related articles. The CARE Principles for Indigenous Data Governance were also brought to the fore by Catriona, who argued for attention to be placed on potential power issues surrounding data sharing. These principles are complementary to the FAIR principles and encourage the open research movement to engage fully with Indigenous Peoples’ rights and interests. A pervasive undercurrent ran throughout the webinar: research culture and some of the problems within it. These were addressed explicitly by both speakers, who stated that institutions need to do more to implement DORA and to reward researchers for their achievements and good research practices, and not just according to where (i.e. in what journals) their research is published. Catriona highlighted results from a 2019 EUA report showing that institutions have some way to go in this regard, that the value of data is not fully recognised, and that responsible research assessment is at the heart of cultural change in the right direction.

We had some great questions from the audience that were answered in the Q&A session, such as “In countries without the REF, is data sharing better?”, “How do you get qualitative researchers on board with this?” and “What is the role of universities in the so-called data-driven economy?”. Our audience also responded to the poll we held at the end of the webinar, where we asked participants to select, from seven given options, the one they regard as most likely to prevent good data practices among researchers. Resource indicators (knowledge, time, money for RDM) amounted to 46% of responses (blue in the chart below) and cultural indicators amounted to 53% (orange in the chart). Overall, the results were rather surprising but optimistic, revealing that a dominant perception among the participants is that a shift in cultural practices is one of the leading factors necessary to drive forward good data practices in research.

Figure 1. Results of the poll held during the webinar, where participants were asked to choose one of seven factors that they consider most likely to prevent or inhibit good data practices.

Audience composition

We had 274 registrations for this webinar, with just over 70% originating from the Higher Education sector. Researchers and PhD students accounted for 40% of registrations and research support staff for an additional 30%. On the day, we were thrilled to see that 164 people attended the webinar, participating from a wide range of countries.

Recording, transcript and presentations

The video recording of the webinar, along with the transcript and presentations, is available in Apollo, the University of Cambridge repository.

Bonus material

There were a few questions we did not have time to address during the live session, so we put them to the speakers afterwards. Here are their answers:

What are the ethics of using secondary data, particularly in relation to primary versus secondary researchers’ objectives, meaning of data/methods, consent of participants, and in the case of qualitative data, the personal relationships built between researcher and participants?

Georgie Humphreys: This question seems to allude to informed consent which is still a topic of active discussion in terms of what one tries to build into the original informed consent to allow subsequent secondary use down the line. There is this idea of broad consent now where a participant would consent to that particular project but they’re also consenting to their data being kept and maybe reused for other purposes related to different scientific questions, but maybe with clauses such as ‘not for commercial benefits’. There are potential concerns about re-identification but there are mechanisms for dealing with that – mechanisms which reduce risk whilst retaining value, such as anonymisation or synthetic data creation. But there are other datasets where that’s just not going to be possible, where you lose all value of the original dataset. The UKDS have a nice page on informed consent, providing information on what you put in your consent forms to enable secondary use. This needs to be thought about at the very start of the study prior to collection of the primary data.

Catriona MacCallum: This question is really focusing on data privacy issues. The primary researcher collects the data, the secondary researcher reuses the data. There are ways that researchers can be given access to the data while maintaining privacy. The primary researcher is creating the relationships with participants in order to obtain data, so what does this mean ethically for those wishing to reuse the data? Safety nets do need to be put into place. Here, it’s important to raise the CARE principles again. These were the result of a working group that came about as a result of concerns about how data from Indigenous people are being treated. The slogan is now ‘Be FAIR and CARE’. The CARE principles are emerging on the agendas of the UN and UNESCO, and I’m sure they will come up with the Research Councils too.

What are the best practices to ensure data quality? 

Catriona MacCallum: It depends what is meant by ‘quality’ as there are various ways of looking at this. The European Commission came up with the economic loss of not publishing failed experiments; in other words, the publication bias that results. We need to redefine what we mean by quality and integrity, and again this speaks to the research culture, as no one gets rewarded for publishing a failed result and in fact the researchers end up feeling embarrassed and tend not to do it. Publication bias is huge! It also applies to the humanities and social sciences, potentially in a different way, and there are huge biases in terms of what gets published and what is allowed to get published.

Georgie Humphreys: This issue is probably a plug for the open peer review model where the filter is not at the beginning but later on. [In open peer review, authors and reviewers are aware of each other’s identity and encouraged to engage in open discussion. This makes the process more transparent, removing bias or conflicts of interest. Manuscripts are made publicly available pre-review, and reviews are published alongside the article].

Conclusion

So, who are the winners and losers of good data practices? Georgie believes that everyone, in the long term, will be a winner. If time is spent ensuring data is well-documented, well-organised, has dictionaries, is stored somewhere for the long term, then it will benefit the data creators just as much as anyone else. In the short term, she acknowledges that there may be people that find being a champion in this field a challenge for them individually, but it’s just about continuing along this journey to get to the point where everything is in place to truly reward and recognise those that have good open practices and good data management practices. Catriona says that there are so many winners: the economy, society, and science, the social sciences and humanities – all will benefit from data sharing. Taking society as an example, sharing data and sharing it well (through good research data management) will increase public trust in science, benefit public health and even help toward achieving multiple sustainable development goals.

Resources

A Covid-19 press release by Wellcome in January 2020 called on researchers, publishers and funders to share or facilitate the sharing of interim and final data as rapidly as possible. Wellcome have been exploring the impact of this statement on data sharing.

‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’ by Wilkinson et al. in Scientific Data (March 2016).

CARE Principles of Indigenous Data Governance. The full CARE principles are outlined here.

UKDS information on informed consent, including a downloadable model consent form with suggested wording to allow secondary data reuse.

An April 2020 publication by Colavizza et al. on ‘The citation advantage of linking publications to research data’ showing that article citations are greater when they have data availability statements that include a link (e.g. DOI) to data archived in a repository.

A European University Association (EUA) report published in October 2019 by Saenen et al. on ‘Research assessment in the transition to Open Science: 2019 EUA Open Science and Access Survey Results’.

Published 25 January 2021

Written by Dr Sacha Jones with contributions from Dr Georgie Humphreys, Dr Catriona MacCallum and Maria Angelaki.  

Licence: CC BY