
Open Research in Cambridge: 2022 in review

2022 has been another fantastic year for Open Research in Cambridge and I’m so proud of what we have achieved together as a community of researchers, library staff, technicians, administrators, publishers and more. I’d like to highlight some of the key themes in our work this year and thank all who have contributed to this work in any way throughout the year (though I have limited myself to naming chairs of workstrands below). The following video by our Pro-Vice-Chancellor (Research), Prof Anne Ferguson-Smith, gives an indication of the importance that the university places on this work.

Understanding disciplinary differences

I know that I’m not alone in hearing that researchers in Arts, Humanities & Social Sciences disciplines often feel a disconnect between the language and priorities of “Open Science” and their experiences of how research is conducted – this is one of the reasons we choose to frame it as “Open Research” here in Cambridge. I see a strong desire from many to engage with open research practices, paired with frustration with the challenges of translating the terminology of open science to other areas. In order to better understand these issues, we established two working groups (Open Research in the Humanities and Open Qualitative Research), each of which was tasked with forgetting what they think they should do because of how open science is generally described, and instead describing what they see as the opportunities for open research within their disciplines.

The Open Research in the Humanities group was chaired by Prof Emma Gilby and supported by Dr Matthias Ammon. Their excellent report is already available on Apollo and through a series of blog posts here on Unlocking Research. The Open Qualitative Research group was chaired by Dr Meg Westbury and their report is due to come to the university’s Open Research Steering Committee in January. We will be sharing this more widely in early 2023 – it’s well worth watching out for! Both reports will inform how we talk about Open Research at Cambridge and will shape the transformative programme that we are in the process of developing.

Research data management

Our small but dedicated Research Data team, led by Dr Sacha Jones, has had another impressive year. Our Data Champions Network goes from strength to strength, and has expanded into departments that have not been represented in previous years. Other key projects have included a review of our research data services with recommendations for future development, a project on electronic research notebooks, and lots of work to support open research system developments, all while continuing to support researchers with data deposits and writing data management plans. This team is expanding next year which will enable even more work to meet the needs of different disciplines.

The future of scholarly publishing

We hosted a series of three strategic workshops on the future of scholarly communication earlier this year, developed in collaboration between Cambridge University Libraries and Cambridge University Press. Led by independent facilitator Mark Allin, the workshops brought together participants across disciplines and career stages to discuss the problems of scholarly communication, potential long-term solutions to these problems and a strategy to help Cambridge get there. The proposals emerging from the workshops are currently being developed and include new infrastructure for diamond open access publishing projects and a series of high-level meetings aimed at strategic improvements to equity in academic publishing. There are already diamond publishing initiatives within Cambridge, and projects will start in early 2023 to understand existing initiatives in greater detail and to provide the infrastructure to establish additional diamond journals.

The library’s annual Open Research Conference took a similar visionary approach in its focus on the future of open access. Titled Open Access: Where Next?, the conference featured expert speakers on how we can think beyond open access toward more innovative, sustainable and equitable open futures. We heard from researchers excluded by certain approaches to open access, learned how other researchers are addressing these issues through their own scholar-led approaches, and considered how openness fits into changing research cultures and can facilitate experimental publishing projects. A full round-up with videos of each session is available on the Unlocking Research blog. My thanks to Dr Bea Gini for her leadership in planning this conference.

Open Access now

While we are actively working towards a new future for scholarly publishing, we also need to ensure that our researchers have ways to make their work open access right now. We do this in a number of ways: engaging with the academic community, contributing expert open access advice on publishing agreements that are being negotiated across the sector, and administering the block grants that are provided by funders and the university to cover the costs of publishing in fully open access venues. All of this requires close reading and interpretation of funder requirements to ensure that we are able to support our researchers in what they are required to do as well as what they would like to do. I’d like to specifically thank Alexia Sutton, who leads our Open Access team, and Dr Samuel Moore, our Scholarly Communication Specialist, for their leadership in this area.

We are particularly pleased with the engagement from across the university with the ongoing Rights Retention Pilot, which provides a route to open access for articles that cannot be made immediately available through existing publishing deals, are not eligible for the block grants mentioned above or where the publisher simply does not provide any route to immediate open access. We are now consulting on the development of a Self-Archiving Policy which is built on what we have learned through the pilot and will sit within our Open Access Publications Policy Framework. Members of the university can find out more by reading this document (accessible to Raven users only). It has been an honour to lead a dedicated group of library and research staff on this project.

Open research systems

Everything we do requires that we have the right technical infrastructure in place. The Open Research Systems team is led by Dr Agustina Martinez-Garcia and based within Cambridge University Libraries’ Digital Initiatives directorate. This year has seen projects to upgrade links between Symplectic Elements and Apollo, technical changes to support the rights retention pilot, a review of the open research systems landscape, contributions to thinking around future publishing platforms, electronic research notebooks and data infrastructure, planning ahead for the upgrade to DSpace 7, improvements in the thesis service, and work to build connections between DSpace repositories and Octopus. This is not a comprehensive list and we plan to showcase more of their work on the blog in 2023.

Research enquiries, briefings and training

I want to end with huge thanks to the library staff based both in the Office of Scholarly Communication and in the Faculty & Department Libraries who do so much throughout the year, answering frontline research support queries, signposting as required, providing tailored briefings and training on highly complex and constantly changing topics. We especially value the disciplinary insights we get through working closely with the Research Support Librarians that are based within the Schools.

Join our team!

Open Research is an incredibly rewarding area to work in and the scale of what we’re trying to achieve is really ambitious. I’m delighted that the importance of what we are doing is recognised by both Cambridge University Libraries and the wider university and as a result we are expanding our team!

We are currently recruiting for an Open Research Community Manager to establish and develop a Cambridge Open Research Community, bringing researchers across the university community together through regular online and in person events to enable exchange of expertise in open and rigorous research practices. In January, we plan to advertise for two Research Data Coordinators and an Open Research Administrator, with a Research Services Manager post following later in the year. All of these roles will be listed on the university’s jobs site as well as on LinkedIn, mailing lists etc. If you’re interested in our work and would like to find out more about these opportunities please get in touch at info@osc.cam.ac.uk!

Open Research in the Humanities: CORE Data

Authors: Emma Gilby, Matthias Ammon, Rachel Leow and Sam Moore

This is the third of a series of blog posts, presenting the reflections of the Working Group on Open Research in the Humanities. Read the opening post at this link. The working group aimed to reframe open research in a way that was more meaningful to humanities disciplines, and their work will inform the University of Cambridge approach to open research. This post reflects on the concept of FAIR data and proposes an alternative way of thinking about data in the humanities.

As a rule, data in the arts and humanities is collected, organised, recontextualised and explained. We are therefore putting forward this acronym as an alternative to LERU’s FAIR data (findable, accessible, interoperable, reusable). Our data is collected rather than generated; organised and recontextualised in order to further a cultural conversation about discoveries, methods and debates; and explained as part of the analytical process. Any view of scholarly comms as uniquely about the distribution of and access to FAIR data (‘from my bench to yours’) will seem less relevant to A&H academics. Similarly, the goal of reproducibility of data – in the sense in which this often appears in the sciences and social sciences, where it refers to the results of a study being perfectly replicable when the study is repeated – is, if anything, contrary to the aim of CORE data: i.e. the aim that this data should be built upon and thereby modified through the process of further recontextualization. Our CORE data, then, understood as information used for reference and analysis, is made up of texts, music, pictures, fabrics, objects, installations, performances, etc. Sometimes, this information does not belong to us, but is owned by another person or institution or community, in which case it is not ours to make public.


The A&H tend to bring information together in new ways to further discussion about socio-cultural developments across the globe. Available digital data is only the tip of the iceberg when it comes to the material that is worked with.[1] Arts and humanities scholars, who spend their lives thinking about the arrangement and communication of information, are acutely aware that archives (digital and otherwise) are not neutral spaces, but man-made and the product of human choices. This means that information available online, to a broadband-enabled public, is asymmetrical and distorted.

One of the main benefits of open research is that it is thought to make data globally accessible, especially to ‘the global south’ and to institutions with fewer available funds to ‘buy data in’. As we explore below (‘research integrity’), this unidirectional view of open access is problematic. In general, digital material tends to reproduce English-speaking structures and epistemologies. As FAIR data is redefined as CORE data, an attention to context will hopefully promote the diverse positions occupied by all those who make up the world and who produce research about it.

Support required

In order usefully to employ CORE data in the A&H, we need to bring to the surface and examine underlying assumptions about knowledge creation as well as knowledge dissemination.

The work of the digital humanities – rooted explicitly in digital technologies and the forms of communication that they enable – is obviously a vital part of these discussions about opening up the CORE data of the humanities. Digital work, in the same way as any other successful A&H research, needs to consider its own materiality and conditions of production, evaluate its own history, draw attention to its own limits, and navigate its trans-temporal relationships with data in other forms (the manuscript, the printed text, the painting, the piece of music). This is a developing field and one that still has an uneasy relationship with the existing tenure/promotions system.[2] Colleagues noted that training needs are evolving constantly. It is often hard to know where to turn for specific guidance on, for example, how to manage one’s own ‘born digital’ archives or how to deconstruct a Twitter archive.

This issue also overlaps with the need, as part of the ‘rewards and incentives’ process outlined below, to evaluate the success of colleagues as they undertake this training and negotiate with these processes. DH is one of the most exciting and rapidly developing areas of research and needs to be widely resourced. But it would also be harmful to collapse all A&H research into ‘the digital humanities’. The work of colleagues whose CORE data is resistant, for whatever reason, to wide online dissemination in English also needs to be allocated the value it deserves: some publics are simply smaller than others.

Postscript: the group subsequently became aware of the CARE Principles of Indigenous Data Governance. These principles will also be considered when developing our services in support of data management and ethical sharing.

[1] Erzsébet Tóth-Czifra, ‘The Risk of Losing the Thick Description: Data Management Challenges Faced by the Arts and Humanities in the Evolving FAIR Data Ecosystem’, in Digital Technology and the Practices of Humanities Research, edited by Jennifer Edmond (Open Book Publishers, 2020), https://doi.org/10.11647/OBP.0192.10

[2] See the excellent article by Cait Coker and Kate Ozment, ‘Building the Women in Book History Bibliography, or Digital Enumerative Bibliography as Preservation of Feminist Labor’, Digital Humanities Quarterly 13 (3), 2019, http://www.digitalhumanities.org/dhq/vol/13/3/000428/000428.html – where the authors of the ‘Women in Book History’ digital bibliography still see the tenure system as ‘monograph-driven’, and had to fund their research through selling merchandise.

Cambridge Data Week 2020 day 2: Who is reusing data? Successes and future trends?

Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event.  

The rest of the blogs comprising this series are as follows:
Cambridge Data Week day 1 blog
Cambridge Data Week day 3 blog
Cambridge Data Week day 4 blog
Cambridge Data Week day 5 blog


Reuse of data is the final element of the FAIR principles and has long been argued as a central benefit of data sharing, allowing others access to a wealth of research and making research funding more efficient by removing the need to duplicate work. Yet we are still in the process of evaluating success in this area. This webinar brought together speakers to discuss what we know about the current state of play around data reuse, what researchers can do to increase the reuse potential of their data, and possible future developments in data reuse.

Our speakers – Louise Corti (UK Data Archive) and Tiberius Ignat (Scientific Knowledge Services) – looked at data reuse from two different perspectives. Louise focused on the reuse of UK Data Service collections, sharing some examples of their most widely used data sets, discussing what makes them popular and sharing some principles that can be used both to make data more reusable and to promote it for reuse. Tiberius discussed the prevalence of data reuse by machines and the possibility of granting machines data reuse rights.

Louise’s presentation gave an overview of the portfolio of data sets hosted by the UK Data Service, looked at their top 20 most downloaded datasets and discussed the underlying principles that have led to them being widely reused. As well as demonstrating some commonalities between these datasets, Louise also outlined the principles used by the UK Data Service to promote their collections for reuse.

Tiberius’ presentation looked at data reuse from a different perspective, serving as a call to action to share research data responsibly and protect it against reuse by machines designed to persuade humans. One of Tiberius’ main arguments was that no research data from public projects should be made available to feed and develop persuasive algorithms.

The presentations prompted an interesting discussion covering a broad range of topics. These included the reuse of qualitative data, how we can implement ethical safeguards for data reuse, the idea of data ethics as a continuum, whether we can accept positive cases of algorithmic persuasion, such as to promote equality and diversity, and the possibility of creating specific licences prohibiting data reuse by persuasive algorithms. See below for a video and transcript of the session.

Audience composition

We had 341 registrations with just over 65% originating from the Higher Education sector. Researchers and PhD students accounted for nearly 37% of the registrations whilst research support staff accounted for an additional 33%. We also had registrations from at least 30 countries outside of the UK including significant attendance from Denmark, Holland, Germany and Canada. We were thrilled to see that on the actual day 187 people attended the webinar.

We held five online webinars during Cambridge Data Week and were pleased to see that nearly 25% of the participants attended more than one webinar. A total of 1364 people registered and more than 700 attended altogether, with the rest possibly watching the recordings at a later date. Most of all we were pleased to welcome participants from all over the world and see how important research data management topics are globally.

Where data was available, we identified the following countries apart from the UK:  Australia, Austria, Bangladesh, Brazil, Canada, Colombia, Croatia, Czech Republic, Denmark, France, Germany, Greece, Holland, Hungary, Iran, Luxembourg, Moldova, Norway, Poland, Romania, Singapore, Spain, Sweden, Switzerland, Turkey, Ukraine and the USA.

Recording, transcript and presentations

The video recording of the webinar can be found below, and the recording, transcript and presentations are available in Apollo, the University of Cambridge repository.

Bonus material

After the session ended, we continued the discussion with Louise and Tiberius looking in particular at one question posed by an audience member:

AI can always be used either for good or bad. Instead of locking-in, how can we enhance technology through data and regulation? 

Tiberius Ignat: I think at this point we need regulation. I’m not a big fan of using regulations, to be honest. I think it’s much better to motivate people but, in this case, quite a bit of control has been lost, so I think we should have a regulation on how research data can be reused by others. This is how the internet has been made profitable during the last decade: through non-human persuasion. All these companies that are giving so much away for free are making billions of dollars when you look at the stock market. We were not clear how they were making this profit until recently, when we realised that they are doing it by changing our behaviour, and I think the rest of society – including research organisations – is behind them, so we need some regulation.

A good example is with GDPR. It has been introduced to protect our data, our digital footprint. On ResearchGate or Eurosport, or any other website, we used to be asked to agree to cookies or not. Recently, a new option called “Legitimate interest” has been slipped in and our digital data is again collected – less noticeably – by invoking questionable legitimate rights. The organisations whose model is based on persuading need cookie data, so they have moved the discussion away from remaining GDPR compliant to defending their legitimate interests. They are fighting to take data away from us. We can tackle this with regulation faster but in the long term we need to educate people to be more aware. We do have licenses such as Creative Commons but I’m not sure we have the right ones to protect us.

Louise Corti: There are a variety of licenses, but they are abused and it’s very hard to track along the way what has gone wrong. I quite like the UK Government’s approach with some of their statistical data that has to go through a legal gateway. Some data can be made available for research, but it has to be done for the public good. We also have the Ethics Self-Assessment Tool, which is a grid you go through provided by the Statistics Authority and it asks you to think along lots of different dimensions of ethics. This helps researchers get a better sense of what they are trying to do, but whether the people we are talking about would care about it is a very different matter. Having been in research ethics for a very long time, I think that is by far the best tool I’ve seen and I recommend everyone uses it. The UK Data Archive uses it to evaluate some of the projects we deal with because you often find that university ethics approvals are not good enough for the Statistics Authority, often because they don’t understand quantitative secondary analysis, so the ethics scrutiny is not good enough. Self-assessment involves much more nuanced thinking about the different dimensions of ethics and it helps researchers to be a bit more reflective about what’s good and what’s not.


Overall, the session provided a compelling blend of both the practical and conceptual elements of data reuse, each raising questions which could have easily been entire sessions in themselves. Louise’s presentation gave an excellent overview of the UK Data Service’s approach to making their datasets more reusable and promoting them to maximise their chances of being reused. Tiberius’ session raised some interesting questions surrounding data reuse and the ethics of using algorithms to persuade humans, as well as looking at some practical options for protecting research data from reuse for nefarious ends. At the end of the session, the audience were asked to participate in a poll on “What future developments are needed to increase the prevalence of data reuse?”.

Audience responses to poll held at the end of the event

The results were unsurprising to either speaker, with each touching on the idea that a change in research culture is necessary to ensure data reuse projects are seen as equal to data-generating projects. The need for cultural change is a theme that ran throughout each of the sessions in Data Week and is perhaps one of the current major challenges in scholarly communication.




Published on 25 January 2021

Written by Dominic Dixon

CC BY