Tag Archives: Open Research

Data Diversity Podcast (#4) – Dr Stefania Merlo (2/2)

April 25, 2025 | Uncategorized | data champions, open data, Open Research, policy, research data management | Lutfi Bin Othman

We return with another post featuring our Data Diversity conversation with University of Cambridge Data Champion, archaeologist Dr Stefania Merlo from the McDonald Institute for Archaeological Research, the Remote Sensing Digital Data Coordinator and project manager of the Mapping Africa’s Endangered Archaeological Sites and Monuments (MAEASaM) project and coordinator of the Metsemegologolo project. This post is short in word count but not in importance, as it offers two reflections on the challenges of data management for a researcher who works in a global context, touching on two aspects of present-day academia that may be relevant to many readers. This edition follows on from the previous post where Stefania talks about the challenges of extending UK-based Open Data policies to non-UK communities that may not share the same enthusiasm for making their cultural heritage artefacts available Open Access.

In this post, Stefania reflects on how she conducts herself as a European researcher working in the African continent, where her intentions may sometimes be misaligned with those of the local data co-creators. Stefania also shares the challenge of academic mobility, where migrating from one academic institution to another results in data that is left behind, provoking an uncomfortable thought: what would happen to your data if you were suddenly rendered uncontactable?

One would like to think that this is a rare situation, but I suspect that the situation where somebody passes away unexpectedly or even not, or somebody retires and has not made a plan for what happens to an entire career’s data set happens more often than we know. I think it is an individual’s responsibility to make plans, but I think support should be given by the institutions and people should be accompanied through this path. – Dr Stefania Merlo

Working in the African continent and being honest about the objectives of research

Working in Africa and in African countries gives somebody coming from a European background, and an Italian background like me, a particular set of challenges and opportunities, because you encounter a different set up with everything – with life, and with research. Living and working in this context in various African countries allows a researcher coming from a different background to question and challenge themselves on how they do their work. Many things that are taken for granted in other settings cannot be taken for granted in that setting, in particular the relationship with the land, with nature, and with the past. Any archaeologist that works in this setting would tell you that there are certain things that you just know from very early on that you should do. For example, although we’re dealing with the past of archaeological landscapes, you don’t just go and do your work there without acknowledging that these landscapes come in spaces and areas occupied by people today, and that those people are the custodians of the land and of the archaeology today. So there needs to be a deep engagement with communities and with people even before you put your spade in the ground. And it takes time to build relationships of trust, and relationships that then allow you to do work on your own or together, depending on what the aim of your research is.

When I do work that fulfills certain academic goals that may not be of interest to the communities that I work with, I think it is better to be honest and tell them that I’m doing this piece of work because there is an archaeological question that probably only archaeologists are interested in, and this is the part of work that I’m doing. At the same time, I think it is also important then to acknowledge that you work in a setting that includes other people, and start thinking about what work you can do with the people that are custodians of or inhabit a particular part of the world. Then you start thinking, OK, there’s a different set of activities that I can do with people that people want to do with me and let’s do that. I think that it is important to have this honesty of saying that particular things are of interest to me and to my academic community that I would like to do, and then we can negotiate together. You have to engage with the community, and I think we should be a bit more honest and a bit more specific about what the expectations from both parties are, and about the setting we’re coming from and the setting we’re going to.

There are certain academic activities that I’m expected to do that are of no interest whatsoever for the communities that I’m working with, such as the academic publications on which my career rests. Then, there are other things that the communities are interested in that will give me no weight whatsoever in my academic career but contribute to building a relationship with the local community. These give me so much fulfillment because I realise that I am doing research work that is useful not only for my academic community, but for other people, be it students, colleagues elsewhere in the world, or the building of policies around archaeological heritage.

Global researcher, global data

LO: As someone who has engaged in research all over the globe, how do you deal with data that is in various places around the world?

SM: How do I deal with my data? – poorly. I may be a digital data champion, but it has been a difficult road, and it is still a difficult road, that of even managing and curating my own data. Just to give you an example, a lot of the data I’ve collected for the past 20 years is both in analog and digital format for the same project. I have some data with me here (in Cambridge) and I still have data backed up in hard drives that I haven’t opened in a long time. The majority of my analogue data sets, maps, drawings, diaries, I have left behind in South Africa when I moved here, and I haven’t been able to bring them with me. Some of my materials are in Italy with my family. Some of my diaries I had left back in Cambridge when I left to go to Botswana in 2006 and somehow got lost. So, it has been messy and I’m not proud of it. But I’m saying it because it is a problem with a lot of researchers that have become highly mobile and have migrated from one place to another, in some cases without sufficient funding to bring all of the paperwork with them. I have been a messy data collector, since my undergraduate and PhD days, and I’ve been trying to train myself to be better, I’m still not there yet, and in part it’s just me. But I think it has also to do with this very high mobility and having to change institutions in my career so many times. And what changed is not only the location, but the requirement of what you do with data where you put it, how you avail it to yourself and to others.

And so yes, I’m not very good at it but I’m trying very hard to find a way of now putting everything together because I do feel the responsibility that comes with collecting data in different countries. Some of it is actually information that was given to me from community members or friends, or colleagues that I work with and it’s with me. It’s their work, it’s with me and if anything ever happens to me – if I were to change institutions, or if anything were to happen to me, including losing my memory – let me put it like that – what’s going to happen? I’ve never really thought of what would happen if I were to move or to shift. I left my previous institution quite abruptly and during COVID, and I was able to take some materials out, but some other materials I didn’t get access to and they are still all over the place.

And then I started thinking: I have never made a plan for this kind of situation to happen. So what am I going to do now in order to make sure that these data are usable and useful for me, but perhaps also to others when I’m not present as the curator that will be able to tell you what each data asset is? I’m not even talking about the creation of metadata. Most of my photographs, digital photographs, for example, have got metadata that have been ordered. But archaeological datasets are complex, fragmented and can be dispersed so the main challenge is how would you connect the photographs with the drawings within my diary? Of course, there are dates, but it’s going to take so much time for somebody else to put all of it together, especially because half of it is in digital format and half of it is in analog format. That is going to be a nightmare and may not even be doable. And so, I’ve become acutely aware of the fact that we never think of this situation. We rarely think about handing over data to others in a particular form that will allow others accessibility and ability to still reuse this complex interrelated data if they were to do so.

Worst case (data) scenario

I have another example. One of my collaborators and mentors in South Africa passed away quite suddenly a couple of years ago. They had never made a plan for what would happen to their materials. They published prolifically, so we know a lot of the research that was done over 50 years, but I am aware that they had so much more material, both physical material and files in computers. Their physical collection was transferred from their house to the University by another colleague but, to the best of my knowledge, to date, no one has been able to get access to the digital data, stored in a password protected computer. One would like to think that this is a rare situation, but I suspect that the situation where somebody passes away unexpectedly or even not, or somebody retires and has not made a plan for what happens to an entire career’s data set happens more often than we know. I think it is an individual’s responsibility to make plans, but I think support should be given by the institutions and people should be accompanied through this path. This applies in particular, perhaps, to academics from other generations that may not be so knowledgeable about how to deal with data management, in particular of digital data, but also of analog data.

Once upon a time, archaeologists used to just put everything into a library or an archive so at least we have the analog records. But again, putting them together and having them make sense is extremely difficult if we don’t think of a framework for doing so. Another issue that I’ve mentioned before is mobility. You know, how do we assist researchers that have got high mobility to deal with this every time they move? I don’t have an exact formula, but when I changed institutions before, neither the institution that I was leaving nor the ones that were accepting me ever asked ‘do you need any financial or other kind of help to transfer your data?’ I was asked to fill in forms for transferring my goods, I was given money for my visa, but nobody ever asked about my academic research and the related data.

We once again thank Stefania for taking the time to speak to us and giving us food for thought. Stefania raises, we believe, a very important question – are we taking for granted that we will always be at hand to ensure that the data that we produce will be understood? Researchers tend to wait until a project is completed before supplying their data with the information needed to make them understood and reusable. If there’s one thing that Stefania brings to mind, it is that data FAIR-ness needs to be implemented from the onset of a project and then at every juncture of the project’s lifecycle, as the research unfolds. That way, the research data will be reusable in a self-contained manner.

The Research Data Sustainability Workshop – November 2024

March 20, 2025 | Uncategorized | open data, Open Research, research data management, sustainability | Lutfi Bin Othman

The rapid advance of computing and data centres means there is an increasing amount of generated and stored research data worldwide, leading to an emerging awareness that this may have an impact on the environment. Wellcome have recently published their Environmental sustainability policy, which stipulates that any Wellcome funded research projects must be conducted in an environmentally sustainable way. Cancer Research UK have also updated their environmental sustainability in research policy and it is anticipated that more funders will begin to adopt similar policies in the near future.

In November we held our first Research Data Sustainability Workshop in collaboration with Cambridge University Press & Assessment (CUP&A). The aim was to address some of the areas common to researchers with a focus on how research data can impact the environment. The workshop was attended by Cambridge Data Champions and other interested researchers at the University of Cambridge. This blog summarises some of the presentations and group activities that took place at the workshop to help us to better understand the impact that processing and storing data can have on the environment and identify what steps researchers could take in their day-to-day research to help minimise their impact.

The Invisible Cost to Storing Data

Our first speaker at the workshop was Dr Loïc Lannelongue, Research Associate at the University of Cambridge. Loïc leads on the Green Algorithms initiative, which aims to promote more environmentally sustainable computational science and has developed an online calculator to check computational carbon footprint. Loïc suggested that the aim is not that we shouldn’t have data, as we all use computing, just that we should be more aware of the work we do and the impact it has on the environment so we can make informed choices. Loïc emphasised that computing is not free, even though it might look like that to the end user. There is an invisible cost to storing data: whilst the exact costs are largely unknown, the estimates calculated for data centres suggest that they emit around 126 Mt of CO₂e/year. Loïc further explained that there are many more aspects to the footprint than just greenhouse gas emissions, such as water use, toxicity, land use, minerals, metals and human toxicity. For example, there is a huge amount of water consumption needed to cool data centres, and you often find that cheaper data centres tend to use larger amounts of water.

Loïc continued to discuss how there is a wide range of carbon footprints in research, with some datasets having a large footprint. The estimate for storing data is ~10 kg CO₂ per TB per year, although there are many varying factors that could affect this figure. Loïc pointed out that the bottom line is – don’t store useless data! He suggested we shouldn’t stop doing research, we just have to do it better. Incentivising and addressing sustainability in Data Management Plans from the outset of projects could help. Artificial Intelligence (AI) is predicted to help to combat the impact on the environment in the future, although as AI comes at a large environmental cost, whether any benefit will outweigh the impact is still unknown. Loïc has written a paper on the Ten simple rules to make your computing more sustainable, and he recommends looking at the Green DiSC Certification, which is a free, open-access roadmap for anyone working in research (dry lab) to learn how to be more sustainable.
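To make the storage estimate concrete, here is a rough back-of-the-envelope sketch. It assumes the ~10 kg CO₂ per TB per year figure quoted above applies equally to every copy of the data, which is a simplification; the function name and the example project are hypothetical and are not taken from the Green Algorithms calculator.

```python
# Illustrative estimate only. The default factor is the ~10 kg CO2 per TB
# per year figure quoted in the talk; real values vary with hardware,
# energy mix and storage tier.
KG_CO2_PER_TB_YEAR = 10.0


def storage_footprint_kg(size_tb: float, years: float, copies: int = 1,
                         kg_per_tb_year: float = KG_CO2_PER_TB_YEAR) -> float:
    """Estimate kg of CO2 for keeping `copies` copies of `size_tb` TB for `years` years."""
    if size_tb < 0 or years < 0 or copies < 1:
        raise ValueError("size and years must be non-negative, copies at least 1")
    return size_tb * years * copies * kg_per_tb_year


if __name__ == "__main__":
    # Hypothetical project: 5 TB kept for 10 years as 3 copies in total.
    print(storage_footprint_kg(5, 10, copies=3))  # 1500.0 kg CO2
```

Under these assumptions, deleting one redundant copy of the hypothetical dataset would save roughly 500 kg of CO₂ over the ten years.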

The Shift to Digital Publishing

Next to present was Andri Johnston, Digital Sustainability Lead at CUP&A. Andri discussed how her role was newly created to address the carbon footprint within the digital publishing environment at CUP&A. In publishing, there has been a shift from print to digital, but after publishing digitally, what can be done to make it more sustainable? CUP&A are committed to being carbon zero by 2048, aiming for a 72% reduction by 2030. As 43% of all their digital emissions for the wider technology sector come from digital products such as software, CUP&A have been looking at how they can create their digital products more sustainably. They have been investigating methods to calculate digital emissions by looking at their hardware and cloud hosting, which is mostly Amazon Web Services (AWS), although they use some Cambridge data centres. Andri explained that it has been hard to find information on AWS data centre emissions, and that it is hard to pinpoint whether users are on a fixed line or cellular internet connection (some cellular network towers use backup diesel generators, which have a higher environmental impact). AWS doesn’t supply accurate information on the emissions of using their services and Andri is fully aware that they are using data to get data!

Andri introduced the DIMPACT project (digital impact), where they are using the DIMPACT tool to report and better understand the carbon emissions of platforms serving digital media products. Carbon emissions of the academic publishing websites at CUP&A have reduced in the last year as the team looked at where they can make improvements. At CUP&A, they want to publish more and allow more to access their content globally, but this needs to be done in a sustainable way to not increase the websites’ carbon emissions. The page weight of web pages is also something to consider; heavy web pages due to media such as videos can be difficult to download for people in areas with low bandwidth so this needs to be taken into account when designing them. The Sustainable web design guide for 2023 has been produced with Wholegrain Digital, and can be downloaded for free. Andri mentioned that in the future they need to be aware of the impact of AI as it is becoming a significant part of publishing and academia and will increase energy consumption.

Andri concluded by summarising that in academic publishing, they will always be adding more content such as videos and articles for download. It is likely that researchers may need to report on the carbon impact of research in the future, but the question on how best to do this is still to be decided. The impact of downloaded papers is also a question that the industry is struggling with, for example how many of these papers are read and stored.

Digital Preservation: Promising for the Environment and Scholarship 

Alicia Wise, who is Executive Director at CLOCKSS, gave us an overview of the infrastructure in place to preserve scholarship for the long-term. This is vital to be able to reliably refer to research from the past. Alicia explained that there is a growing awareness of the need to consider sustainability during preservation. When print publishing was the main research output, preservation was largely taken care of by librarians; in a digital world this is now undertaken by digital archives such as CLOCKSS. The plan is to prepare to archive research for future generations 200-500 years from now!

CLOCKSS was founded in the 1990s to solve the problem of digital preservation. There is now a growing collection of digitally archived books, articles, data, software and code. CLOCKSS consists of 12 mirror repository sites located across the world, all of which contain the same copies. The 12 sites are in constant communication, using network self-healing to restore the original if a problem is detected. CLOCKSS currently stores 57.5 million journal articles and 530,500 books.
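The idea of mirrored sites checking one another and repairing a damaged copy can be illustrated with a toy sketch. This is not the actual CLOCKSS protocol: it simply assumes that each site's copy is fingerprinted with SHA-256, that the majority fingerprint is treated as correct, and that outliers are overwritten from a good copy. The site names and the `heal` function are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def heal(replicas: dict[str, bytes]) -> list[str]:
    """Repair minority copies in place; return the names of the sites repaired."""
    if not replicas:
        return []
    digests = {site: digest(data) for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        # Without a strict majority we cannot tell which copy is the original.
        raise RuntimeError("no majority copy; manual intervention needed")
    good_site = next(s for s, d in digests.items() if d == winner)
    repaired = [s for s, d in digests.items() if d != winner]
    for site in repaired:
        replicas[site] = replicas[good_site]
    return repaired


if __name__ == "__main__":
    sites = {"site_a": b"article", "site_b": b"article", "site_c": b"artic1e"}
    print(heal(sites))  # ['site_c']
```

Every round of checking means reading and hashing each copy at each site, which gives a feel for why integrity checking carries an energy cost that grows with the number of mirrors.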

CLOCKSS is a dark archive, which means they don’t provide access unless it is needed, such as when a publisher goes out of business, or a repository goes down. If this happens, the lost material is made open access. CLOCKSS have been working with the DIMPACT project to map and calculate their carbon footprint. They have looked at the servers at all 12 of their repository sites to estimate the environmental impact. It became clear that not all their sites are equal. The best was their site at Stanford University, where the majority of the CLOCKSS machines are located. Stanford has a high renewable energy profile, largely due to their climate, and even have their own solar power plant! They also have a renewable, recirculating, chilled underground water system for cooling the servers. The site at Indiana University was their worst performing as this is supplied by 70% coal. The carbon emissions at the Indiana University site are estimated to be 9 tonnes of carbon per month (equivalent to a fleet of 20 petrol cars).

Alicia explained that most of the carbon emissions come from the integrity checking (self-healing network). CLOCKSS’s mission is to reduce the emissions, and they are looking into whether, with the number of repository sites reduced to 6, they could still predict that preserved material will be available in 500 years’ time. They are reviewing what they need to keep and informing publishers of their contribution so they can consider this impact.

Alicia summarised by saying that it appears that digital preservation may have a lower carbon footprint than print preservation. CLOCKSS are working with the Digital Preservation Coalition to help other digital archives reduce their footprint too (with DIMPACT), and they are finalising a general tool for calculation of emissions that can be used by other archives. They don’t want to discourage long-term preservation, as currently, 25% of academic journals are not preserved anywhere. This risks access to scholarship in the future. They want to encourage preservation, but in an environmentally friendly way.

Preserving for the future at the University of Cambridge

There are many factors that could impact data remaining accessible now and over time. Digital Preservation maintains the integrity of digital files and ensures ongoing access to content for as long as necessary. Caylin Smith, Head of Digital Preservation at Cambridge University Libraries, gave an overview of the CUL Digital Preservation Programme that is changing how the Libraries manage their digital collection materials to ensure they can be accessed for teaching, learning, and research. These include the University’s research outputs in a wide range of content types and formats; born-digital special collections, including archives; and digitised versions of print and physical collection items.

Preserving and providing access to data, as well as using cloud services and storing multiple copies of files and metadata, all impact the environment. Monitoring usage of cloud services and appraising the content are two ways of working towards more responsible Digital Preservation. Within the Programme, the team is delivering a Workbench, which is a web user interface for curatorial staff to undertake collection management activities, including appraising files and metadata deposited to archives. This work will help confirm whether any deposited files, whether these are removed from a storage carrier or securely uploaded, must be preserved long term. Curatorial staff will also be alerted to any potential duplicate files, and will be able to export metadata for creating archival records and create an audit trail of appraisal activities before the files are moved to the preservation workflow and storage.
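For researchers appraising their own files, one common way to flag potential duplicates is to group files by a content hash. The sketch below assumes SHA-256 fingerprints and a local folder; it is a generic illustration and not a description of how the CUL Workbench works. The function name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` that have identical content."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, but large
            # files would be better hashed in chunks.
            fingerprint = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[fingerprint].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates(Path(".")):
        print("Possible duplicates:", ", ".join(str(p) for p in group))
```

Matching fingerprints only show that the bytes are identical; deciding which copy to keep remains an appraisal decision.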

Within the University Library, where the Digital Preservation team is based, there may be additional carbon emissions from computers kept on overnight to run workflows and e-waste (some of the devices that become obsolete may still have a use for reading data from older carriers e.g. floppy disk drives). Caylin explained that CUL pays for the cloud services and storage used by the Digital Preservation infrastructure, which means you can scale up and scale down as needed. They are considering whether there is a need for an offline backup and weighing up if the benefit to having such a backup would outweigh costs and energy consumption.  

Caylin discussed what they and other researchers could do to reduce the impact on the environment: use tools available to estimate personal carbon footprint and associated costs of research; minimise access to data where necessary to minimise use of computing. Ideally data centres and cloud computing suppliers should have green credentials so researchers can make informed choices. There is also the choice to use second-hand equipment and to repair equipment where possible. At Cambridge we have the Research Cold Store, which is low energy as it uses tapes and robots to store dark data, but the question remains as to whether this is really more energy efficient in the long term.

What could help reduce the impact of research data on the environment?

The afternoon session at the workshop involved group work to discuss two extreme hypothetical mandated scenarios for research data preservation. It allowed the pros and cons of each scenario to be addressed, how this could impact sustainability and problems that could arise. We will use the information gathered in this group session to consider what is possible right now to help researchers at the University of Cambridge make informed choices for research data sustainability. Some of the suggestions that could reduce research data storage (and carbon footprint) include improving documentation and metadata of files, regularly appraising files as part of weekly tasks and making data open to prevent duplication of research. It could also be helpful to address environmental sustainability at the start of projects such as in a Data Management Plan.

We have learned in this workshop that research data can have an environmental impact and, as computing capabilities expand, this impact is likely to increase in the future. There are now tools available to help estimate research carbon footprints. We also need stakeholders (e.g. publishers, funders) to work together to advocate that relevant companies provide transparent information so researchers can make informed choices on managing their research data more sustainably.

Data Diversity Podcast (#4) – Dr Stefania Merlo (1/2)

March 19, 2025 | Uncategorized | data champions, funders, open data, Open Research, policy, research data management | Lutfi Bin Othman

Welcome back to the fourth instalment of Data Diversity, the podcast where we speak to Cambridge University Data Champions about their relationship with research data and highlight their unique data experiences and idiosyncrasies in their journeys as researchers. In this edition, we speak to Data Champion Dr Stefania Merlo from the McDonald Institute for Archaeological Research, the Remote Sensing Digital Data Coordinator and project manager of the Mapping Africa’s Endangered Archaeological Sites and Monuments (MAEASaM) project and coordinator of the Metsemegologolo project. This is the first of a two-part series and in this first post, Stefania shares with us her experiences of working with research data and outputs that are part of heritage collections, and how her thoughts about research data and the role of the academic researcher have changed throughout her projects. She also shares her thoughts about what funders can do to ensure that research participants, and the data that they provide to researchers, can speak for themselves.

I’ve been thinking for a while about the etymology of the word data. Datum in Latin means ‘given’. Whereas when we are collecting data, we always say we’re “taking measurements”. Upon reflection, it has made me come to a realisation that we should approach data more as something that is given to us and we hold responsibility for, and something that is not ours, both in terms of ownership, but also because data can speak for itself and tell a story without our intervention – Dr Stefania Merlo

Data stories (whose story is it, anyway?)

LO: How do you use data to tell the story that you want to tell? To put it another way, as an archaeologist, what is the story you want to tell and how do you use data to tell that story?

SM: I am currently working on two quite different projects. One is Mapping Africa’s Endangered Archaeological Sites and Monuments (funded by Arcadia), which aims to create an Open Access database of information on endangered archaeological sites and monuments in Africa. In the project, we define “endangered” very broadly because ultimately, all sites are endangered. We’re doing this with a number of collaborators and the objective is to create a database that is mainly going to be used by national authorities for heritage management. There’s a little bit less storytelling there, but it has more to do with intellectual property: who are the custodians of the sites and the custodians of the data? A lot of questions are asked about Open Access, which is something that the funders of the projects have requested, but something that our stakeholders have got a lot of issues with. The issues surround where the digital data will be stored because currently, it is stored in Cambridge temporarily. Ideally all our stakeholders would like to see it stored in a server in the African continent at the least, if not actually in their own country. There are a lot of questions around this.

The other project stems out of the work I’ve been doing in Southern Africa for almost the past 20 years, and is about asking how do you articulate knowledge of the African past that is not represented in history textbooks? This is a history that is rarely taught at university and is rarely discussed. How do you avail knowledge to publics that are not academic publics? That’s where the idea came from of creating a multimedia archive and a platform where digital representations of archaeological, archival, historical, and ethnographic data could be used to put together stories that are not the mainstream stories. It is a work in progress. The datasets that we deal with are very diverse because it is required to tell a history in a place and in periods for which we don’t have written sources.

It’s so mesmerizing and so different from what we do in contexts where history is written. It gives us the opportunity to put together so many diverse types of sources. From oral histories to missionary accounts with all the issues around colonial reports and representations of others as they were perceived at the time, putting together information on the past environment combining archaeological data. We have a collective of colleagues that work in universities and museums. Each performs different bits and pieces of research, and we are trying to see how we would put together these types of data sets. How much do we curate them to avail them to other audiences? We’ve used the concept of data curation very heavily, and we use it purposefully because there is an impression of the objectivity of data, and we know, especially as social scientists, that this just doesn’t exist.

I’ve been thinking for a while about the etymology of the word data. Datum in Latin means ‘given’. Whereas when we are collecting data, we always say we’re taking measurements. Upon reflection, it has made me come to a realisation that we should approach data more as something that is given to us and we hold responsibility for, and something that is not ours, both in terms of ownership, but also because data can speak for itself and tell a story without our intervention. That’s the kind of thinking surrounding data that we’ve been going through with the project. If data are given, our work is an act of restitution, and we should also acknowledge that we are curating it. We are picking and choosing what we’re putting together and in which format and framework. We are intervening a lot in the way these different records are represented so that they can be used by others to tell stories that are perhaps of more relevance to us.

So there’s a lot of work in this project that we’re doing about representation. We are explaining – not justifying but explaining – the choices that we have made in putting together information that we think could be useful to re-create histories and tell stories. The project will benefit us because we are telling our own stories using digital storytelling, and in particular story mapping, but it could become useful for others as resources that can be used to tell their own stories. It’s still a work in progress because we also work in low-resourced environments. The way in which people can access digital repositories and then use online resources is very different in Botswana and in South Africa, which are the two countries where I mainly work in this project. We also dedicate time to thinking about how useful the digital platform will be for the audiences that we would like to engage.

The intended output is an archive that can be used in a digital storytelling platform. We have tried to narrow down our target audience to secondary school and early university students of history (and archaeology). We hope that the platform will eventually be used more widely, but we realised that we had to identify an audience to be able to prepare the materials. We have also realised that we need to give guidance on how to use such a platform so in the past year, we have worked with museums and learnt from museum education departments about using the museum as a space for teaching and learning, where some of these materials could become useful. Teachers and museum practitioners don’t have a lot of time to create their own teaching and learning materials, so we’re trying to create a way of engaging with practitioners and teachers in a way that doesn’t overburden them. For these reasons, there is more intervention that needs to come from our side into pre-packaging some of these curations, but we’re trying to do it in collaboration with them so that it’s not something that is solely produced by us academics. We want this to be something that is negotiated. As archaeologists and historians, we have an expertise on a particular part of African history that the communities that live in that space may not know about and cannot know because they were never told. They may have learned about the history of these spaces from their families and their communities, but they have learned only certain parts of the history of that land, whereas we can go much deeper into the past. So, the question becomes, how do you fill the gaps of knowledge, without imposing your own worldview? It needs to be negotiated but it’s a very difficult process to establish. There is a lot of trial and error, and we still don’t have an answer.

Negotiating communities and funders

LO: Have you ever had to navigate funders’ policies and stakeholder demands?

SM: These kinds of projects need to be long and they need continuous funding, but they have outputs that are not always necessarily valued by funding bodies. This brings to the fore what funding bodies are interested in – is it solely data production, as it is called, and then the writing up of certain academic content? Or can we start to acknowledge that there are other ways of creating and sharing knowledge? As we know, there has been a drive, especially with UK funding bodies, to acknowledge that there are different ways in which information and knowledge are produced and shared. There are alternative ways of knowledge production from artistic ones to creative ones and everything in between, but it’s still so difficult to account for the types of knowledge production that these projects may have. When I’m reporting on projects, I still find it cumbersome and difficult to represent these types of knowledge production. There’s so much more that you need to do to justify the output of alternative knowledge compared to traditional outputs. I think there needs to be change so that justifying alternative forms of knowledge becomes easier for the researchers who produce them, rather than more difficult than justifying mainstream outputs.

One thing I would say is there’s a lot that we’ve learned with the (Mapping Africa’s Endangered Archaeological Sites and Monuments) project because there we engage directly with the custodians of the site and of the analog data. When they realise that the funders of the project expect to have this data openly accessible, then the questions come and the pushback comes, and it’s a pushback on a variety of different levels. The consequence is that basically we still haven’t been able to finalise our agreements with the custodians of the data. They trust us, so they have informed us that in the interim we can have the data as a project, but we haven’t been able to come to an agreement on what is going to happen to the data at the end of the project. In fact, the agreement at the moment is that the data are not going to go into a completely Open Access sphere. The negotiation now is about what they would be willing to make public, and what advantages they would have as custodians of the data in making part, or all, of these data public.

This has created a disjuncture with what the funders thought they were doing. I’m sure they thought they were doing good by mandating that the data needs to be Open Access, but perhaps they didn’t consider that in other parts of the world, Open Access may not be desirable, or wanted, or acceptable, for a variety of very valid reasons. It’s a knot that we still haven’t untied and it makes me wonder: when funders are asking for Open Access, have they really thought about work outside of UK contexts with communities outside of the UK context? Have they considered these communities’ rights to data and their right to say, “we don’t want our data to be shared”? There’s a lot of work that has happened in North America in particular, because indigenous communities are the ones that put forward the concept of C.A.R.E., but in the UK we are still very much discussing F.A.I.R. and not C.A.R.E. I think the funders may have started thinking about it, but we’re not quite there. There is still this impression that Open Data and Open Access are a universal good without having considered that this may not be the case. It puts researchers that don’t work in the UK or the Global North in an awkward position. This is definitely something that we are still grappling with very heavily. My hope is that this work is going to help highlight that when it comes to Open Access, there are no universals. We should revisit these policies in light of the fact that we are interacting with communities globally, not only those in some countries of the world. Who is Open Access for? Who does it benefit? Who wants it and who doesn’t want it, and for what reasons? These are questions that we need to keep asking ourselves.

LO: Have you been in a position where you had to push back on funders or Open Access requirements before?

SM: Not necessarily a pushback, but our funders have funded a number of similar projects in South Asia, in Mongolia, in Nepal and the MENA region and we have come together as a collective to discuss issues around the ethics and the sustainability of the projects. We have engaged with representatives of our funders trying to explain that what they wanted initially, which is full Open Access, may not be practicable. In fact, there has already been a change in the terminology that is used by the funders. From Open Access, they changed the concept to Public Access, and they have come back to us to say that they can change their contractual terms to be more nuanced and acknowledge the fact that we are in negotiation with national stakeholders and other stakeholders about what should happen to the data. Some of this has been articulated in various meetings, but some of it was trial and error on our side. In other words, with our new proposal for renewal of funding, which was approved, we just included these nuances in the proposal and in our commitment and they were accepted. So in the course of the past four years, through lobbying by the funded projects, we have been able to bring nuance to the way in which the funders themselves think about Open Access.

Stay tuned for part two of this conversation where Stefania will share some of the challenges of managing research data that are located in different countries!
