Data Diversity Podcast (#4) – Dr Stefania Merlo (1/2) 

Welcome back to the fourth instalment of Data Diversity, the podcast where we speak to Cambridge University Data Champions about their relationship with research data and highlight their unique data experiences and idiosyncrasies in their journeys as researchers. In this edition, we speak to Data Champion Dr Stefania Merlo from the McDonald Institute for Archaeological Research, the Remote Sensing Digital Data Coordinator and project manager of the Mapping Africa’s Endangered Archaeological Sites and Monuments (MAEASaM) project and coordinator of the Metsemegologolo project. This is the first of a two-part series and in this first post, Stefania shares with us her experiences of working with research data and outputs that are part of heritage collections, and how her thoughts about research data and the role of the academic researcher have changed throughout her projects. She also shares her thoughts about what funders can do to ensure that research participants, and the data that they provide to researchers, can speak for themselves.


I’ve been thinking for a while about the etymology of the word data. Datum in Latin means ‘given’. Whereas when we are collecting data, we always say we’re “taking measurements”. Upon reflection, it has made me come to a realisation that we should approach data more as something that is given to us and we hold responsibility for, and something that is not ours, both in terms of ownership, but also because data can speak for itself and tell a story without our intervention – Dr Stefania Merlo


Data stories (whose story is it, anyway?) 

LO: How do you use data to tell the story that you want to tell? To put it another way, as an archaeologist, what is the story you want to tell and how do you use data to tell that story?

SM: I am currently working on two quite different projects. One is Mapping Africa’s Endangered Archaeological Sites and Monuments (funded by Arcadia), which aims to create an Open Access database of information on endangered archaeological sites and monuments in Africa. In the project, we define “endangered” very broadly because ultimately, all sites are endangered. We’re doing this with a number of collaborators and the objective is to create a database that is mainly going to be used by national authorities for heritage management. There’s a little bit less storytelling there, but it has more to do with intellectual property: who are the custodians of the sites and the custodians of the data? A lot of questions are asked about Open Access, which is something that the funders of the projects have requested, but something that our stakeholders have got a lot of issues with. The issues surround where the digital data will be stored because currently, it is stored in Cambridge temporarily. Ideally all our stakeholders would like to see it stored on a server on the African continent at the least, if not actually in their own country. There are a lot of questions around this.

The other project stems out of the work I’ve been doing in Southern Africa for almost the past 20 years, and is about asking how do you articulate knowledge of the African past that is not represented in history textbooks? This is a history that is rarely taught at university and is rarely discussed. How do you avail knowledge to publics that are not academic publics? That’s where the idea came from of creating a multimedia archive and a platform where digital representations of archaeological, archival, historical, and ethnographic data could be used to put together stories that are not the mainstream stories. It is a work in progress. The datasets that we deal with are very diverse because that is what is required to tell a history in a place and in periods for which we don’t have written sources.

It’s so mesmerizing and so different from what we do in contexts where history is written. It gives us the opportunity to put together so many diverse types of sources. From oral histories to missionary accounts with all the issues around colonial reports and representations of others as they were perceived at the time, putting together information on the past environment combining archaeological data. We have a collective of colleagues that work in universities and museums. Each performs different bits and pieces of research, and we are trying to see how we would put together these types of data sets. How much do we curate them to avail them to other audiences? We’ve used the concept of data curation very heavily, and we use it purposefully because there is an impression of the objectivity of data, and we know, especially as social scientists, that this just doesn’t exist. 

I’ve been thinking for a while about the etymology of the word data. Datum in Latin means ‘given’. Whereas when we are collecting data, we always say we’re taking measurements. Upon reflection, it has made me come to a realisation that we should approach data more as something that is given to us and we hold responsibility for, and something that is not ours, both in terms of ownership, but also because data can speak for itself and tell a story without our intervention. That’s the kind of thinking surrounding data that we’ve been going through with the project. If data are given, our work is an act of restitution, and we should also acknowledge that we are curating it. We are picking and choosing what we’re putting together and in which format and framework. We are intervening a lot in the way these different records are represented so that they can be used by others to tell stories that are perhaps of more relevance to us. 

So there’s a lot of work in this project that we’re doing about representation. We are explaining – not justifying but explaining – the choices that we have made in putting together information that we think could be useful to re-create histories and tell stories. The project will benefit us because we are telling our own stories using digital storytelling, and in particular story mapping, but it could become useful for others as resources that can be used to tell their own stories. It’s still a work in progress because we also work in low resourced environments. The way in which people can access digital repositories and then use online resources is very different in Botswana and in South Africa, which are the two countries where I mainly work in this project. We also dedicate time to thinking about how useful the digital platform will be for the audiences that we would like to get engagement from.

The intended output is an archive that can be used in a digital storytelling platform. We have tried to narrow down our target audience to secondary school and early university students of history (and archaeology). We hope that the platform will eventually be used more widely, but we realised that we had to identify an audience to be able to prepare the materials. We have also realised that we need to give guidance on how to use such a platform so in the past year, we have worked with museums and learnt from museum education departments about using the museum as a space for teaching and learning, where some of these materials could become useful. Teachers and museum practitioners don’t have a lot of time to create their own teaching and learning materials, so we’re trying to create a way of engaging with practitioners and teachers in a way that doesn’t overburden them. For these reasons, there is more intervention that needs to come from our side into pre-packaging some of these curations, but we’re trying to do it in collaboration with them so that it’s not something that is solely produced by us academics. We want this to be something that is negotiated. As archaeologists and historians, we have an expertise on a particular part of African history that the communities that live in that space may not know about and cannot know because they were never told. They may have learned about the history of these spaces from their families and their communities, but they have learned only certain parts of the history of that land, whereas we can go much deeper into the past. So, the question becomes, how do you fill the gaps of knowledge, without imposing your own worldview? It needs to be negotiated but it’s a very difficult process to establish. There is a lot of trial and error, and we still don’t have an answer. 

Negotiating communities and funders 

LO: Have you ever had to navigate funders’ policies and stakeholder demands?  

SM: These kinds of projects need to be long and they need continuous funding, but they have outputs that are not always necessarily valued by funding bodies. This brings to the fore what funding bodies are interested in – is it solely data production, as it is called, and then the writing up of certain academic content? Or can we start to acknowledge that there are other ways of creating and sharing knowledge? As we know, there has been a drive, especially with UK funding bodies, to acknowledge that there are different ways in which information and knowledge is produced and shared. There are alternative ways of knowledge production from artistic ones to creative ones and everything in between, but it’s still so difficult to account for the types of knowledge production that these projects may have. When I’m reporting on projects, I still find it cumbersome and difficult to represent these types of knowledge production. There’s so much more that you need to do to justify the output of alternative knowledge compared to traditional outputs. I think there needs to be change so that justifying alternative forms of knowledge becomes easier for the researchers who produce them, rather than more difficult than justifying mainstream outputs.

One thing I would say is there’s a lot that we’ve learned with the (Mapping Africa’s Endangered Archaeological Sites and Monuments) project because there we engage directly with the custodians of the site and of the analog data. When they realise that the funders of the project expect to have this data openly accessible, then the questions come and the pushback comes, and it’s a pushback on a variety of different levels. The consequence is that basically we still haven’t been able to finalise our agreements with the custodians of the data. They trust us, so they have informed us that in the interim we can have the data as a project, but we haven’t been able to come to an agreement on what is going to happen to the data at the end of the project. In fact, the agreement at the moment is the data are not going to be going on a completely Open Access sphere. The negotiation now is about what they would be willing to make public, and what advantages they would have as a custodian of the data to make part, or all, of these data public.

This has created a disjuncture with what the funders thought they were doing. I’m sure they thought they were doing good by mandating that the data needs to be Open Access, but perhaps they didn’t consider that in other parts of the world, Open Access may not be desirable, or wanted, or acceptable, for a variety of very valid reasons. It’s a knot that we still haven’t resolved and it makes me wonder: when funders are asking for Open Access, have they really thought about work outside of UK contexts with communities outside of the UK context? Have they considered these communities’ rights to data and their right to say, “we don’t want our data to be shared”? There’s a lot of work that has happened in North America in particular, because indigenous communities are the ones that put forward the concept of C.A.R.E., but in the UK we are still very much discussing F.A.I.R. and not C.A.R.E. I think the funders may have started thinking about it, but we’re not quite there. There is still this impression that Open Data and Open Access is a universal good without having considered that this may not be the case. It puts researchers that don’t work in the UK or the Global North in an awkward position. This is definitely something that we are still grappling with very heavily. My hope is that this work is going to help highlight that when it comes to Open Access, there are no universals. We should revisit these policies in light of the fact that we are interacting with communities globally, not only those in some countries of the world. Who is Open Access for? Who does it benefit? Who wants it and who doesn’t want it, and for what reasons? These are questions that we need to keep asking ourselves.

LO: Have you been in a position where you had to push back on funders or Open Access requirements before? 

SM: Not necessarily a pushback, but our funders have funded a number of similar projects in South Asia, in Mongolia, in Nepal and the MENA region and we have come together as a collective to discuss issues around the ethics and the sustainability of the projects. We have engaged with representatives of our funders trying to explain that what they wanted initially, which is full Open Access, may not be practicable. In fact, there has already been a change in the terminology that is used by the funders. From Open Access, they changed the concept to Public Access, and they have come back to us to say that they can change their contractual terms to be more nuanced and acknowledge the fact that we are in negotiation with national stakeholders and other stakeholders about what should happen to the data. Some of this has been articulated in various meetings, but some of it was trial and error on our side. In other words, with our new proposal for renewal of funding, which was approved, we just included these nuances in the proposal and in our commitment and they were accepted. So in the course of the past four years, through lobbying of the funded projects, we have been able to bring nuance to the way in which the funders themselves think about Open Access.


Stay tuned for part two of this conversation where Stefania will share some of the challenges of managing research data that are located in different countries!


Data Diversity Podcast #2 – Dr Alfredo Cortell-Nicolau

In our second instalment of the Data Diversity Podcast, we are joined by archaeologist Dr Alfredo Cortell-Nicolau, a Senior Teaching Associate in Quantitative and Computational Methods in Archaeology and Biological Anthropology at the McDonald Institute for Archaeological Research and Data Champion.

As is the theme of the podcast, we spoke to Alfredo about his relationship with data and learned from his experiences as a researcher. The conversation also touched on the different interpersonal, and even diplomatic, skills that an archaeologist must possess to carry out their research, and how one’s relationship with individuals such as landowners and government agents might impact their access to data. Alfredo also sheds light on some of the considerations that archaeologists must go through when storing physical data and discussed some ways that artificial intelligence is impacting the field. Below are some excerpts from the conversation, which can be listened to in full here.

I see data in a twofold way. This implies that there are different ways to liaise with the data. When you’re talking about the actual arrowhead or the actual pot, then you would need to liaise with all the different regional and national laws regarding heritage and how they want you to treat the data because it’s going to be different for every country and even for every region. Then, of course, when you’re using all this morphometric information, all the CSV files, the way to liaise with the data becomes different. You have to think of data in this twofold way.

Dr Alfredo Cortell-Nicolau

Lutfi Othman (LO): What is data to you?

Alfredo Cortell-Nicolau (ACN): In archaeology in general, there are two ways to see the data. In my case for example, one way to see it is that the data is the arrowhead and that’s the primary data. But then when I conduct my studies, I extract lots of morphometric measures and I produce a second level of data, which are CSV files with all of these measurements and different information about the arrowheads. So, what is the data? Is it the arrowhead or is it the file with information about the arrowhead? This raises some issues in terms of who owns the data and how you are going to treat the data because it’s not the same. In my case, I always share my data and make everything reproducible. But when I share my data, I’m sharing the data that I collected from the arrowheads. I’m not sharing the arrowheads because they are not mine to share.

This is kind of a second layer of thought when you’re working with Archaeology. When you’re studying, for example, pottery residues, then you’re sharing the information of the residues and not the pot that you used to obtain those residues. There are two levels of data. Which is the actual data itself? The data which can be reanalyzed in different ways by different people, or the data that you extracted only for your specific analysis? I see data in this twofold way. This implies that there are different ways to liaise with the data. When you’re talking about the actual arrowhead or the actual pot, then you would need to liaise with all the different regional and national laws regarding heritage and how they want you to treat the data because it’s going to be different for every country and even for every region. Then, of course, when you’re using all this morphometric information, all the CSV files, the way to liaise with the data becomes different. You have to think of data in this twofold way.

On some of the barriers to sharing of archaeological data

ACN: There are some issues in how you would acknowledge that the field archaeologist is the one who got the data. Say that you might have excavated a site in the 1970s and some other researcher comes later, and they may be doing many publications after that excavation, but you are not always giving the proper attribution to the field archaeologist because you cited the first excavation in the first publication, and you’re done. Sometimes, that makes field archaeologists reluctant to share the data because they don’t feel that their work is acknowledged enough. This is one issue which we need to try to solve. Take for example a huge radiocarbon database of 5000 dates: if I use that database, I will cite whoever produced that database, but I will not be citing everyone who actually contributed indirectly to that database. How do I include all of these citations? Maybe we can discuss something like meta-citations, but there must be some way in which everyone feels they are getting something out of sharing the data. Otherwise, there might be a reaction where they think “well, I just won’t share. There’s nothing in it for me, so why should I share my data?”, which would be understandable.

On dealing with local communities, archaeological site owners and government officials

ACN: When we have had to deal with private owners, local politicians and different heritage caretakers, not everyone feels the same way about everything, and you do need a lot of diplomatic skills to navigate through this because to excavate the site you need all kinds of permits. You need the permit of the owner of the site, the municipality, the regional authorities, the museum where you’re going to store the material. You need all of these to work and you need the money, of course. Different levels of discussion with indigenous communities is another layer of complexity which you have to deal with. In some cases, like in the site where we’re excavating now, the owner is the sweetest person in the world, and we are so lucky to have him. I called him two days ago because we were going to go to the site, and I was just joking with him, saying I’ll try not to break anything in your cave, and he was like, “this is not my cave. This is heritage for everyone. This is not mine. This is for everyone to know and to share”. It is so nice to find people like that. That may happen also with some kinds of indigenous communities. The levels of politics and negotiation are probably different in every case.

On how archaeologists are perceived

LO: When you approach a field or people, how do they view the archaeologists and the work?

ACN: It really depends on the owner. The one that we’re working with now, he’s super happy because he didn’t know that he had archaeology in his cave. When we told him, he was happy because he’s able to bring something to the community and he wants his local community to be aware that there is something valuable in terms of heritage. This is one good example. But we have also had other examples, for instance, where the owner of the cave was a lawyer and the first thing he thought was “are there going to be legal problems for me? If something happens in the cave, who has the legal responsibility?” In another case there was another person that just didn’t care, she said “you want to come? Fine. The field is there, just do whatever you want.” So, there are different sensibilities to this. Some people are really happy about the heritage and don’t see it as a nuisance that they have to deal with.

LO: How about yourself as a researcher and archaeologist: do you see yourself as a custodian of sorts, or someone who’s trying to contribute to the local heritage of the place? Or is it almost scientific and you’re there to dig?

ACN: When I approach the different owners, I think the most important thing is to let them know that they have something valuable to the local community and they can be a part of that. They can be a part of being valuable to the local community. Also, you must make it clear that it’s not going to be a nuisance for them and they don’t have to do anything. I think the most important part is letting them know how it can be valuable for the community. I usually like them to be involved, and they can come and see the cave and see what we are doing. In the end it’s their land and if they see that we are producing something that is valuable to the community then it is good for them. In this case, the type of data that we produce is the primary type of data, that is, the actual different pottery sherds, the different arrowheads, etcetera. In this current excavation, we got an arrowhead that is probably some 4,000 or 5,000 years old and you get (the land owners) to touch this arrowhead that no one in 5,000 years has seen. If you can get the owners to think of it in this way, that they’re doing something valuable for their community, then they will be happier to participate in this whole thing and to just let us do whatever we want to do, which is science.

LO: How do you store physical data? Or do you let the landowner store it?

ACN: That depends on the national and regional laws and different countries have different laws about this. The cave where I’m working right now is in Spain, so I’m going to talk about the Spanish law, which is the one that I follow, and it’s going to be different depending on every country. In our case, with the different assemblages that you find, you have a period of up to 10 years where you can store them yourself in your university and that period is for you to do your research with them. After that period, they go to whichever museum they are supposed to go to, which depends on the law that says that it has to be the museum that is the closest to the cave or site where they were excavated. Here, the objects can then be displayed and the museum is the one responsible for managing them, and storing them long term.

There is one additional thing: If you are excavating a site that has already been excavated, then there is a principle of keeping the objects and assemblages together. For example, there is this cave that was excavated in the 1950s and they store all the assemblages in the Museum of Prehistory of Valencia, which was the only museum in the whole region. Now, they excavated it again a few years ago and now there are museums that are closer to the cave but because the bulk of the assemblages are in Valencia and they don’t want to have it separated in two museums, they still have to go to Valencia. This is the principle of not having the assemblages separated and it is the most important one.


As always, we learn so much by engaging with our researchers about their relationship with data, and we thank Alfredo for joining us for this conversation. Please let us know how you think the podcast is going and if there are any questions relating to research data that you would like us to ask!

The September 2023 Data Champion Forum

The Cambridge Data Champions had a fantastic September Forum at the West Hub. The forum started with an introduction to the West Hub by Library Manager Daniele Campello and we welcomed Clair Castle as the new interim Research Data Manager with the Office of Scholarly Communication (University Library).

Dr Mandy Wigdorowitz kicked off the presentations by sharing with the Data Champions what she aims to achieve as the University’s Open Research Community Manager. This includes raising the profile of Open Research at the University and ensuring that scholarly and research outputs that are deemed to be open are indeed accessible and interoperable in accordance with FAIR principles. As Open Research Community Manager, Mandy advocates for Open Research among University researchers from both the STEMM and AHSS (Arts, Humanities and Social Sciences) disciplines. The latter proves to be more challenging as researchers in AHSS may often have valid reasons for refraining from making their research data open, such as working with sensitive data or working with interlocutors who object to their data being shared. Such issues will be addressed at the Cambridge Open Research Conference that she is organising, which takes place on 17th November 2023 at Downing College, Cambridge as well as online. To end, Mandy invited the Data Champions to join her Open Research initiative, a community of advocates for Open Research across the University.

Before lunch, Madeleine Taylor (Information Security Risk and Governance Manager with University Information Services, UIS) presented a follow up to a webinar session on monitoring the Information and Cybersecurity (ICS) risks for research data across the university, which she conducted with the Data Champions a couple of weeks prior. After a brief introduction of what she has done so far to protect Cambridge’s research communities against ICS threats, she asked the Data Champions for help in her task of securing research data against ICS risks. They can do so by providing her with a sense of what data their own research communities are working with and how they are storing them. As the Data Champions ate the delicious lunch of sandwiches and cakes provided by the West Hub caterers, they provided feedback to Madeleine on two forms that she proposed as methods of gathering the information she needed: a 3-minute research data impact assessment form and a research data cyber security risk form. Maddy will continue to work with the Research Data Team and the Data Champions to refine, and gather information, through these forms.

Thank you to the West Hub and Daniele Campello for hosting the Data Champions Forum in your welcoming building!

If you are a member of the University of Cambridge and are interested in attending the Data Champions Forum, please join us as a Data Champion. If you are passionate about research data management and data sharing or you would like to find out more about what being a Data Champion entails, please visit the Data Champions webpage. We welcome applications from those working in all academic subjects across AHSS and STEMM disciplines. If you are unsure about how being a Data Champion would impact your research, please get in touch with the Research Data Team!

Cartoon by Clare Trowell CC-BY-NC-ND