Category Archives: Open Research at Cambridge Conference

Mapping the world through data – The November 2023 Data Champion Forum 

The November Data Champion forum was a geography and geospatial data themed edition of the bi-monthly gathering, this time hosted by the Physiology department. As usual, the Data Champions in attendance were treated to two presentations. Up first was Martin Lucas-Smith from the Department of Geography, who introduced the audience to the OpenStreetMap (OSM) project, a global community mapping project using crowdsourcing. Just as Wikipedia does for textual information, OSM produces a worldwide map created by everyday people who map the world themselves. The resulting maps can vary in their focus: the transport map, for example, shows public transport routes such as railways, buses and trams worldwide, while the humanitarian map comes from an initiative dedicated to humanitarian action through open mapping. Martin is personally involved in a project called CycleStreets which, as the name implies, uses open mapping of bicycle infrastructure. The Department of Geography uses OSM as a background for its Cambridge Air Photos websites. Projects like these, Martin highlighted, demonstrate how community gets generated around open data.

CycleStreets: Martin at the November 2023 Data Champion Forum

In his presentation, Martin explained the mechanics of OSM such as its data structure, how the maps are edited, and how data can be used in systems like routing engines. Editing the maps, and deciding how a path should be represented visually on the map, is the point where the OSM community comes into action. While the data in OSM consists primarily of geometric points (called ‘Nodes’) and lines (called ‘Ways’) coupled with tags which denote metadata values, the norms about how to define this information can only come about by consensus from the OSM community. This is perhaps different to more formal database structures that might be employed within corporate efforts such as Google. Because of its widespread crowdsourced nature, OSM tends to be more detailed than other maps for less well-served communities such as people cycling or walking, and its metadata is richer, as it is created by people who are intimately familiar with the areas that they are mapping. A map by users for users.
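To make the Nodes, Ways and tags idea concrete, here is a minimal Python sketch. It is an illustration rather than OSM's actual storage format or API: the class names, the ids and the coordinates are invented for this example, and only the tag `highway=cycleway` is a convention taken from the OSM community. It shows how a line can be stored as an ordered list of point ids, how tags carry the metadata, and how a tool such as a routing engine might pick out ways by tag and measure them.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class Node:
    """A single geometric point, with optional key/value tags."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """A line: an ordered list of node ids plus tags that describe it."""
    id: int
    node_ids: list[int]
    tags: dict[str, str] = field(default_factory=dict)


def haversine_m(a: Node, b: Node) -> float:
    """Great-circle distance between two nodes, in metres."""
    r = 6_371_000  # mean Earth radius in metres
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * r * asin(sqrt(h))


def way_length_m(way: Way, nodes: dict[int, Node]) -> float:
    """Sum the distances between consecutive nodes along a way."""
    points = [nodes[i] for i in way.node_ids]
    return sum(haversine_m(p, q) for p, q in zip(points, points[1:]))


def ways_with_tag(ways: list[Way], key: str, value: str) -> list[Way]:
    """Select the ways whose tags contain the given key/value pair."""
    return [w for w in ways if w.tags.get(key) == value]


# Hypothetical data: ids and coordinates are made up for illustration.
NODES = {
    1: Node(1, 52.2000, 0.1300),
    2: Node(2, 52.2010, 0.1310),
    3: Node(3, 52.2020, 0.1325),
}
WAYS = [
    Way(10, [1, 2, 3], {"highway": "cycleway"}),
    Way(11, [1, 3], {"highway": "residential", "name": "Example Road"}),
]

for way in ways_with_tag(WAYS, "highway", "cycleway"):
    print(way.id, round(way_length_m(way, NODES)), "m")
```

The geometry here is deliberately plain. What a way means (a cycle path, a residential street) lives entirely in its tags, which is why community agreement on tagging norms matters so much.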

Next up was Dr Rachel Sippy, a Research Associate with the Department of Genetics, who presented how geospatial data factors into epidemiological research. In her work, the questions of ‘who’, ‘when’, and ‘where’ a disease outbreak occurred are important, and it is the ‘where’ that gives her research a geographical focus. Maps, however, are often not detailed enough to provide information about an outbreak of disease among a population or community: maps can only mark out the incident site, the place, whereas the spatial context of that place, which she denotes as space, is equally important in understanding disease outbreaks.

Of ‘Space’ and ‘Place’: Rachel at the November 2023 Data Champion forum

It can be difficult, however, to understand what a researcher is measuring and what types of data can be used to measure space and/or place. Spatial data, as Rachel pointed out, can be difficult to work with, and the researcher has to decide whether spatial data is a burden or fundamental to the understanding of a disease outbreak in a particular setting. Rachel discussed several aspects of spatial data which she has considered in her research, such as visualisation techniques, data sources and methods of analysis. They all come with their own sets of challenges, and researchers have to navigate them to decide how best to tell the fundamental story that answers the research question. This essentially comes down to an act of curation of spatial data. As Rachel pointed out, quoting Mark Monmonier, “not only is it easy to lie with maps, it’s essential”. In doing so, researchers working with spatial data have to navigate the political and cultural hierarchies that are explicitly and implicitly inherent to places, and any ethical considerations relating to both the human and non-human (animal) inhabitants of those geographical locations. Ultimately, how data owners choose to model the spatial data will affect the analysis of the research and, with it, its utility for public health.

After lunch, Martin and Rachel sat together to hold a combined Q&A session, and a discussion emerged around the topic of subjectivity. A question was raised to Rachel regarding mapping and subjectivity, as it was noticed that her description of place, which included socio-cultural meanings and personal preferences of the inhabitants of the place, could be considered subjective. Rachel agreed and alluded back to her presentation, where she mentioned that these aspects of mapping can get fuzzy, as researchers have to deal with matters relating to identity, political affiliations and personal opinions, such as how safe an individual may feel in a particular place. Martin added that with the OSM project the data must be as objective as possible, yet the maps themselves are subjective views of objective data.

Rachel and Martin answering questions from the Data Champions at the November 2023 forum

Martin also pointed out that maps are contested spaces because spaces can be political in nature. Rachel added that sometimes maps do not appropriately represent the contested nature of her field sites, which she only learned through time in the field. In this way, context is very important for “real mapping”. As an example, Martin discussed his “UK collision data” map, created outside the University, which shows where collisions have happened. Looking at one of central Cambridge’s busiest streets, Mill Road, he noted that without contextual information such as what time these collisions occurred, what vehicles were involved, and the environmental conditions at the time of the accident, a collision map may not be that valuable. To this end, it was asked whether ethnographic research could provide useful data in the act of mapping, and the speakers agreed.

Cambridge Open Research Conference 2023: The stage is set

By Nicola Swann and Mandy Wigdorowitz, Office of Scholarly Communication, Cambridge University Libraries

The programme is ready, spaces are nearly full, and we are nearing Cambridge University Libraries’ annual conference on Open Research (OR), taking place at Downing College or online on Friday 17 November 2023. This year’s theme is Open Research for Inclusion: Spotlighting Different Voices in Open Research at Cambridge.

OR is designed to promote equity and inclusion by ensuring that research is accessible to all, regardless of research background, location, or affiliation. The conference will acknowledge that OR can look different in different areas, with the common goal of advancing knowledge and understanding. Giving a voice to OR from diverse perspectives can propel learning and collaboration, and allow us to learn from one another’s approaches to openness.

“The conference looks fantastic! It’s a really fabulous mix of papers and speakers, and really exciting in terms of moving Open Research conversations into different disciplinary practices. It is the first programme I’ve seen that truly integrates research and open into a joint conversation. It’s brilliant!”

Dr Jessica Gardner, University Librarian & Director of Library Services, Cambridge University Libraries

This blog post highlights the speakers who will be joining us on the day and the topics they will explore. We’re delighted to host OR experts who will show the value of open practices in typically under-represented disciplines and contexts. These include the Arts, Humanities, and Social Sciences, the GLAM sector (Galleries, Libraries, Archives, and Museums), and research from and about the Global South.

The day starts with a welcome address from Professor Anne Ferguson-Smith CBE FRS FMedSci, Pro-Vice-Chancellor for Research and International Partnerships and the Arthur Balfour Professor of Genetics, who is a key proponent of OR – see her views about OR on the Office of Scholarly Communication’s (OSC) website. Our Keynote speaker, Dr Siddharth Soni, Isaac Newton Trust Fellow at Cambridge Digital Humanities and affiliated lecturer at the Faculty of English, will then address us with a talk on Common Ground, Common Duty: Open Humanities and the Global South, providing an account of how to think against neoliberal conceptions of the ‘open’ and to reimagine what openness might look like if the Global South was viewed as a common ground space for building an open and international university culture.

Dr Stefania Merlo from the McDonald Institute for Archaeological Research and Dr Rebecca Roberts from the McDonald Institute for Archaeological Research and Fitzwilliam Museum will further explore the theme of the Global South in their practical perspective on how they managed the curation of digital archives for heritage management from their work on the projects: Mapping Africa’s Endangered Archaeological Sites and Monuments (MAEASaM) and Mapping Archaeological Heritage in South Asia (MAHSA).

We will then change pace with an OR panel session comprising panellists with diverse backgrounds and expertise who will address registrants’ pre-submitted questions. There will be engaging insights and debate amongst the panellists, led by Bertrand Russell Professor of Philosophy, Professor Alexander Bird. He will share the platform with Philosophy of Science Professor, Professor Anna Alexandrova, Psychiatry PhD student Luisa Fassi, Cambridge University Libraries (CUL) Interim Head of Open Research Services Dr Sacha Jones, Cambridge University Press & Assessment’s Research Data Manager Dr Kiera McNeice, and Cambridge’s Head of Research Culture Liz Simmonds.

The spotlight switches to the GLAM sector in the afternoon, with a second panel chaired by CUL’s Scholarly Communication Specialist Dr Samuel Moore. This panel brings together experts who will showcase their diverse work in the GLAM sector, from software development and museum practices to infrastructure and archiving support. The panel includes Dr Mary Chester-Kadwell, CUL’s Senior Software Developer and Lead Research Software Engineer at Cambridge Digital Humanities, Isaac Newton Trust Research Associate in Conservation Dr Ayesha Fuentes from the Museum of Archaeology and Anthropology, Dr Agustina Martinez-Garcia, CUL’s Head of OR Systems, and Dr Amelie Roper, CUL’s Head of Research. They will each expand on specialist areas, including OR code and data practices in digital humanities, collections research, teaching and learning collections care, and OR infrastructure. 

In a workshop session, Tim Fellows, Product Manager for Octopus, will outline how Octopus is an alternative publishing model that can foster OR. To round the day off, Professor Joanna Page, Director of CRASSH and Professor of Latin American Studies, will present on the considerations of OR and the coloniality of knowledge with a specific focus on the questions of possession and access.

In true Cambridge tradition, a drinks reception will bring the event to a close, allowing attendees a chance to mingle and continue the discussions.

Make sure to book your place so you don’t miss out. Take a look at the programme to register and join researchers, students, librarians, administrators, and publishers across the University of Cambridge at every career stage. Get in touch at info@osc.cam.ac.uk if you have any queries.

The Data Picture

I was recently named one of “the next generation of [library] leaders” as part of the CILIP 125, having been recognised as an individual who contributes energy and knowledge to improving and impacting their organisation. My area of expertise, and thus recognition, lies with the use of data within libraries. As a data analyst for the Office of Scholarly Communication at Cambridge University Library, my role focuses on empowering decisions with data-driven understanding – such as supporting the Springer Nature negotiations. To further develop my understanding of data and its role within a wider organisation, I engage with data beyond the library – such as the Big Data London conference and the Carruthers and Jackson Data Leaders’ Summer School. Reflecting on the use of data in the wider world, what can be expected of the library and data?


The summer school provided practical advice, proven methodologies, and guidance that could apply across a variety of businesses. The course is designed to provide insight on the workflow of data officers, and their role within an organisation – no matter its stage of data maturity and literacy. Over the course of the ten weeks, leading experts discussed the role of a chief data officer (CDO), both as a business development opportunity, and as a career path for individuals. It explored the risk and governance of data within an organisation, and the final weeks focused strongly on the role of people and teams associated with data.

Peter Jackson and Caroline Carruthers addressed the differing types of CDO and described a pendulum between ‘risk aversion’ and ‘value added’: the balance between secure and proper data governance (GDPR, for example) and providing value through data (such as setting up automation). The pendulum of risk to reward is relevant to many roles, including those within the library. Understanding the need to divide time and energy between creating policies and getting decision-making results is just as relevant to my role as it is to that of a chief data officer. In my role I have supported decision-making staff through data production, but equally, to instil a culture of data, time and energy must be dedicated to risk aversion, through tasks such as researching data management, preparing training sessions for data storage, and supporting staff in data preparation.

Another important concept introduced was the DIKW pyramid – Data, Information, Knowledge, Wisdom – for understanding the value created from data. The base of the pyramid is (raw) Data, which can be processed into (useful) Information. This Information is data with meaning and a purpose, and can be organised into (insightful) Knowledge. Knowledge combines experiences, values, insights, and contextual information, which can then transcend to (integral) Wisdom. Wisdom is considered a deeper understanding with ethical implications and the ability to define ‘why’. The DIKW pyramid provided a frame of thought for presenting and approaching future data projects: understanding whether data, information or knowledge is required will better support a decision-making team.

To develop communication skills, expert Scott Taylor, known as The Data Whisperer, spoke about the three Vs for data storytelling: Vocabulary, Voice and Vision. Combining an accessible vocabulary with a common voice will illuminate the business vision, and why that is important. This overarching concept for an organisation’s data approach can be scaled down to support individual data workers in providing value, which should either grow, improve or protect the business. Understanding how to communicate the data is a key skill as “Hardware comes and goes, software comes and goes, but data remains”. The data that remains should serve one of those three purposes, so the data gathered should be usable data!

At Big Data London, the organisation Women in Data hosted conversations about nurturing a culture of learning within data teams. Drawing on their experiences from minority backgrounds, the speakers highlighted the power of upskilling, sharing skills across teams and being an advocate for oneself and one’s skills. As for what to upskill, data literacy was a hot topic across the conference. Data literacy, also called data fluency and data confidence, is the combination of ability, skills and confidence surrounding data and its uses. Data literacy enables more efficient work, and raises the question: what is the base level of data literacy and confidence across the library? Librarians use data daily, whether checking material in and out, answering students’ queries, or tracking the use of space, but are all librarians confident in using that data? This is an area I hope to explore further at the CUL, to ensure staff can use the data they have to support decisions.


Engaging with the world of data provides a big picture of the possibilities within the library. Conversations about AI (Artificial Intelligence), data policies and maturity, and shiny new databases, software, and services demonstrate the growing adoption of data, and libraries should follow suit. Actively taking snippets of larger conversations, developing ideas within the library space, and exploring the possibilities with data will help libraries thrive in this world of technological growth.