Dr. Sacha Jones and Dr. Samuel Moore, Office of Scholarly Communication, Cambridge University Libraries
The Open Research at Cambridge conference took place from 22 to 26 November 2021. In a series of talks, panel discussions and interactive Q&A sessions, researchers, publishers, and other stakeholders explored how Cambridge can make the most of the opportunities offered by open research. This blog post is part of a series summarising each event.
As part of the Cambridge Open Research conference, the Office of Scholarly Communication hosted a ‘101’ session on open research, covering the basics and answering queries from the audience on all aspects of open access publication and open data. With over 80 participants, we were thrilled with the response, and in this post we recap some of the topics we covered.
Firstly, as we discussed in the session, it is easy to assume that open research is simply an issue for the sciences rather than all academic disciplines. Practices such as open access and open data have been taken up widely in the sciences, although in different ways, and there is a common association between science and openness. This is compounded by the fact that in many European countries Open Science is inclusive of arts and humanities scholarship and so is functionally equivalent to open research. At the OSC, we are keen to support open practices across all disciplines while being sensitive to different ways of working. We are guided by the university’s Open Research Position Statement that requires work to be ‘as open as possible, as closed as necessary’.
After an introduction to open research, Sam then outlined the key issues in open access, including the different licences for making your research open access, the differences between green and gold open access, and the many and various reasons for making your work open access. Open access allows us to reach new audiences, improve the economics of research access, and reassess knowledge production and dissemination in a digital world. We also learned about open access monographs, the complex policy landscape and the various ways in which you can make your research open access through repositories and journals. The OSC’s Open Access webpages are an excellent set of resources for learning more.
We then moved on to open data – research data shared publicly – and how this fits into open research (see the University’s policy framework on research data). After highlighting that all research, regardless of discipline, generates or uses data of one kind or another (e.g. text, audio-visual, numerical, etc.), Sacha posed a series of questions with answers, anticipating what the audience might want to know more about. Do I have to share my data? What data do I share – is it meant to be everything from my research? My data contains sensitive information so I can’t share my data, or can I? How do I share my data? I don’t want to be criticised after making my data open, so how can I prevent this? How can I stop someone else from taking my data, using it, and getting all the credit? The OSC’s Research Data website contains information about data management and data sharing, and our list of Cambridge Data Champion experts shows whether anyone has volunteered to be a local source of data-related advice in your department or discipline.
We are always available as a source of support and guidance in all matters relating to open research and encourage you to contact us if you have any questions. The OSC has webpages on open research and sites dedicated to both open access and research data. For general open research enquiries, we can be emailed at info@osc.cam.ac.uk, for open access at info@openaccess.cam.ac.uk and for data at info@data.cam.ac.uk. There are also a number of training sessions, provided throughout the year and online, that relate to the topics covered in this session. If you think that those in your department or institute at Cambridge would like to know more about the topics covered here, then please do get in touch, as we’d be happy to speak about these and answer any questions you may have.
This year we have continued, as always, to provide support and services for researchers to help with their research data management and open data practices. So far in 2020, we have approved more than 230 datasets into our institutional repository, Apollo. This includes Apollo’s 2000th dataset on the impact of health warning labels on snack selection, which represents a shining example of reproducible research, involving the full gamut: preregistration, and sharing of consent forms, code, protocols and data. There are other studies that have sparked media interest for which the data are also openly available in Apollo, such as the data supporting research that reports the development of a wireless device that can convert sunlight, carbon dioxide and water into a carbon-neutral fuel, or the data supporting a study that has used computational modelling to explain why blues and greens are the brightest colours in nature. Also, in the year of COVID, a dataset was published in April on the ability of common fabrics to filter ultrafine particles, associated with an article in BMJ Open. Sharing data associated with publications is critical for the integrity of many disciplines and best practice in the majority of studies, but science communication in particular also has an important responsibility to bring research datasets to the forefront. This point was discussed eloquently this summer in a guest blog post in Unlocking Research by Itamar Shatz, a researcher and Cambridge Data Champion. Making datasets open permits their reuse, and if you have wondered how research data is reused, then read this comprehensive data sharing and reuse case study written by the Research Data team’s Dominic Dixon. This centres on the use and value of the Mammographic Image Analysis Society database, published in Apollo five years ago.
This year has seen the necessary move from our usual face-to-face Research Data Management (RDM) training to provision of training online. This has led us to produce an online training session in RDM, covering topics such as data organisation, storage, backup and sharing, as well as data management plans. This forms one component of a broader Research Skills Guide – an online course for Cambridge researchers on publishing, managing data, finding and disseminating research – developed by Dr Bea Gini, the OSC’s training coordinator. We have also contributed to a ‘Managing your study resources’ CamGuide for Master’s students, providing guidance on how to work reproducibly. Last month, in collaboration with several University stakeholders, we released new guidance on the use of electronic research notebooks (ERNs), providing information on the features of ERNs and guidance to help researchers select one that is suitable.
At the start of this year we invited members of the University to apply to become Data Champions, joining the pre-existing community of 72 Data Champions. The 2020 call was very successful, and we welcomed 56 new Data Champions to the programme. The community has expanded this year, not only in terms of numbers of volunteers but also in terms of disciplinary focus: there are now Data Champions in several areas, in the arts, humanities and social sciences in particular, where there were none previously. During this year, we have held forums in person and then online, covering themes such as how to curate manual research records, ideas for RDM guidance materials, data management in the time of coronavirus, and data practices in the arts and humanities and how these can be best supported. We look forward to further supporting and advocating the fantastic work of the Cambridge Data Champions in the months and years to come.
Itamar Shatz has written a guest blog post for the Office of Scholarly Communication about how public trust in the scientific community increases when researchers make their data openly available to all. He also emphasizes that science communicators (e.g. press offices, journalists, publishers) have a responsibility to point attention directly at the primary source of the data. Itamar is a PhD candidate in the Department of Theoretical and Applied Linguistics at the University of Cambridge. He is also a member of the Cambridge Data Champion programme, having joined at the start of this year. He writes about science and philosophy that have practical applications at Effectiviology.com.
It’s no secret that the public’s view of the scientific community is far from ideal.
For example, a global survey published by the Wellcome Trust in 2019 showed that, on average, only 18% of people indicate that they have a high level of trust in scientists. Furthermore, the survey showed that there are stark differences between people living in different areas of the world; for instance, this rate was more than twice as high in Northern Europe (33%) and Central Asia (32%) as in Eastern Europe (15%), South America (13%), and Central Africa (12%).
Things do appear to be improving, to some degree, especially in light of the recent pandemic. For example, a recent survey in the UK, conducted by the Open Knowledge Foundation, has found that, following the COVID-19 pandemic, 64% of people are now “more likely to listen to expert advice from qualified scientists and researchers”. Similar increases in public confidence have been found in other countries, such as Germany and the USA. However, despite these recent increases, there is still much room for improvement.
Open data can help increase the public’s confidence in scientists
The public’s lack of confidence in scientists is a complex, multifaceted issue that is unlikely to be resolved by a single, neat solution. Nevertheless, one thing that can help alleviate this issue to some degree is open data, which is the practice of making data from scientific studies publicly accessible.
Research on the topic shows just how powerful this tool can be. For example, the recent survey by the Open Knowledge Foundation, conducted in the UK in response to the COVID-19 pandemic, found that 97% of those polled believed that it’s important for COVID-19 data to be openly available for people to check, and 67% believed that all COVID-19 related research and data should be openly available for anyone to use freely. Similarly, a 2019 US survey conducted before the pandemic found that 57% of Americans say that they trust the outcomes of scientific studies more if the data from the studies is openly available to the public.
Overall, such surveys strongly suggest that open data can help increase the public’s trust in scientists. However, it’s not enough for studies to just have open data for it to increase the public’s trust; if people don’t know about the open data, or if they don’t fully understand what it means, then open data is unlikely to be as beneficial as it could be. As such, the following section offers some guidelines on how to properly incorporate open data into science communication, in order to use this tool as effectively as possible.
How to incorporate open data into science communication
To properly incorporate open data into science communication, there are several key things that people who engage in science communication, such as journalists and scientists, should generally do:
Say that the study has open data. That is, you should explicitly mention that the researchers have made the data from their research openly available. Do not assume that people will go to the original study and then learn there about the data being open.
Explain what open data is. That is, you should briefly explain what it means for the data to be openly available, and potentially also mention the benefits of making the data available, for example in terms of making research more transparent, and in terms of helping other researchers reproduce the results.
Describe what sort of data has been made openly available. For example, you can include descriptions of the type of data involved (surveys, clinical reports, brain scans, etc.), together with some concrete examples that help the audience understand the data.
Explain where the data can be found. For example, this can be in the article’s “supplementary information” section, though data should preferably be available in a repository where the dataset has its own persistent identifier, such as a DOI. This ensures that the audience can find and access the data, which may otherwise be hidden behind a paywall, and offers other benefits, such as allowing researchers to directly access and cite the dataset, without navigating through the article.
These practices can help people better understand the concept of open data, particularly as it pertains to the study in question, and can help increase their trust in the openness of the data, especially if it is placed somewhere that they can access themselves.
For one example of how open data might be communicated effectively in a press release, consider the following:
“The researchers have made all the data from this study openly available; this means that all the results from their experiments can be freely accessed by anyone through a repository available at: https://www.doi.org/10.xxxxx/xxxxxxx. This can help other scientists verify and reproduce their results, and will aid future research on the topic.”
Open data in different types of scientific communications
It’s important to note that there’s no single right way to incorporate open data into scientific communications. This can be attributed to various factors, such as:
Differences between fields (e.g. biology, economics, or psychology)
Differences between types of studies (e.g. computational or experimental)
Differences between media (e.g. press release or social media post)
Nevertheless, the guidelines outlined earlier can be beneficial as initial considerations to take into account when deciding how to incorporate open data into science communication. It is up to communicators to make the final modifications, in order to use open data as effectively as possible in their particular situation.
Summarizing what we’ve learned
Though the public’s trust in science is currently growing, there is much room for improvement. One powerful tool that can aid the academic community is open data—the practice of making data from research studies openly available. However, to benefit as much as possible from the presence of open data, it’s not sufficient for a study to merely make its data open. Rather, the accessibility of the data needs to be promoted and explained in scientific communication, and the dataset needs to be cited appropriately (see the Joint Declaration of Data Citation Principles for guidelines regarding this latter point).
What is currently being done
It is important to note that much work is already being done to promote the concept of open data. For example, organizations such as the Research Data Alliance promote discussion of the topic and publish relevant material, as in the case of their recent guidelines and recommendations regarding COVID-19 data.
In addition, at the University of Cambridge, in particular, we can already see a substantial push for open data practices, where appropriate, and from many angles as outlined in the University’s Open Research position statement. Many funding bodies mandate that data be made available, and the University facilitates the process of sharing the data via Apollo, the institutional repository. Furthermore, there are the various training courses and publications—including this very blog—led by bodies such as the Office of Scholarly Communication (OSC), which help to promote Open Research practices at the University. Most notably, there is the OSC’s Data Champion programme, which deals, among other things, with supporting researchers with open data practices.
Moving forward
Promoting the use of open data in scientific communication is something that different stakeholders can do in different ways.
For example, those engaging in science communication (such as journalists and universities’ communication offices) can mention and explain open data when covering studies. Similarly, scientists can ask relevant communicators to cite their open data, and can also mention this information themselves when they engage in science communication directly. In addition, consumers of scientific communication and other relevant stakeholders (such as the general public, politicians, regulators, and funding bodies) can ask, whenever they hear about new research findings, whether the data was made openly available, and if not, then why.
Overall, such actions will lead to increased and more effective use of open data over time, which will help increase the trust people have in scientists. Furthermore, this will help promote the adoption of open data practices in the scientific community, by making more scientists aware of the concept, and by increasing their incentives for engaging in it.