We are pleased to announce that, thanks to the support of the Vietsch Foundation, we will be developing an integration between DSpace repositories and OpenAlex. We are partnering with 4Science, a certified platinum DSpace provider, to deliver this project that will integrate two key systems within the global scholarly ecosystem, the DSpace repository (https://www.dspace.org/) and OpenAlex (https://openalex.org/), a free and open catalogue of the world’s scholarly research system.
Using OpenAlex’s open API (Application Programming Interface), this integration will allow for the quick import of relevant research and scholarly (meta)data into DSpace repositories, helping institutions to improve the quality and completeness of their records of research outputs and streamlining researcher publication and reporting workflows by providing accurate and relevant information in automated ways. This integration will also save time for researchers and librarians, who will be able to dedicate more time to research-oriented tasks.
Reusing the data in OpenAlex will help institutions to improve the quality and completeness of data in institutional repositories and strengthen the wider open access network by increasing the number of versions and access points to content. Moreover, the availability of multiple (open access) copies of the materials can provide a more effective strategy for long-term preservation.
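To make the idea of an import concrete, here is a minimal sketch in Python of how a work record retrieved from OpenAlex might be mapped onto repository metadata. It is an illustration under stated assumptions, not the design of the integration being developed: the function names, the sample record and the target field names (Dublin Core style labels such as dc.title) are all hypothetical, and only a handful of fields from an OpenAlex work record are used.

```python
from urllib.parse import quote

# Base address of the OpenAlex works endpoint.
OPENALEX_WORKS = "https://api.openalex.org/works"


def openalex_work_url(doi: str) -> str:
    """Build the lookup URL for a single work identified by its DOI."""
    return f"{OPENALEX_WORKS}/https://doi.org/{quote(doi, safe='/')}"


def to_repository_metadata(work: dict) -> dict:
    """Map an OpenAlex work record onto Dublin Core style fields.

    The target field names are illustrative assumptions; a real
    repository defines its own metadata registry.
    """
    open_access = work.get("open_access") or {}
    authors = [
        entry["author"]["display_name"]
        for entry in work.get("authorships", [])
        if (entry.get("author") or {}).get("display_name")
    ]
    year = work.get("publication_year")
    return {
        "dc.title": work.get("title"),
        "dc.contributor.author": authors,
        "dc.date.issued": str(year) if year else None,
        "dc.identifier.doi": work.get("doi"),
        # Hypothetical field used here to carry the open access status.
        "example.oa.status": open_access.get("oa_status"),
    }


# A made-up record containing only the fields used above.
SAMPLE_WORK = {
    "title": "An example article",
    "publication_year": 2024,
    "doi": "https://doi.org/10.1234/example",
    "authorships": [
        {"author": {"display_name": "A. Researcher"}},
        {"author": {"display_name": "B. Librarian"}},
    ],
    "open_access": {"is_oa": True, "oa_status": "gold"},
}

if __name__ == "__main__":
    print(openalex_work_url("10.1234/example"))
    print(to_repository_metadata(SAMPLE_WORK))
```

Keeping the mapping as a pure function, separate from any network call, means it can be checked against sample records before it is pointed at a live repository.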
The research and scholarly publishing environment is changing rapidly, and there is an increasing expectation among funders, policy makers, and the wider research and public community that research findings will be shared. We strongly believe that institutional research repositories and scholarly platforms play a critical role in supporting these open research practices by preserving and disseminating research findings and supporting materials produced by institutions. The solution developed in this project will greatly contribute to increasing and enhancing the availability of open and accurate information about research outputs in the wider scholarly ecosystem.
Written by Clair Castle, Dr Kim Clugston, Dr Lutfi Bin Othman, Dr Agustina Martínez-García.
How the ‘second life’ of datasets is impacting the research world. Researchers share their stories.
“Research data is the evidence that underpins all research findings. It’s important across disciplines: arts, humanities, social sciences, and STEMM. Preserving and sharing datasets, through Apollo, advances knowledge across research, not only in Cambridge, but across the world – furthering Cambridge’s mission for society and our mission as a national research library.”
Dr Jessica Gardner, University Librarian & Director of Library Services
The research data produced and collected during research takes many different forms: numerical data, digital images, sound recordings, films, interview transcripts, survey data, artworks, texts, musical scores, maps, and fieldwork observations. Apollo collects them all.
Apollo is the University of Cambridge repository for research datasets. Managed by the Research Data team at Cambridge University Library, Apollo stores and preserves the University’s research outputs.
The Research Data team guides researchers through all aspects of research data management – how to plan, create, organise, curate and share research materials, whatever form they take – and assists researchers in meeting funders’ expectations and good research practice.
In this blog post, upon reaching our 5000 datasets milestone, we share researcher stories about the impact their datasets have had, and continue to have, across research – and explain how researchers at the University can benefit from depositing their datasets on Apollo.
“Sharing data propels research forward. It recognises the importance of the original datasets in their own right, and the researchers who worked on them. Many of the research funders, supporting work at the University of Cambridge, require that research data is made openly available with as few restrictions as possible. Our researchers are fully supported to do this with Apollo and the Research Data team. I’m really excited that Apollo has reached the 5000 dataset milestone.”
Professor Sir John Aston, Pro Vice-Chancellor for Research at the University of Cambridge
Why should researchers share their research outputs on a repository?
Making research data openly available is recognised as an important aspect of research integrity and in recent years has garnered support from funders, publishers and researchers. Open data supports the FAIR principles and many funders now include data sharing practices within their policies as part of the application process. Publishers and funders often require a data availability statement (DAS) to be included in publications. It is worth mentioning (including in a DAS) that there are situations where data cannot be shared, particularly if data contains personal or sensitive information or where there is no permission to share it. But a lot of data can be shared and this movement towards open data promotes greater trust, both among researchers and for engagement with the general public.
In the UK, funding bodies often mandate openly sharing the data supporting their research grants. A large proportion of funding for research is from taxpayers’ money or charity donations so making data available openly for reuse provides value for money. It also allows the data behind claims to be accessed for traceability, transparency and reproducibility. Open data increases efficiency, as it prevents work being repeated that may have already been done; for this reason, it is encouraged to publish negative results too. Publishing data gives researchers credit for the work they have done, giving them more visibility in their field, and increasing the discoverability of their research which could lead to potential collaborations and increased citations. Open data also means that researchers have access to valuable datasets that could educate, enhance and further their research when applied by practitioners worldwide.
The second life of data
Apollo supports data from all disciplines, and this is represented by the various formats that the repository holds in its collection – from movie files, images, audio recordings and code to the more common text and CSV files. The repository now also hosts methods. Researchers are encouraged to deposit these outputs onto the repository to facilitate the impact and re-use of data underlying their research, so that their research data can be cited as a form of scholarly output in its own right. In 2023, there were over 95,000 views of datasets and software and associated metadata items on Apollo, and over 37,000 files were downloaded (source: IRUS). This suggests that datasets and software deposited on Apollo are easy to discover and are highly used.
The open availability of Brion’s data that can be used to train AI (a significant trajectory for research currently) is welcomed by researchers such as AI specialist Bill Marino, a PhD candidate and Data Champion from the Department of Computer Science and Technology: “It’s really important that AI researchers are able to reproduce each other’s findings. The opaque nature of some of today’s AI models means that access to data is a key ingredient of AI reproducibility. This effort really helps get us there.”
Brion considers that sharing his data “has significantly enhanced the impact and reach” of his research and that “it has increased the visibility and credibility of my work, as other scientists can validate and build upon my findings.” On the benefits of depositing data on a repository, he says that sharing “ensures that the data is preserved and accessible for the long term, which is crucial for reproducibility and transparency in research”. He adds, “Repositories often provide metadata and tools that make it easier for other researchers to find and use the data”, which “promotes a culture of openness and collaboration, which can accelerate scientific discovery and innovation.”
Research data supporting “Regime transitions and energetics of sustained stratified shear flows” is a dataset from another depositor, Adrien Lefauve, from the Department of Applied Mathematics and Theoretical Physics, and consists of MATLAB codes and accompanying movie files. Lefauve is, in fact, a frequent dataset depositor with 10 datasets published in Apollo. He considers that data sharing gives his data “a second life” by allowing researchers to reuse his data in pursuit of new projects but admits that “there is also a selfish reason for doing it!”. He explains that “After several months or years without having worked on a dataset, I sometimes need to go back to it, either by myself or when I hand it over to a colleague or student to test new ideas. Having a well-structured, user-friendly and thoroughly documented dataset is invaluable and will save you a lot of time and frustration when you need to resurrect your own research.”
Lefauve’s dataset has been cited in other publications and he encourages other researchers to look at his datasets and reuse them: “When people see that datasets can be cited in their own right and attract citations, it can encourage them to make the extra effort to deposit their data”. Lefauve is an advocate for sharing data on a repository and in his view data sharing is: “not only important for research integrity and reproducibility, but it also ensures that research funds are used efficiently. My datasets are usually from laboratory experiments which can take a lot of time and resources to perform. Hence, I feel there is a duty to ensure the data can be used to the fullest by the community. It also helps build a researcher’s profile and credentials as a valuable contributor to the community, beyond simply publication output, which often only use a small fraction of a dataset.”
Lefauve describes his field (fluid mechanics) as one that has benefited from the explosion of open data that is made available to the research community, but he is also aware that for a dataset to be reused, it requires comprehensive documentation and curation. Lefauve hopes that sharing data in a repository “will become increasingly commonplace as the next generation is taught that this is an essential part of data-intensive research.”
How to deposit data on Apollo, and why choose Apollo
There are thousands of data repositories to submit data to, so how to choose the right one? Funders may specify a disciplinary or institutional repository (see re3data.org for a directory of all repositories). Members of the University of Cambridge can deposit their data on the institutional repository, Apollo. Apollo has CoreTrustSeal certification, which means it has met the 16 requirements to be a sustainable and trustworthy infrastructure. Research outputs can be deposited as several types, such as dataset, code or method.
Researchers may think that the files are the most important aspect when depositing a dataset, but we cannot emphasise enough the importance of providing good metadata (data about data) to go alongside the files. This is the area that we find researchers need some encouragement with, but we hope that the experiences of the researchers we have featured above highlight the importance that good metadata has for their data. No one knows their data better than the person who generated it, so they are in the best position to describe it. A good description of a dataset enables users with no prior knowledge of the dataset to discover, understand and reuse the data correctly, avoiding misinterpretation, without having access to the paper it supports. Be aware that others may discover datasets in isolation from the paper they support: we recommend that researchers avoid referring to the paper or using the abstract of the paper to describe their dataset. An article abstract describes the contents of the article, not of the dataset. It can also be really useful for researchers to describe their methods and how their files are organised, for example by providing README files. These give the dataset context as to how the data was generated, collected and processed. Good metadata will also enhance a dataset’s discoverability.
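As an illustration of what a README might cover, the following Python sketch assembles a plain-text README from a short description, a methods note and a list of files. The function name and the section headings are our own illustrative choices, not a required Apollo template.

```python
def build_readme(title, creators, description, methods, files):
    """Return plain-text README content for a dataset deposit.

    `files` maps each file name to a one-line description of its contents.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        "",
        "Description",
        description,
        "",
        "Methods",
        methods,
        "",
        "Files",
    ]
    # List files alphabetically so the README is stable between versions.
    for name in sorted(files):
        lines.append(f"- {name}: {files[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_readme(
        title="Example dataset",
        creators=["A. Researcher"],
        description="Measurements collected for an example project.",
        methods="Describe how the data was generated, collected and processed.",
        files={"readings.csv": "Raw readings, one row per measurement."},
    ))
```

The description here is written for someone who finds the dataset on its own, so it describes the data rather than repeating an article abstract.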
Another benefit of sharing data on Apollo is that our datasets are indexed on Google’s Dataset Search, a search engine for datasets. It is best practice to cite any datasets used in research in the bibliography/reference list of the paper, thesis etc. In fact, there is new guidance for Data Reuse on the Apollo website which describes how to use Apollo to discover research data and how to cite it. We advise that researchers start doing this now (if they don’t already) so they get into a good habit: it will encourage others to do the same and make it a lot easier for others to reuse data and for researchers to receive recognition for it. Citation data for datasets are displayed on Apollo and alongside this it is possible to track the attention that a dataset receives via an Altmetric Attention Score.
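For anyone who manages references programmatically, a dataset citation can be assembled from a few metadata fields. The Python sketch below uses a common creator, year, title, publisher, identifier pattern; the function name and the example values (including the DOI) are made up for illustration, and the Data Reuse guidance on the Apollo website or the target journal’s style takes precedence.

```python
def format_dataset_citation(creators, year, title, publisher, doi):
    """Format a dataset citation as: Creators (Year). Title. Publisher. DOI link."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    print(format_dataset_citation(
        creators=["Researcher, A.", "Librarian, B."],
        year=2024,
        title="Example dataset",
        publisher="Example Repository",
        doi="10.1234/example",
    ))
```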
Apollo repository key milestones
Since its inception in 2016, when it started minting DOIs (Digital Object Identifiers), Apollo has continued to hit milestones and develop into the robust, safe and resilient repository infrastructure that it is today.
Apollo has continued to support FAIR principles by incorporating new and critical functionality to further enhance discovery, access and long-term preservation of the University research outputs it holds. Examples include integration with our CRIS (Current Research Information System), Symplectic Elements, to streamline the depositing process, and integration with JISC Publications Router to automatically deposit metadata-rich records in Apollo (2016, 2019, 2021).
The latest milestone will be for research outputs published within Octopus, a novel approach to publication, to be preserved together with associated publications and underpinning research datasets in Apollo to facilitate sharing and re-use (2024-25). In future we want to develop our ability to collect and interpret data citation statistics for Apollo so we can better assess the impact of the research data generated at the University.
We can be contacted by email at info@data.cam.ac.uk. Researchers can also request a consultation with us to discuss any aspect of their research data management (RDM) needs, including data management plans, data storage and backup, data organisation, data deposition and sharing, funder data policies, or to request bespoke training.
Remember that there is also an amazing network of Data Champions that can be called upon for advice, particularly from a disciplinary perspective.
In this blog post, Dr Caroline Edwards, Executive Director, Open Library of Humanities and Senior Lecturer in Contemporary Literature & Culture, Birkbeck, University of London asks: How do we ensure that a flipped diamond open access journal can remain independent? How do we prepare for the long-term financial security of flipped journals and protect against their potential vulnerability to commercial acquisition in the decades to come?
Flipping academic journals to diamond open access (OA) presents a series of challenges to an academic publisher. You need certain niche competencies. Firstly, nothing happens without the complete trust of an editorial team that shares your appetite for risk. Then, you need the backing of an entire academic community, willing to follow the editorial team to a new journal (in cases where editors don’t own the journal IP, which is most cases) and undertake a boycott of the old “zombie” title. Underpinning all of this, you need the financial and technological resources to provide the necessary infrastructure for the flipped journal in perpetuity, to offer it a safe home with a long-term future that doesn’t require any author fees. This involves things like setting up and maintaining a new journal site, running a digital publishing platform for managing submission, review, and production processes, having the capacity to manage metadata integration with university library catalogues and discoverability databases, providing memberships at robust digital preservation organisations, and ongoing research and development to stay abreast of rapid changes within the digital publishing landscape. The list goes on.
The growing list of journals flipping to diamond OA from their commercial publishers is well known. Retraction Watch keeps an up-to-date list of editorial boards that have resigned from for-profit models and moved their titles to not-for-profit, community-governed models. Each has its own story, told across published statements, academic blogs, and in newspaper articles covering high-profile editorial resignations and academic boycotts. But what gets talked about less frequently is the community governance structure that will support the journal moving forwards. How do we ensure that a flipped journal can remain independent? How do we prepare for its long-term financial security and protect against future vulnerability to commercial acquisition in the decades to come?
At the Open Library of Humanities (OLH), I spend much of my time talking to editors about their journals. There is a depressingly common story. It usually goes something like this. Many academic journals were launched between the 1960s and 1980s, in a collaboration between university professors and small or independent publishing houses. Things worked pretty well until their small publisher was bought out in the 1990s or 2000s by a larger company, often overseen by a global parent company. They muddled along with a high turnover of staff on the publisher side. Over time, the publishing managers became harder to get hold of, production was outsourced overseas, and editors became increasingly aware of a decline in production quality.
With the acceleration to open access in the 2010s, editors came under pressure to double or triple their article acceptance rates – with a drop in subscription revenue, commercial publishers had to recoup costs via article processing charges (APCs). The more volume they could pump out, the better their profit margins. Even when journal editors rejected unsuitable or poor-quality articles, publishers found a way to fast-track this academic content by surreptitiously channelling it through their digital platforms to their hundreds of other journals using the same platform. Not all editors were even aware that the transfer of rejected articles had taken place.
If the Editor(s)-in-Chief had the temerity to stand up for their academic principles and refuse to increase their journal’s article acceptance rates, at this point they could face legal challenges or dismissal. In several explosive cases in recent years, Editors-in-Chief have been fired by their commercial publishers after refusing to back down over these issues. Sacking an internationally renowned editor whose reputation has become synonymous with the journal’s own reputation isn’t for the faint-hearted. It says something about the desperation of commercial publishers and their shrinking profits that they would be willing to trash a journal’s reputation so comprehensively – among the very academic communities whose uncompensated labour produced that reputation in the first place.
At this point in my conversations with editors, I ask a difficult question: Who owns the journal? “The publisher” they say, or “we don’t know.” Sometimes they reply: “The founding editor has passed away; we’ve asked their children, but no one can find any paperwork.” Without the rights to the journal title, its name, and logo, editors must set up a new journal. Ensuring the continuity between the old (now trashed) journal title and the new journal title requires coordinating a mass resignation of editors and authors from the old journal, preferably along with a boycott by peer reviewers for the foreseeable future.
At the OLH we’ve spent almost a decade flipping academic journals to diamond OA, supported by a growing number of libraries worldwide who share our vision for a not-for-profit academic publishing future. It wasn’t called “diamond” when we launched in 2015, but the term has come to mean not-for-profit and community-governed OA. Our publishing model is inspired by an explicitly political project – if the OLH and similar university-owned journal publishers are to thrive, they need to divert university library funding away from the big 5 commercial publishers (Elsevier, Wiley, Taylor & Francis, and Springer). This happens hand-in-hand with library advocacy. When expensive journals flip to diamond OA, librarians are empowered to cancel individual journal subscriptions. In the age of big bundles and journal packages, journal flipping allows them to renegotiate extortionate deals with commercial publishers in light of the shrinking number of titles in each package.
Since launching as a publisher in 2015, the OLH has flipped 20 journals in this way. It hasn’t always been easy, and we have learned a lot along the way. In cases where journals own their own intellectual property (IP), usually via a scholarly association or legal governing body, the process of migrating decades of back content requires highly complex, skilled technical work. In cases where journals don’t own their IP, editors are unable to take the journal title with them. In these cases, a new journal needs to be established to continue the mission of the original title. This leaves behind zombie journals: the undead husks of formerly respected titles that commercial publishers refuse to close but cannot run when the entire scholarly community has agreed to boycott them. Wiley’s Journal of Political Philosophy, which relaunched with the OLH as Political Philosophy in February 2024, is a case in point.
Some of the journals that the OLH has flipped to diamond OA have set up a nonprofit organisation to protect themselves. Zygon: Journal of Religion and Science, a former Wiley journal that dates back to 1966, was able to do this because the original editors had the foresight to protect their IP before their publisher Blackwell was taken over by Wiley in 2007 (the journal had previously been published by two different university presses, the University of Chicago Press (1966–1978) and Wilfrid Laurier University Press (1979–1989)). The Zygon editorial team set up its own not-for-profit scholarly corporation in Chicago in 2019, following a joint venture established in 1965 among founding partners. As a 501(c)(3) organization, the Zygon: Journal of Religion and Science NFP not-for-profit scholarly corporation is a charitable organisation exempt from federal income tax. This route is being taken by other OLH journals including Theory & Social Inquiry (formerly Theory & Society), Political Philosophy (formerly the Journal of Political Philosophy), and Free & Equal: A Journal of Ethics and Public Affairs (formerly Philosophy & Public Affairs).
Several of the OLH’s journals have been owned by scholarly associations since their inception, including Quaker Studies (founded by the Quaker Studies Research Association (QSRA)), Architectural Histories (founded by the European Architectural History Network (EAHN)), Digital Studies / Le champ numérique (founded by the Canadian Society for Digital Humanities/Société canadienne des humanités numériques (CSDH/SCHN)), Marvell Studies (founded by the Andrew Marvell Society), Open Screens (founded by the British Association of Film, Television and Screen Studies (BAFTSS)), and The Parish Review (founded by the International Flann O’Brien Society).
In other cases, independent journals joining the OLH have made the decision to affiliate themselves with scholarly societies. This has been the case for [in]Transition: Journal of Videographic Film & Moving Image Studies, which has become the official video essay journal of the Society for Cinema and Media Studies (SCMS), and C21 Literature: Journal of 21st-century Writings, which became the official journal of the British Association for Contemporary Literary Studies (BACLS) when the new association was founded in 2017. This kind of affiliation secures the community governance of journals. Scholarly associations have articles of association that usually include the criteria for appointing journal editors, terms of office, and processes for collectively undertaking decisions about the journal’s functioning and health.
Another route to long-term protection against commercial acquisition is for journals to join forces. This was the approach taken by three of the OLH’s journals who resigned en masse from Elsevier in 2015 – Lingua (which relaunched as Glossa), LabPhon, and the Journal of Portuguese Linguistics. Editors of these titles set up a community organisation, LingOA: Linguistics in Open Access, as a Dutch Stichting (literally a “foundation”), a not-for-profit legal entity with limited liability similar to a trust, which is controlled by a board of directors and cannot have any shareholders.
With support from the Center of Science and Technology Studies at the University of Leiden, Radboud University Library, the Netherlands Organisation for Scientific Research (NWO), the Association of Dutch Universities (VSNU), and the Royal Netherlands Academy of Arts and Sciences (KNAW), LingOA was able to provide financial support for the journals beyond their funding agreement with the OLH. One of the OLH’s newest journals, Syntactic Theory and Research (STAR, which left Wiley) has also joined the LingOA Stichting.
Other journals that have joined the OLH in 2023-2024 will need to establish their own legal and ownership entities, and we continue to offer help and advice to editorial teams undertaking this important work. Our goal at the OLH is to liberate university research from commercial control. Flipping journals to diamond OA is the first step; enshrining community governance is the crucial next step. As more funding bodies mandate diamond OA and not-for-profit academic publishing infrastructure (such as a recent announcement by the NWO), the tide is turning against commercial actors. Now is the time for editors and scholarly communities to regain control of their scholarship.
This post does not necessarily reflect the view of Cambridge University Libraries.