Cambridge Data Week 2020 was an event run by the Office of Scholarly Communication at Cambridge University Libraries from 23–27 November 2020. In a series of talks, panel discussions and interactive Q&A sessions, researchers, funders, publishers and other stakeholders explored and debated different approaches to research data management. This blog is part of a series summarising each event:
How should researchers’ data management activities and skills be supported? What are the data management responsibilities of the funder, the institution, the research group and the individual researcher? Should we focus on training researchers so they can carry out good data management themselves or should we be funding specialist teams who can work with research groups, allowing the researchers to concentrate on research instead of data management? These were the questions addressed on day 4 of Cambridge Data Week 2020. This session benefitted from the perspectives of three speakers deriving from three different components of the research ecosystem: national funder, institutional research support and department/institute. Respectively, these were provided by Tao-Tao Chang (Arts and Humanities Research Council [AHRC]), Marta Teperek (TU Delft) and Alastair Downie (The Gurdon Institute, Cambridge).
From a funder’s perspective, and following UKRI community consultation, Tao-Tao specifies that digital research infrastructure is recognised as an area for urgent investment, particularly in the arts and humanities, where both software and data loss are acute. Going forwards, AHRC’s key priorities will be to prevent further data loss, invest in skills, build capability, and work with the community to effect a sustained change in research culture. At an institutional level, Marta argues that it is unfair for researchers to be left unsupported to manage their data. The TU Delft model addresses this via three methods: central data support, disciplinary support by data stewards as permanent faculty staff, and hands-on support for research groups via data managers and research software engineers. Regarding the latter, an important take-home message for all researchers, regardless of institutional affiliation, is to build data management costs into grant proposals. Alastair takes up the discussion at the level of the department, research group and even individual, highlighting how researchers are locked into infrastructure silos, and locked into an unhelpful, competitive culture where altruism is a risky proposition and the career benefits of sharing seem intangible or insufficient. Alastair proposes that the climate is right and the community is ready for change, and goes on to discuss some positive changes afoot in the School of Biological Sciences to counteract these.
We had 291 registrations for the webinar, with just over 70% originating from the Higher Education sector. Researchers and PhD students accounted for 30% of the registrations whilst research support staff from various organisations accounted for an impressive 46%. On the day, we were thrilled to see that 136 people attended the webinar, participating from a wide range of countries.
Recording, transcript and presentations
There were a few questions we did not have time to address during the live session, so we put them to the speakers afterwards. Here are their answers:
Talking about the technical side have you yet come across anyone using a machine implementable DMP? Setting up a data management infrastructure for a large project it’s become apparent that checking compliance with a DMP is a huge job and of course there is minimal resource for doing this.
Marta Teperek Work is being done in this area by Research Data Alliance where there are several groups working on machine actionable DMPs. Basically, the idea is that instead of asking researchers to write long essays about how they are planning to manage their data, they are asked to provide answers that are structured. These can be multiple choice options, for example, where the researcher specifies that they will be depositing large amounts of data in the repository and the repository will be notified of data coming their way. In other words, actions are made depending on what the researcher says they will do. University of Queensland is doing a lot on this already [see link to blog post here and in Resources further below].
What are the best cross-platform, mobile and desktop tools for data management?
Alastair Downie RDM encompasses a far too broad a range of activities – it’s a concept rather than a single activity that you can build into a neat little app. In the context of electronic lab notebooks, for example, there are hundreds of apps that serve that function and some of them cross over into lab management as well. Those products that try to do too much become very bloated and complex, which makes them unattractive and so we don’t see uptake of those kind of products. I think a suite approach is better than a single solution.
Institutions audit spending on research grants, they should do the same for research data and should be a requirement of holding a grant.
Alastair Downie Wellcome Trust are now challenging researchers to demonstrate that they have complied with their DMPs. It’s not particularly empirical but the fact that they are demonstrating their determination to make sure that everyone’s doing things properly is very helpful.
Are there any specific infrastructure projects that the AHRC is sponsoring? I’m curious about what infrastructure/services would be useful for Arts and Humanities researchers
Tao-Tao Chang Not at this juncture. But we are hoping that this will change. AHRC recognises the importance of good data management practice and the need to support it. We also recognise that there is a skills gap and that all researchers at every level need support.
Is there a 2020 edition of the State of Open Data report?
There are two outcomes of the webinar to draw upon here. The first raises again the question: do researchers need, or even want, a fairy godmother to support their research data management? We held a poll at the end of the webinar, asking participants to choose which one of the following statements they believe most strongly: (1) ‘Individual researchers should learn how to manage their own data well’ or (2) ‘Researchers’ data should be managed by funded RDM specialists so that researchers can focus on research’. Of the 78 respondents, 67% chose the first option and 33% chose the second. There was not an intermediate option to incorporate both, simply because we wanted to force a choice in the direction of strongest belief when the two options are considered relative to one another.
The results of the poll and the discussions during the webinar (between the speakers and within the chat) indicate that while individual researchers are responsible for managing their research data, support does need to be made available and promoted actively (we provide in the ‘Resources’ section some links to University of Cambridge research data management support). A second outcome reveals that support needs to be provided under several different guises. On the one hand, there is support that comes via the provision of funding, research data services and individually tailored expertise. Yet, on the other hand, there is support that will derive, albeit in a less tangible sense, from positive changes in research culture, specifically in terms of how the research of individual researchers is assessed and rewarded.
Some links to University of Cambridge research data management support include: the Research Data Management Policy Framework that outlines, for example, the data management responsibilities of research students and staff; our data management guide; a list of Cambridge Data Champions, searchable by areas of expertise.
A recent Postdoc Academy podcast on ‘How can we improve the research culture at Cambridge?’
A description of different data management support roles at TU Delft, by Alastair Dunning and Marta Teperek: data steward, data manager, research software engineer, data scientist and data champion.
A Gurdon Computing blog post by Alastair Downie on ‘Research data management as a national service’; in other words, rather than duplicating infrastructure and services across the research landscape.
An article by Florian Markowetz, discussed in the webinar, on ‘Five selfish reasons to work reproducibly’ (in Genome Biology).
TU Delft Open Working blog post by Marta Teperek on machine actionable Data Management Plans (DMPs) at the University of Queensland. For more information, see this article by Miksa and colleagues on the ‘Ten principles for machine-actionable data management plans’ (in PLOS Computational Biology).
The State of Open Data 2020 report, published on 1 December 2020.
Published on 25 January 2021
Written by Dr Sacha Jones with contributions from Tao-Tao Chang, Dr Marta Teperek, Alastair Downie and Maria Angelaki.