
Open Access policy, procedure & process at Cambridge

First up, HEFCE’s Open Access policy:

At the outset, let’s be clear: the HEFCE Open Access policy applies to all researchers working at all UK HEIs. If an HEI wants to submit a journal article for consideration in REF 2021 the article must appear in an Open Access repository (although there is a long list of exceptions). Keen observers will note that in the above flowchart HEFCE’s policy is enforced based on deposit within three months of acceptance. This requirement has caused significant consternation amongst researchers and administrators alike; however, during the first two years of the policy (i.e. until 31 March 2018) publications deposited within three months of publication will still be eligible for the REF. At Cambridge, we have been recording manuscript deposits that meet this criterion as exceptions to the policy[1].
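The deposit rule described above can be sketched as a small check. This is a minimal illustration under stated assumptions, not how HEFCE or Symplectic Elements implement it. The function names and status strings are hypothetical. "Three months" is taken as three calendar months. The transitional window is assumed to be judged by a deposit date on or before 31 March 2018. The long list of policy exceptions is not modelled.

```python
import calendar
from datetime import date
from typing import Optional

# End of the transitional period mentioned in the post (first two years of the policy).
TRANSITION_END = date(2018, 3, 31)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def ref_deposit_status(accepted: date, deposited: date,
                       published: Optional[date] = None) -> str:
    """Hypothetical classifier for the HEFCE deposit requirement.

    Returns "compliant", "transitional_exception" or "non_compliant".
    Policy exceptions other than the transitional window are not modelled.
    """
    # Main rule: deposit within three months of acceptance.
    if deposited <= add_months(accepted, 3):
        return "compliant"
    # Transitional rule: deposit within three months of publication,
    # assumed here to apply only to deposits made on or before TRANSITION_END.
    if (published is not None
            and deposited <= TRANSITION_END
            and deposited <= add_months(published, 3)):
        return "transitional_exception"
    return "non_compliant"
```

A record classified as `transitional_exception` here corresponds to what the footnote below describes as an 'Other' exception recorded in Elements.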

Next up, the RCUK Open Access policy. This policy is straightforward to implement, the only complication being payment of APCs, which is contingent on sufficient block grant funding. Otherwise, the choice for authors is usually quite obvious: does the journal have a compliant embargo? No? Then pay for immediate open access.
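The RCUK choice can be sketched as a two-step decision. This is a minimal sketch of the logic as described in this post, not an official RCUK tool. The `Journal` fields, the function name and the `"refer"` outcome are hypothetical: the post does not say what happens when the block grant cannot cover the APC, so that case is simply flagged.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    # Hypothetical fields for illustration only.
    has_compliant_embargo: bool  # embargo on the accepted manuscript meets RCUK limits
    apc_gbp: float               # article processing charge for immediate OA


def rcuk_route(journal: Journal, block_grant_remaining_gbp: float) -> str:
    """Sketch of the RCUK choice: green if the embargo is compliant,
    otherwise pay the APC if the block grant can cover it.

    Returns "green", "gold" or "refer" (a hypothetical placeholder for
    cases the post does not cover). The Europe PMC deposit requirement
    for MRC/BBSRC papers is not modelled.
    """
    if journal.has_compliant_embargo:
        return "green"
    if block_grant_remaining_gbp >= journal.apc_gbp:
        return "gold"
    return "refer"
```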

One extra feature of the RCUK Open Access policy not captured here is the Europe PMC deposit requirement for MRC and BBSRC funded papers. Helpfully, the policy document makes no mention of this requirement; rather, this feature of the policy appears in the accompanying FAQs. I’m no expert, but this seems like the wrong way to write policies.

Finally, we have the COAF policy, possibly the single most complicated OA policy to enforce anywhere in the world. The most challenging part of the COAF policy is the Europe PMC deposit requirement. It is often difficult to know whether a journal will indeed deposit the paper in Europe PMC, and if, for whatever reason, the publisher doesn’t immediately deposit the paper, it can take months of back-and-forth with editors, journal managers and publishing assistants to complete the deposit. This is an extremely burdensome process, though the blame should be laid squarely at the publishers’ door. How hard is it to update a PMC record? Does it really take two months to update the Creative Commons licence?

This leads us to one of the more unusual parts of the COAF policy: publications count as journal articles if they are indexed in Medline. That means we will occasionally receive book chapters that need to meet the journal OA policy. Most publishers are unwilling to make such publications OA in line with COAF’s journal requirements, so they are usually non-compliant.

What happens if you should be foolish enough to try to combine these policies into one process? Well, as you might expect, you get something very complicated:

This flowchart, despite its length, still doesn’t capture every possible policy outcome and is missing several nuances related to the payment of APCs, but nonetheless, it gives an idea of the enormous complexity that underlies the decision-making process behind every article deposited in Apollo and in other repositories across the UK.

[1] Within the University’s CRIS, Symplectic Elements, only one date range is possible, so we have chosen to monitor compliance from the acceptance date. Publications deposited within the transitional ‘three months from publication’ window receive an ‘Other’ exception within Elements that contains a short note to this effect.

Published 18 September 2017
Written by Dr Arthur Smith
Creative Commons License

Researchers championing data – what works?

Here we follow up on our earlier piece “Creating a research data community”, in which Rosie Higman and Hardy Schwamm discussed innovative ways of engaging researchers with research data management.

This blog post discusses the outcomes of a dedicated Birds of a Feather session at the 9th Research Data Alliance Plenary meeting in Barcelona in April 2017. The session discussed three different programmes for engaging researchers with data management and sharing: the University of Cambridge’s Data Champions programme, TU Delft’s Data Stewardship programme and SPARC Europe’s Open Data Champions. The purpose of the session was to exchange practice, discuss the differences between the programmes and talk about possible next steps. All presentations from the session are available.

Cambridge’s Data Champions

Cambridge’s Data Champions programme was started in autumn 2016. It is a programme in which researchers volunteer to become local community experts and advocates on research data management and sharing. The main expectations of those appointed as Data Champions were to run at least one workshop on a topic related to research data management for their research community and to act as the local expert connecting researchers and central data services. In return, Champions were offered new networking opportunities, training in research data management and sharing, and a boost to their CVs. Detailed information about the expectations and benefits of becoming a Champion, as well as the support available from central services, is publicly available.

The Data Champions programme is coordinated through bi-monthly meetings at which Champions exchange practice, talk to each other about their interactions with other researchers and advise each other on tackling data-related challenges. Over time, the Champions formed a community of practice, and the central Research Data Team started to act more as hands-off facilitators of these activities and discussions rather than telling Champions what to do and how best to engage with researchers locally. The rationale behind this was that Data Champions know their own research communities best and are best positioned to decide what types of training and engagement methods would work for them.

And in fact the Champions delivered a quite unexpected and diverse range of outputs. The initial requirement was to deliver training on research data management to their local communities. The Research Data Management workshop template was shared with the Champions, and they were all trained in its content and delivery methods. However, Champions were given discretion over what training they provided and how they delivered it. In practice, they developed all sorts of materials and strategies for engaging their local communities: from highly successful regular research data ‘tips’ emails sent to everyone in a department, through data sharing FAQs for chemists and ORCiD drop-in sessions, to organising Electronic Lab Notebook trials. While certainly interesting and valuable, this also raised questions as to whether the messages about data management and sharing are still consistent and aligned with those of the central data services, and whether the high quality of training is being maintained.

TU Delft’s Data Stewardship programme

Madeleine de Smaele from TU Delft spoke about the university’s Data Stewardship programme. The goal of the programme is to create mature working practices and policies for research data management across each of the eight faculties at TU Delft, so that any project can make sure its data are managed well. The programme is part of the broader Open Science agenda at TU Delft, which aims to make research more accessible and more re-usable. In contrast to the hands-off and decentralised Data Champions programme at Cambridge, TU Delft’s Data Stewardship programme has a solid framework at its core: a team of eight Data Stewards (one dedicated Data Steward for each of TU Delft’s eight faculties), led centrally by the Data Stewardship Coordinator.

Data Stewards are disciplinary experts who are embedded within faculties and are able to understand and address the specific data management needs of their research communities. However, because they work as a centrally coordinated team, the work of the Data Stewards is coherent and aligned. This is reflected, for example, in research data policy development: TU Delft will have a central policy framework for research data management, but it is the Data Stewards, working with their faculties, who will develop research data policies tailored to the specific needs of individual faculties.

SPARC Europe’s Open Data Champions

SPARC Europe’s Open Data Champions initiative takes yet another approach from Cambridge and TU Delft: it aims to promote the use of ambassadors or champions in the scientific community to help unlock more scientific data. The focus of the Open Data Champions Initiative is to achieve the cultural change needed to see more research data shared and re-used.

As with SPARC Europe’s earlier Open Access Champions initiative, the rationale behind the Open Data Champions is that activists who stimulate cultural change need to be promoted and supported to effect greater, speedier, more motivated research-driven change, helping to make Open the default in Europe. SPARC Europe wants to identify Champions at different career levels (from PhD students to vice chancellors), from a range of disciplines and from a variety of European countries to inspire a broad range of stakeholders.

Are the programmes really effective?

After short presentations about the three programmes, the attendees discussed different aspects of the programmes: their aims, audiences, reward systems and the sustainability of these activities. Perhaps the most interesting discussion was around measuring the effectiveness of these initiatives. All three programmes ultimately aim to achieve cultural change towards better data management and greater openness. Are the programmes all equally effective at achieving cultural change? Or do different modes of engagement bring different results? How can cultural change be measured?

And, finally, what are the costs and benefits of each programme? TU Delft’s Data Stewardship programme, with its discipline-specific Data Stewards, is more resource-intensive than Cambridge’s Data Champions programme, which relies on researchers volunteering their time; both, however, are more costly than SPARC Europe’s Open Data Champions.

Need for international collaboration and practice exchange

Our discussions raised more questions than answers, but we all agreed that the exchange of ideas and practice was productive and useful. Many attendees expressed interest in starting dedicated researcher engagement programmes at their institutions. Therefore, one of the main conclusions of the session was that it would be valuable to create a forum where those running researcher engagement programmes could regularly discuss them, exchange ideas and solve problems jointly. This is particularly important for difficult questions that the community struggles to address, such as metrics for assessing cultural change in data management and sharing. Working collaboratively can prove incredibly efficient, as was recently demonstrated by a team effort that led to the development of metrics for assessing data management training programmes.

Next steps

As a next step to extend our conversations and start identifying solutions to common problems, the University of Cambridge, SPARC Europe and Jisc are co-organising a dedicated event “Engaging Researchers in Good Data Management” on 15 November 2017 in Cambridge, United Kingdom. The event intends to bring together those working to support and engage researchers with open research and Research Data Management (RDM), including librarians, scholarly communication specialists and researchers from both the sciences and humanities. So if you are reading this blog post and would like to be part of these discussions, do come and join!

Published 15 September 2017
Written by Dr Marta Teperek

Who is requesting what through Cambridge’s Request a Copy service?

In October last year we reported on the first four months of our Request a Copy service. Now, 15 months in, we have had over 3000 requests and this provides us with a rich source of information to mine about the users of our repository.  The dataset underpinning the findings described here is available in the repository.

What are people requesting?

We have had 3240 requests through the system since its inception in June 2016. Of those, the vast majority have been for articles, 1878 (58%), and theses, 1276 (39%). The remaining requests are for book chapters, conference objects, datasets, images and manuscripts. It should be noted that most datasets are available open access, so there is little need to request them.

Of the 23 requests for book chapters, it is perhaps not surprising that the greatest number, 9 (39%), were for chapters held in the collections of the School of Humanities and Social Sciences. Perhaps more interesting is that the second-highest number, 7 (30%), were for chapters held in the School of Technology.

The School of Technology is home to the Department of Engineering, which is the University’s largest department. It is therefore perhaps not surprising that the greatest number of article requests were for Engineering, with 311 of the 1878 article requests (17%). The next most requested areas for articles were, in order, the Department of Public Health and Primary Care, the Department of Psychiatry, the Faculty of Law and the Judge Business School.

What’s hot?

Over this period we have seen a proportional increase in the number of requests for theses compared with articles. When the service started, articles accounted for 71% of requests and theses for 29%. More recently, however, requests for theses have overtaken those for articles, by a ratio of 54% to 46%.

The most requested thesis over this period, by a considerable margin, was Professor Stephen Hawking’s, with double the number of requests of the following ten most requested theses. The remaining top 10 requested theses are heavily engineering-focused, with a nod to history and social research. These theses were:

The top 10 requested articles have a distinctly health and behavioural focus, with the exception of one legal paper authored by Cambridge University’s Pro Vice Chancellor for Education, Professor Graham Virgo.

When are people requesting?

Looking at the day of the week on which people request items, there is a distinct preference for early in the week. This mirrors what we have observed about the use of our helpdesk and deposits to our service, both of which peak on Tuesdays.

When in the publication cycle are the requests happening?

In our October 2016 blog we noted that of the articles requested in the four months from when the service started in June 2016 to the end of September 2016, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal.  The method we used for working this out involved identifying those articles which had been requested and determining if the publication date was after the request.

Now, 15 months after the service began it is slightly more difficult to establish this number. We can identify items that were deposited on acceptance because we place these items on a very long embargo (until 2100) until we can establish the publication date and set the embargo period. So in theory we could compare the number of articles with this embargo period against those that have a different date.

However, false positives (articles that appear to have been requested before publication) would arise where an article had been published but we had not yet identified this. To give an indication of how big an issue this is for us, as of the end of last week there were 1768 articles in our ‘to be checked’ pile. We would also have false negatives (articles that appear to have been requested after publication) where an article was published between the request and the time of the report and its embargo had been changed as a result. That said, after some analysis of the requests for articles and conference proceedings, 19% were made before publication. This is a slightly fuzzy number, but it does give an indication.
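The two ways of estimating when a request was made can be sketched as follows. This is a minimal illustration, assuming each request carries a request date, the record’s embargo date at report time and, where known, a publication date; the class, field and function names are hypothetical and not part of the repository software. The embargo-based check deliberately reproduces the false positives and false negatives described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Records deposited on acceptance carry a placeholder embargo ("until 2100")
# until the real publication date has been identified.
PLACEHOLDER_EMBARGO_YEAR = 2100


@dataclass
class ArticleRequest:
    requested_on: date
    embargo_until: date            # embargo on the record at report time
    published_on: Optional[date]   # known publication date, if any


def pre_publication_by_date(req: ArticleRequest) -> Optional[bool]:
    """Compare the publication date with the request date.
    Returns None when the publication date is not known."""
    if req.published_on is None:
        return None
    return req.published_on > req.requested_on


def pre_publication_by_embargo(req: ArticleRequest) -> bool:
    """Proxy: count a request as pre-publication if the record still
    carries the placeholder embargo at report time.
    False positive: published, but the placeholder not yet updated.
    False negative: published after the request, placeholder already replaced."""
    return req.embargo_until.year == PLACEHOLDER_EMBARGO_YEAR


def share(flags: List[bool]) -> float:
    """Fraction of True values (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0
```

For example, a request made on 1 March 2017 for an article published on 1 May 2017, whose embargo has since been updated, counts as pre-publication by date but not by embargo: exactly the false negative case.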

How many requests are fulfilled?

The vast majority of the decisions recorded (35% of all article requests, but 92% of the instances where a decision was recorded) indicate that the author shared their article with the requestor. The small number of ‘no’ responses (3%) indicate that the request was actively rejected. In other words, a decision was recorded for 38% of requests, and 35 of every 38 of those decisions were ‘yes’.

We do not have a decision recorded from the author for 62% of the requests. We suspect that in the majority of these the request simply expires because the author takes no action. In some cases the author may have corresponded directly with the requestor. We note that the email sent to authors does look like spam, and our review of this service needs to address this issue.

Next steps

As we explained in October, the process for managing the requests is still manual. As the volume of requests increases, the time taken is becoming problematic: we estimate it is the equivalent of one person-day per week. We are scoping the technical requirements for automating these processes. A new requirement at Cambridge for the deposit of digital theses means there will be three different processes. Requests for these digital theses will be sent to the author for their decision, although these authors will, in most cases, no longer be affiliated with Cambridge. Requests for digitised theses where we do not have the author’s permission are processed within the Library, and requests for articles are sent to the Cambridge authors.

Given the challenges with identifying when in the publication process the request has been made, we need to look at automating the system in a manner that allows us to clearly extract this information. The percentage of requests that occur before publication is a telling number because it indicates the value or otherwise of having a policy of collecting articles at the acceptance point rather than at publication.

Published 12 September 2017
Written by Dr Danny Kingsley

Sustaining long-term access to open research resources – a university library perspective

In the third of a series of three blog posts, Dave Gerrard, a Technical Specialist Fellow from the Polonsky-Foundation-funded Digital Preservation at Oxford and Cambridge project, describes how he thinks university libraries might contribute to ensuring access to Open Research in the longer term. The series began with “Open Resources, who should pay” and continued with “Sustaining open research resources – a funder perspective”.

Blog post in a nutshell

This blog post works from the position that the user-bases for Open Research repositories in specific scientific domains are often very different to those of institutional repositories managed by university libraries.

It discusses how in the digital era we could deal with the differences between those user-bases more effectively. The upshot might be an approach to the management of Open Research that requires both types of repository to work alongside each other, with differing responsibilities, at least while the Open Research in question is still active.

And, while this proposed method of working together wouldn’t clarify ‘who is going to pay’ entirely, it at least clarifies who might be responsible for finding funding for each aspect of the task of maintaining access in the long-term.

Designating a repository’s user community for the long-term

Let’s start with some definitions. One of the core models in Digital Preservation, the International Standard Open Archival Information System Reference Model (OAIS), defines ‘the long term’ as:

“A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future.”

This leads us to two further important concepts defined by the OAIS:

“Designated Communities” are “an identified group of potential Consumers who should be able to understand a particular set of information”, i.e. the set of information collected by the ‘archival information system’.

A “Representation Information Network” is the tool that allows the communities to explore the metadata which describes the core information collected. This metadata will consist of:

  • descriptions of the data contained in the repository
  • metadata about the software used to work with that data
  • the formats in which the data are stored and related to each other, and so forth.

In the example of the Virtual Fly Brain Platform repository discussed in the first post in this series, the Designated Community appears to be: “… neurobiologists [who want] to explore the detailed neuroanatomy, neuron connectivity and gene expression of Drosophila melanogaster.” And one of the key pieces of Representation Information, namely “how everything in the repository relates to everything else”, is based upon a complex ontology of fly anatomy.

It is easy to conclude, therefore, that you really do need to be a neurobiologist to use the repository: it is fundamentally, deeply and unashamedly confusing to anyone else who might try to use it.

Tending towards a general audience

The concept of Designated Communities is one that, in my opinion, the OAIS Reference Model never adequately gets to grips with. For instance, the OAIS Model suggests including explanatory information in specialist repositories to make the content understandable to the general community.

Long-term access within this definition thus implies designing repositories for Designated Communities consisting of what my co-Polonsky-Fellow Lee Pretlove describes as: “all of humanity, plus robots”. The deluge of additional information that would need to be added to support this totally general resource would render it unusable; to aim at everybody is effectively to aim at nobody. And, crucially, “nobody” is precisely who is most likely to fund a “specialist repository for everyone”, too.

History provides a solution

One way out of this impasse is to think about currently existing repositories of scientific information from more than 100 years ago. We maintain a fine example at Cambridge, the Darwin Correspondence Project, though it can’t be compared directly to Virtual Fly Brain. The former doesn’t contain specialist scientific information like that held by the latter; it holds letters, notebooks, diary entries and so on: ‘personal papers’, in other words. These types of materials are what university archives tend to collect.

Repositories like Darwin Correspondence don’t have “all of humanity, plus robots” Designated Communities, either. They’re aimed at historians of science, and those researching the time period when the science was conducted. Such communities tend more towards the general than ‘neurobiologists’, but are still specialised enough to enable production and management of workable, usable, logical archives.

We don’t have to wait for the professor to die any more

So we have two quite different types of repository. There’s the ‘ultra-specialised’ Open Research repository for the Designated Community of researchers in the related domain, and then there’s the more general institutional ‘special collection’ repository containing materials that provide context to the science, such as correspondence between scientists, notebooks (which are becoming fully electronic), and rough ‘back of the envelope’ ideas. Sitting somewhere between the two are publications: the specialist repository might host early drafts and work in progress, while the institutional repository contains finished, published work. And the institutional repository might also collect enough data to support these publications, too, like our own Apollo Repository does.

The way digital disrupts this relationship is quite simple: a scientist needs access to her ‘personal papers’ while she’s still working, so, in the old days (i.e. more than 25 years ago) the archive couldn’t take these while she was still active, and would often have to wait for the professor to retire, or even die, before such items could be donated. However, now that everything is digital, the prof can both keep her “papers” locally and deposit them at the same time. The library special collection doesn’t need to wait for the professor to die to get its hands on the context of her work. Or indeed, wait for her to become a professor.

Key issues this disruption raises

If we accept that specialist Open Research repositories are where researchers carry out their work, and that the institutional repository’s role is to collect contextual material to help us understand that work further down the line, then what questions does this raise about how those managing these repositories might work together?

How will the relationship between archivists and researchers change?

The move to digital methods of working will change the relationships between scientists and archivists.  Institutional repository staff will become increasingly obliged to forge relationships with scientists earlier in their careers. Of course, the archivists will need to work out which current research activity is likely to resonate most in future. Collection policies might have to be more closely in step with funding trends, for instance? Perhaps the university archivist of the digital future might spend a little more time hanging round the research office?

How will scientists’ behaviour have to change?

A further outcome of being able to donate digitally is that scientists become more responsible for managing their personal digital materials well, so that it’s easier to donate them as they go along. This has been well highlighted by another of the Polonsky Fellows, Sarah Mason at the Bodleian Libraries, who has delivered personal digital archiving training to staff at Oxford, in part based on advice from the Digital Preservation Coalition. The good news here is that such behaviour actually helps people keep their ongoing work neat and tidy, too.

How can we tell when the switch between Designated Communities occurs?

Is it the case that there is a ‘switch-over’ between the two types of Designated Community described above? Does the ‘research lifecycle’ actually include a phase where the active science in a particular domain starts to die down, but the historical interest in that domain starts to increase? I expect that this might be the case, even though it’s not in any of the lifecycle models I’ve seen, which mostly seem to model research as either continuing on a level perpetually, or stopping instantly. But such a phase is likely to vary greatly even between quite closely-related scientific domains. Variables such as the methods and technologies used to conduct the science, what impact the particular scientific domain has upon the public, to what degree theories within the domain conflict, indeed a plethora of factors, are likely to influence the answer.

How might two archives working side-by-side help manage digital obsolescence?

Not having access to the kit needed to work with scientific data in future is one of the biggest threats to genuine ‘long-term’ access to Open Research, but one that I think it really does fall to the university to mitigate. Active scientists using a dedicated, domain specific repository are by default going to be able to deal with the material in that repository: if one team deposits some material that others don’t have the technology to use, then they will as a matter of course sort that out amongst themselves at the time, and they shouldn’t have to concern themselves with what people will do 100 years later.

However, university repositories do have more of a responsibility to history, and a daunting responsibility it is. There is some good news here, though… For a start, universities have a good deal of purchasing power they can bring to bear upon equipment vendors, in order to insist, for example, that they produce hardware and software that creates data in formats that can be preserved easily, and to grant software licenses in perpetuity for preservation purposes.

What’s more fundamental, though, is that the very contextual materials I’ve argued that university special collections should be collecting from scientists ‘as they go along’ are the precise materials science historians of the future will use to work out how to use such “ancient” technology.

Who pays?

The final, but perhaps most pressing, question is ‘who pays for all this?’ Well, I believe that managing long-term access to Open Research in two active repositories working together, with two distinct Designated Communities, might at least make things a little clearer. Funding specialist Open Research repositories should be the responsibility of funders in that domain, but they shouldn’t have to worry about long-term access to those resources. As long as the science is active enough that it’s getting funded, a proportion of that funding should go to the repositories that science needs to support it. The exact proportion should depend upon the value the repository brings, which might be calculated using factors such as how much the repository is used, how much time using it saves, what researchers’ time is worth, how many Research Excellence Framework brownie points (or similar) come about as a result of collaborations enabled by that repository, and so on.

On the other hand, I believe that university / institutional repositories need to find quite separate funding for their archivists to start building relationships with those same scientists, and working with them to both collect the context surrounding their science as they go along, and prepare for the time when the specialist repository needs to be mothballed. With such contextual materials in place, there don’t seem to be too many insurmountable technical reasons why, when it’s acknowledged that the “switch from one Designated Community to another” has reached the requisite tipping point, the university / institutional repository couldn’t archive the whole of the specialist research repository, describe it sensibly using the contextual material they have collected from the relevant scientists as they’ve gone along, and then store it cheaply on a low-energy medium (i.e. tape, currently). It would then be “available” to those science historians that really wanted to have a go at understanding it in future, based on what they could piece together about it from all the contextual information held by the university in a more immediately accessible state.

Hence the earlier the institutional repository can start forging relationships with researchers, the better. But it’s something for the institutional archive to worry about, and get the funding for, not the researcher.

Published 11 September 2017
Written by Dave Gerrard

Continuing the conversation: a CRUK workshop on RDM

In May 2017 the Office of Scholarly Communication organised a workshop with Paola Quattroni from Cancer Research UK (CRUK) focusing on data sharing policy and practices. It was a great opportunity for the funder to outline its policies and current initiatives on data sharing and for Cambridge researchers to discuss the issues, suggest further solutions and give feedback to the funder about the changes they would like to see implemented. This blog post highlights the main points of the workshop.

This session continued the conversation from February last year, when CRUK and the Wellcome Trust came to Cambridge to speak to our research community.

CRUK’s grand ambition

In her presentation “Data sharing in policy and practice with Cancer Research UK”, Paola Quattroni began with CRUK’s grand ambition: “To bring forward the day all cancers are cured” and “see three quarters of people surviving cancer within the next 20 years.”

One of the key elements in achieving this and maximising public benefit is data sharing. CRUK firmly believes that transparency, research integrity, and swift dissemination and reproducibility of research results are key ingredients of success.

“Our goal is to improve how research is carried out,” explained Paola, who is the Research Funding Manager – Data at CRUK. “We fund the best science and expect researchers to follow best practices… Improving patient benefit and health is our ambition.”

She emphasised the need to have ongoing discussions with the research community and to work together on overcoming barriers to data sharing. Appropriate sharing and dissemination of research data are particularly important for CRUK, and good data management is the first step to getting the most from the data and facilitating sharing and re-use. In this context, CRUK is actively working to increase and improve data sharing by being instructive but not necessarily demanding in its requirements.

The audience

The majority of the attendees came from the fields of Biological Sciences and Clinical Medicine. When asked why they came to the workshop the consensus was to be informed regarding the CRUK policy and what actions they needed to take. Examples of individual responses included:

  • To learn how to fulfil funders’ requirements.
  • To learn more about processing data.
  • To know the policy on sharing code and data.
  • To learn the difference between data sharing and open data.
  • To discuss the costs of storing data and how to forecast costs for periods of more than 10 years.
  • To learn more about contractual agreements.
  • To learn what the funder expects regarding data sharing.
  • To learn and inform other colleagues about it.

The structure

The workshop started with an icebreaker. The audience was asked to pinpoint why they came to the workshop and what they hoped to gain from it. Following that, Paola Quattroni presented CRUK’s policy on the management and sharing of data, explained why data sharing is important and what the barriers are, and outlined current initiatives to improve data sharing among researchers.

Paola highlighted some of the work CRUK is doing to increase data sharing, such as the recent signing of the San Francisco Declaration on Research Assessment (DORA), and the fact that CRUK is continuing to work with others to put it into practice. Other future activities include:

  • Encouraging grant applicants to explain the significance and impact of their discoveries, publications and a broad range of other outputs (e.g. policy influence).
  • Being more explicit about evaluating grant applicants’ publications according to their scientific content, rather than simply considering where they are published.
  • Working with reviewers and committee members to evaluate the impact of all research outputs.
  • Measuring the re-use of research.
  • Encouraging replication studies.
  • Recognising and rewarding researchers who share their data.

After the presentation, everybody split into groups and identified various challenges of data sharing which were then analysed by the teams and the trainers. The last part of the workshop concentrated on group feedback and suggestions from the audience on what funders could do to further enhance collaboration with the research community.

Challenges

Each group identified challenges and problems of data sharing under four headings: Publishing, Skills and Training, Rewards, and Data Infrastructure.

Publishing

A recurring item among all groups was the fear of being scooped and of losing publication opportunities. Another was that the impact factor is still the be-all and end-all. Other challenges included:

  • Accepting citations of preprints as a metric of achievement – this can be dangerous, as groups could release non-peer-reviewed data online to discourage competitors from innovating.
  • Range of requirements across different journals/publishers.
  • Need to take care not to kill analytical innovation.
  • The larger the collaboration the higher the importance of a standardised data format and analysis.

Skills & Training

The Skills and Training section concentrated on how to write data management plans and standardise laboratory notes as well as the necessary training to catch up with technology. Other points included:

  • Lack of computer skills/knowledge to physically upload data.
  • Formatting data.
  • Version Control.

Rewards

It was apparent in most of the groups that time, cost and re-usability problems were significant inhibitors regarding rewards and incentives:

  • There is a need to overcome the ‘time burden’ aspect of sharing.
  • Cost and time – solution: Electronic Laboratory Notebooks (ELN) – one or many? Public or private?
  • A new PID (persistent identifier) for metrics.
  • Re-usability – how do you measure it?
  • DMPs are required at the time of grant submission. However, the researcher needs to report after one year because various parameters can change and might need to be re-adjusted.

Data Infrastructure

The need for standardisation in data acquisition, storage and analysis methods, and how ‘big data’ is handled by the funders, were common themes in this category. In addition, it was pointed out that individual institutions should have the infrastructure to support data sharing and DMP writing.

Other data infrastructure challenges included:

  • Data formats – for example there are so many different scanners for imaging, which all have different formats.
  • An EU project testing an imaging modality across 20 sites, where integrating the data is a challenge. The analogy is a clinical trial, where protocols and practices have to produce comparable data.
  • Cost of the software: open source imaging software is available. However, you may need several different image analysis tools.

Solutions

Although there was not enough time to concentrate on all the challenges, the discussions turned into ideas that provided the seeds for possible solutions or changes of strategy regarding how data is valued and shared.

For example, what if you are scooped? Would citations help? One suggestion was that a DOI stamp on your data can be evidence that you were first.

Currently, publications are considered to be the sole reward, so there is a widespread fear of losing publication opportunities. However, if your data is more valuable than the paper, then the dataset becomes the incentive and is highly valued. How can this be achieved? Micropublishing? If you could build a career on data publishing instead of papers, it would change the incentive strategy. Instead of relying on the old system built around a big story, what about writing a small story, or even data papers? Data in conjunction with data notes is a type of article. These kinds of outputs are valuable and publishers should consider them.

Although staff working for funders have often been researchers themselves, they could still visit researchers from different disciplines to get an idea of what is needed, especially with discipline-specific DMPs. Some participants suggested that DMPs should be discipline-specific and standardised. For example, if preclinical and clinical data had the same format, such data could easily be compared.

Another solution participants proposed to the financial challenges of data sharing was an open access fund for data, similar to COAF, that supports the cost of infrastructure and rewards openness.

Conclusions

As already mentioned, the discussions evolved to the point that there was no time left to analyse all challenges and talk about practical issues.

For example, participants saw a clear need for practical guidance on data plans and for distinct approaches per field (STEM/HASS). Questions arose about the use and cost of ELNs and their future implications, and about what happens if data needs to be deposited somewhere else, or partway through the plan. What would the rules be for additional funding midway in such instances? Lastly, the long-term preservation and infrastructure costs associated with projects were another big topic, as were funders’ future strategies regarding ‘big data’. (See this blog for a discussion on the cost issue.)

This workshop brought together researchers from different disciplines interested in learning more about data management and sharing at CRUK. From the funder’s perspective, it was a great opportunity to discuss policies and initiatives in data sharing and to hear directly from researchers about the main barriers to data sharing. CRUK strives to help researchers overcome these barriers and is actively working to facilitate the way research is carried out and ultimately shared.

It was agreed that this workshop was only the beginning, and that collaboration is key to overcoming some of these challenges.

The main outcomes, however, were clear from the outset:

  • There is a recognised need for ongoing collaboration between funders, researchers and institutions.
  • A global view is required – all funders should have the same vision and aims regarding data sharing.
  • Reporting and disseminating all data is key.
  • Data needs to be available and reusable.
  • We need to overcome the technical and infrastructure challenges of how to measure the “journey” of the data and its re-usability.

Published 07 September 2017
Written by Maria Angelaki

Creative Commons License

What I wish I’d known at the start – setting up an RDM service

In August, Dr Marta Teperek began her new role at Delft University in the Netherlands. In her usual style of doing things properly and thoroughly, she has contributed this blog reflecting on the lessons learned in the process of setting up Cambridge University’s highly successful Research Data Facility.

On 27-28 June 2017 I attended Jisc’s Research Data Network meeting at the University of York. I was one of several people invited to talk about experiences of setting up RDM services in a workshop organised by Stephen Grace from London South Bank University and Sarah Jones from the Digital Curation Centre. The purpose of the workshop was to share lessons learned and help those who were just starting to set up research data services within their institutions. Each of the presenters prepared three slides: 1. What went well, 2. What didn’t go so well, 3. What they would do differently. All slides from the session are now publicly available.

For me the session was extremely useful not only because of the exchange of practices and learning opportunity, but also because the whole exercise prompted me to critically reflect on Cambridge Research Data Management (RDM) services. This blog post is a recollection of my thoughts on what went well, what didn’t go so well and what could have been done differently, as inspired by the original workshop’s questions.

What went well

RDM services at Cambridge started in January 2015 – quite late compared to other UK institutions. The late start meant, however, that we were able to learn from others and avoid some common mistakes when developing our RDM support. Jisc’s Research Data Management mailing list was particularly helpful, as it is a place where professionals working with research data look for help, ask questions, and share reflections and advice. In addition, the Research Data Management Fora organised by the Digital Curation Centre proved to be an excellent vehicle not only for exchanging knowledge and good practice, but also for building networks with colleagues in similar roles. Cambridge also joined the Jisc Research Data Shared Service (RDSS) pilot, which aimed to create a joint research repository and related infrastructure. Being part of the RDSS pilot not only helped us engage further with the community, but also allowed us to better understand the RDM needs at the University of Cambridge by undertaking the Data Asset Framework exercise.

In exchange for all the useful advice received from others, we aimed to be transparent about our work as well. We therefore regularly published blog posts about research data management at Cambridge on the Unlocking Research blog. The transparent approach had several additional advantages: it allowed us to reflect on our activities, it provided an archival record of what was done and the rationale for it, and it also facilitated more networking and exchange of comments with the wider RDM community.

Engaging Cambridge community with RDM

Our initial attempts to engage the research community at Cambridge with RDM were compliance-based: we were telling our researchers that they must manage and share their research data because this was what their funders required. Unsurprisingly, however, this approach was rather unsuccessful – researchers were not prepared to devote time to RDM if they did not see the benefits of doing so. We therefore quickly revised the approach and changed the focus of our outreach to the (selfish) benefits of good data management and of effective data sharing. This allowed us to build an engaged RDM community, in particular among early career researchers. As a result, we were able to launch two dedicated programmes that further strengthened community involvement in RDM: the Data Champions programme and the Open Research Pilot Project. Data Champions are (mostly) researchers who volunteered their time to act as local experts on research data management and sharing, providing advice and specialised training within their departments. The Open Research Pilot Project is looking at the benefits of, and barriers to, conducting Open Research.

In addition, ensuring that a wide range of stakeholders from across the University were part of the RDM Project Group and had oversight of the development and delivery of RDM services allowed us to develop our services quite quickly. As a result, the services were endorsed by a wide range of stakeholders at Cambridge and were also developed in a relatively coherent fashion. For example, effective collaboration between the Office of Scholarly Communication, the Library, the Research Office and the University Information Services allowed integration between the Cambridge research repository, Apollo, and the research information system, Symplectic Elements.

What didn’t go so well

One of the aspects of our RDM service development that did not go so well was the business case development. We started developing the RDM business case in early 2015. The business case went through numerous iterations, and at the time of writing of this blog post (August 2017), financial sustainability for the RDM services has not yet been achieved.

One of the strongest factors contributing to the lack of success in business case development was insufficient engagement of senior leadership with RDM. We invested a substantial amount of time and effort in engaging researchers with RDM by moving away from compliance arguments, to the extent that we seem to have forgotten that compliance- and research integrity-based advocacy is necessary to secure the buy-in of senior leadership.

In addition, while trying to move quickly with service development, and at the same time trying to gain trust and engagement in RDM service development from the various stakeholder groups at Cambridge, we ended up taking part in various projects and undertakings, some of which were only loosely connected to RDM. As a result, some of the activities lacked strategic focus, and a lot of time was needed to re-define what the RDM service is and what it is not, in order to ensure that the expectations of the various stakeholder groups could be properly managed.

What could have been done differently

There are a number of things which could have been done differently and more effectively. Firstly, and to address the main problem of insufficient engagement with senior leadership, one could have introduced dedicated, short sessions for principal investigators on ensuring effective research data management and research reproducibility across their research teams. Senior researchers are ultimately those who make decisions at research-intensive institutions, and therefore their buy-in and their awareness of the value of good RDM practice is necessary for achieving financial sustainability of RDM services.

In addition, it would have been valuable to set aside time for strategic thinking and for defining (and re-defining, as necessary) the scope of RDM services. This is also related to the overall branding of the service. In Cambridge a lot of initial harm was done by a negative association between Open Access to publications and RDM. Because of overarching funder and government requirements for Open Access to publications, many researchers started perceiving Open Access to publications merely as a necessary compliance condition. The advocacy for RDM at Cambridge started as ‘Open Data’ requirements, which led many researchers to believe that RDM was yet another requirement to comply with and that it was only about open sharing of research data. It took us a long time to change the messages and to rebrand the service as one supporting researchers in their day-to-day research practice, emphasising that proper management of research data leads to efficiency savings. After all, only research data that are managed properly from the very start of the research process can easily be shared at the end of the project.

Finally, and also related to focusing and defining the service, it would have been useful to decide on a benchmarking strategy from the very beginning of the service’s creation. What is the goal (or goals) of the service? Is it to increase the number of shared datasets? Is it to improve day-to-day data management practice? Is it to ensure that researchers know how to use novel tools for data analysis? Once the goal(s) are decided, a strategy should be designed to benchmark progress towards them. Otherwise it can be challenging to decide which projects and undertakings are worth continuing and which are less successful and should be revised or discontinued. To address one aspect of benchmarking, Cambridge led the creation of an international group aiming to develop a benchmarking strategy for RDM training programmes, with the goal of creating tools for improving RDM training provision.

Final reflections

My final reflection is to reiterate that the questions asked of me by the workshop leaders at the Jisc RDN meeting really inspired me to think more holistically about the work done towards developing RDM services at Cambridge. Looking forward, I think asking oneself the very same three questions (what went well, what did not go so well and what you would do differently) might become a useful regular exercise, ensuring that RDM service development is well balanced and on track towards its intended goals.


Published 24 August 2017
Written by Dr Marta Teperek

Creative Commons License

Summer camp – the scholarly communication way

Growing up, a diet of B-grade movies gave the impression of American summer camps as places where teenagers undertake a series of slapstick events in the wilderness. That may indeed be the case sometimes, but at the University of California San Diego campus recently, a group of decidedly older people bunked in together for a completely different type of summer camp.

The inaugural FORCE11 Scholarly Communications Institute (FSCI) was held in the first week of August, bringing together librarians, researchers and administrators from around the world. The event was planned as a week-long intensive summer school on improving research communication. The activities were spread all over the campus, although not, unfortunately, in the mother of all spaceships for a library.

The event hashtag was #FSCI, and the hashtag for the course I ran with Sarah Shreeves from the University of Miami, “Building an Open and Information Rich Institution”, was #FSCIAM3. This blog is a brief rundown of what we covered in the course.

Our course

We had a wonderful group of people, primarily from the library sector, and from around the world (although many were working in American universities).

From the delivery perspective this was an intense experience, requiring 14 hours of delivery plus the documentation and follow-up each day. It was further complicated by the fact that Sarah and I met for the first time in person half an hour before delivery on the Monday.

Working within open and FAIR principles, we have made all of our resources and information available, and links to all the Google documents are included in this blog post. The shared Google Drive has links to everything. These presentations will be uploaded to the FSCI Zenodo site when it is available. In addition, the group created a Zotero page which collects together relevant links and resources as they arose in discussion.

Monday – Problem definition

Using an established process the group worked together to define the problems we were looking to address in scholarly communication:

  • OA takes time and money – and the tools are annoying.
  • We need to reduce complexity – make it administratively easy.
  • It is important to recognise difference – one size does not fit all; there are cultural and country norms in publishing and prestige.
  • Motivation – what are the incentives? How can we demonstrate benefit?
  • There is a need for advocacy and training of various stakeholders, including within the library.
  • We can demonstrate the repository as a free way of publishing with impact tracking – for both the author and the institution.
  • Whose responsibility is this?

The slides from the first day (including the workings of the group) are available.

Tuesday – Stakeholder mapping

On the second day we discussed the different stakeholders in this space, both within and external to institutions. Each table created a pile of Post-it notes, which were then classified on a large grid on the wall according to ‘interest’ versus ‘influence’. We then discussed which stakeholders we needed to work with, and whether it is possible to move a stakeholder from one quadrant into another. We also discussed the value of using some stakeholders to reach others.

A second exercise we ran was ‘responding to objections’ – where we gave the group a few minutes to create objections that different stakeholders may have to aspects of scholarly communication. These were then randomised and the group had only a couple of minutes to develop an ‘elevator pitch’ to respond to that objection. The slides from day 2 incorporate the comments, objections and counter arguments.

 

Wednesday – Communication

We started the day with a ‘gathering evidence’ exercise, in which each table was allocated a series of questions to discuss, with a view to identifying the kind of information held in an institution that might help answer them. Examples of the questions we asked the group to consider are: How do we better understand and communicate with the range of disciplines on campus? (with a goal of creating advocacy materials that support the range of disciplinary needs of the institution) or Who is doing collaborative research with others on campus and with others outside of the university? Is there interdisciplinary research? (with a goal of creating a map of collaborations on campus).

We moved on to an exercise demonstrating the need for clear communication. People worked in pairs and had a pile of building bricks from which they were asked to build a shape. They then had five minutes to describe their shape. After this the instructions were swapped and the opposite pair tried to reproduce the shape from the instructions. The results were surprising, with fewer than 50% of shapes reproduced. However, looking at some of the instructions, things became clearer. Note the description ‘cute kitty’ in these instructions.

 

The final session on day three was a risk assessment exercise where we put up the proposal ‘that we will make all digitised older theses open access without obtaining permission from the authors’. The tables were asked to come up with potential risks that could arise from this proposal, and then asked to map these onto a grid that considered the likelihood and severity of each risk.

The group then discussed what could be done to mitigate the risks they had identified, and whether each risk could then be moved within the grid. Again, all discussions are captured in the slides.

Thursday – Governance

On the Thursday we considered matters of governance. Dominic Tate from Edinburgh talked the group through the management structure at his institution, and how they have managed to create strong decision-making governance.

Using a system of mapping organisational structure to the decision structure, the group identified a goal they would like to achieve at their workplace and then considered the aspects that are Strategic, Tactical and Operational. They then identified the person, people or group that would need to agree at each of these stages to achieve the end goal, and whether this was something that could be managed within the immediate organisation or involved the wider institution. We also discussed whether policies would need to be changed or created, and the level of consultation needed. The slides describe the process.

At the end of this day we broke into two groups for an unconference. One group discussed the UK Scholarly Communication Licence, the other continued on the governance discussion by identifying stakeholders and working out how to approach them.

Friday – The future

On the last day we discussed the best way to share stories with the relevant stakeholders – what is the best way to present the information? How do you get it to the person?

We then looked to the future, first by considering the big disruptor technologies of the last 20 years. We asked people to share their work experiences before these technologies existed, to give us an idea of how much things might change in the future with the next big disruptors. We then asked individuals to identify future issues that they will need to address at their institution, which they then sorted at the table level before we did a group consolidation to identify what the issues will be.

Each group chose one of these issues/futures and, in a mini overview of the work we had done throughout the week, undertook a stakeholder assessment – who would they have to engage to make this happen? They also identified the governance structures in place, and the type of information they would want in order to make decisions about moving in this area. Some of the discussions are captured in the slides.

Assessment of the course

When developing the course we articulated what we hoped the participants would get out of the week. These included the ability to:

  • Think strategically and comprehensively about openness and their institution
  • Articulate the ‘why’ of openness for a variety of stakeholders within an institution
  • Articulate how information related to research and outputs flows through an institution and understand challenges to this flow of information
  • Understand the practicalities of delivering open access to research outputs and research data management within an institution
  • Consider the technology, expertise, and resources required to support open research

So how did we do? Well, according to the feedback at the beginning and end of the week, we certainly hit all the targets the participants identified.

The responses at the beginning of the week were:

  

And the feedback at the end of the week was:

           

Interestingly, the Governance session was the least popular session we ran, but it rated extremely highly in the areas the participants self-identified as learning about.

 

Several people went out of their way to tell Sarah and me that ‘this was the best training/workshop I have ever done’, which is very high praise.

On the Friday afternoon all of the participants for FSCI got back together to provide feedback about what happened in their courses. These ranged from an explanation of what people did, to participants describing what they knew, to poems. There was no expressionist dance unfortunately (perhaps next year). Sarah and I chose to describe our week in pictures.

Wrap of the week

While it was slightly disorienting spending a week in student accommodation, overall this was a valuable and rewarding experience, if extremely intense. Our group of just over 120 people was only one of several ‘camps’ happening at the same time, including electronic music and programming groups. We all converged on the dining hall at each meal, a big hodgepodge of people.

The largest and most intrusive group was the teenagers at the San Diego District Police camp. This, we discovered, is a paramilitary organisation, which went some way to explaining the line-ups at 6am and again at 9pm, the groups shouting their responses in unison, and the instructors wandering around with guns on their hips.

On a much more peaceful note, San Diego is where Dr Seuss lived, and looking at the vegetation and landscape it is easy to see where his inspiration originated.

    

Published 22 August 2017
Written by Dr Danny Kingsley 
Creative Commons License

Next steps for Text & Data Mining

Sometimes the best way to find a solution is simply to get the different stakeholders talking to each other – and this is what happened at a recent Text and Data Mining symposium held in the Engineering Department at Cambridge.

The attendees were primarily postgraduate students and early career researchers, but senior researchers, administrative staff, librarians and publishers were also represented in the audience.

Background

This symposium grew out of a discussion held earlier this year at Cambridge to consider the issue of TDM and what a TDM library service might look like at Cambridge. The general outcome of that meeting of library staff was that people wanted to know more. Librarians at Cambridge have developed a Text and Data Mining libguide to assist.

So this year the OSC has been doing some work around TDM, including running a workshop at the Research Libraries UK annual conference in March. This was a discussion about developing a research library position statement on Text and Data Mining in the UK. The slides from that event are available and we published a blog post about the discussion.

We have also had discussions with different groups about this issue, including the FutureTDM project, which has been looking to increase the amount of TDM happening across Europe. This project is now finishing up. The impression we have around the sector is that ‘everyone wants to know what everyone else is doing’.

Symposium structure

With this general level of understanding of TDM as our base point, we structured the day to provide as much information as possible to the attendees. The Twitter hashtag for the event is #osctdm, and the presentations from the event are online.

The keynote presentation was by Kiera McNeice from the FutureTDM Project, who gave an overview of what TDM is, how it can be achieved and what the barriers are. There is a video of her presentation (note there were some audio issues at the beginning of the recording).

The event broke into two parallel sessions after this. The main room was treated to a presentation about Wikimedia from Cambridge’s Wikimedian in Residence, Charles Matthews. Then Alison O’Mara-Eves discussed Managing the ‘information deluge’: How text mining and machine learning are changing systematic review methods. A video of Alison’s presentation is available.

In the breakout room, Dr Ben Outhwaite discussed Marriage, cheese and pirates: Text-mining the Cairo Genizah, before Peter Murray Rust spoke about ContentMine: mining the scientific literature.

After lunch, Rosemary Dickin from PLOS talked about Facilitating Text and Data Mining: how an open access publisher supports TDM. PhD candidate Callum Court presented ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature. This presentation was filmed.

In the breakout room, a discussion about how librarians support TDM was led by Yvonne Nobis and Georgina Cronin. In addition there was a presentation from John McNaught, the Deputy Director of the National Centre for Text and Data Mining (NaCTeM), who presented Text mining: The view from NaCTeM.

Round table discussion

The day concluded with the group reconvening for a roundtable (which was filmed) to discuss the broader issue of why there is not more TDM happening in the UK.

We kicked off by asking each of the people who had presented during the event to describe what they saw as the major barrier for TDM. The answers ranged from recruiting and training staff, to the legal challenges and the policies needed at institutional level to support TDM, to the failure of institutions and government to show leadership on the issue. We then opened the floor to discussion.

A librarian described what happens when a publisher cuts off access, including the process the library has to go through with various areas of the University to reinstate access. (Note this was the reason why the RLUK workshop concluded with the refrain: ‘Don’t cut us off!’). There was some surprise in the group that this process was so convoluted.

However, the suggestion that researchers let the library know that they want to do TDM and the library will organise permissions was rejected by the group, on the grounds both that it is impractical for researchers to do this and that obtaining permission would take too long.

A representative from Taylor and Francis suggested that researchers contact the publishers directly and let them know. Again this was rejected as ‘totally impractical’ because of the assumption it made about the nature of research. Far from being a linear and planned activity, research is iterative: requesting access for a period of three months, and then having to go back to extend this permission if the work took an unexpected turn, would be impractical, particularly across multiple publishers.

One attendee in her blog about the event noted: “The naivety of the publisher, concerning research methodology, in this instance was actually quite staggering and one hopes that this publisher standpoint isn’t repeated across the board.”

Some researchers described the threats they had received from publishers about downloading material. There was anger about the inherent message that the researcher had done something criminal.

There was also some concern raised that TDM will drive price increases as publishers see ‘extra value’ to be extracted from their resources. This sparked off a discussion about how people will experiment if anything is made digitally available.

During the hour-long session the conversation moved from high-level problems to workflows. How do we actually do this? As is the way with these types of events, it was really only in the last 10 minutes that the real issues emerged. What was clear was something I have repeatedly observed over the past few years: the players in this space, including librarians, researchers and publishers, have very little idea of how the others work and what they need. I have actually heard people say: ‘If only they understood…’

Perhaps it is time we started having more open conversations?

Next steps

Two things have come out of this event. The first is that people have very much asked for some hands-on sessions. We will have to look at how we will deliver these, as they are likely to be quite discipline-specific.

The second is that there is clearly a very real need for publishers, researchers and librarians to get into a room together to discuss the practicalities of how we move forward in TDM. One of the comments on Twitter was that we need to have legal expertise in the room for this discussion. We will start planning this ‘stakeholder’ event after the summer break.

Feedback

The items that people identified as the ‘one most important thing’ they learnt were instructive. The answers reflect how unaware people are of the tools and services available, and of how access to information works. Many of the responses listed specific tools or services they had found out about; others commented on the opportunities for TDM.

There were many comments about publishers, both the bad:

  • Just how much impact the chilling effect of being cut off by publishers has on researchers
  • That researchers have received threats from publishers
  • Very interesting about publishers and ways of working with them to ensure not cut off
  • Lots can be done but it is being hindered by publishers

and the good:

  • That PLOS is an open access journal
  • That there are reasonable publishing companies in the UK
  • That journals make available big data for meta-analysis

Commentary about the event

There have been some online discussions and blog posts about the event:

Published 17 August 2017
Written by Dr Danny Kingsley 
Creative Commons License

Planning scholarly communication training in the UK

In June 2017 a group of people (see end for attendees) met in London to discuss the issues around scholarly communication training delivery in the UK. Representatives from RLUK, UKSG, SCONUL, UKCoRR, Vitae, Jisc and some universities held a workshop to work through the problem. Possibly because of who attended, the discussion was very library-centric, but this does not preclude the need for training outside the library sector. This blog post is a summary of the discussion from that day.

Background

The decision to hold a meeting like this came out of a library skills workshop run at UKSG recently. In ensuing discussions, it was agreed that it would be a good idea to get stakeholders together for a symposium of some description to try and work out how we could collaborate and provide training solutions for scholarly communication across the sector. There is plenty of space in this area for multiple offerings, but we do want to make sure we are covering the range of areas and the types of delivery modes and levels required. In preparation for the discussion the group created a document listing the scholarly communication training currently on offer.

What is scholarly communication?

An informal survey of research libraries in the UK earlier this year showed that while all respondents had some kind of service that supports aspects of scholarly communication, only half actually used the term ‘scholarly communication’ to describe those services.

A discussion around the table concluded that the term scholarly communication is defined in a wide range of ways. Some libraries draw the boundary at post-publication activities. Others address the pre-publication aspect and meet the needs of Early Career Researchers for advice on publishing. Services can focus on the academic’s profile of themselves and their research, or on the research lifecycle. In some cases there is a question about whether research data management is part of the equation.

The failure of library schools to deliver

It is fairly universally acknowledged that it is a challenge to engage with library schools on the issue of scholarly communication, despite repositories being a staple part of research library infrastructure for well over a decade. There are a few exceptions, but generally open access and other aspects of scholarly communication are completely absent from the curricula. (Note: any library school that wishes to challenge this statement, or provide information about upcoming plans, is welcome to send these through to info@osc.cam.ac.uk)

This raises the question – if library schools are not providing, how do we recruit and train the staff we need? Indeed, who are we actually recruiting? Is it essential for staff to have a library degree, or experience in an academic library? Or are our requirements more functional, such as the ability to manipulate large data sets, experience working with academics, or an understanding of the Higher Education environment?

While libraries are starting to employ post-graduate researchers because of the skills they can lend to the library, library culture is a consideration. Employing researchers who are not librarians has the benefit of bringing in expertise from outside, but there are challenges in integrating their work into the library culture. We need to look at competencies in terms of the structure and size of the organisation, both for current staff and staff of the future.

In the absence of scholarly communication instruction within the basic qualification, skills training in this space would appear to need addressing at the level of the profession.

One possible route to prepare the next generation is a modular approach to on-the-job learning with very practical experience. An option could be to work with people who have come from outside the library space. Given that libraries seem to be starting to bring skill sets in, we need to consider how this sits with the existing profession.

Audiences and their training needs

The goal of the meeting was to resolve what kinds of training the sector needs, for whom, and how it is delivered. For example, many general library staff have a basic need to understand the issues with scholarly communication. The number one question is ‘what is scholarly communication?’ Possibly it is enough for these people just to be familiar with the terminology.

It is possible we need lots of short courses on general topics such as what OA is and the basics of RDM (which could potentially be delivered online), but probably fewer, more complex courses on issues like analysing publisher and funder policies. There are also contested, higher-order areas which require face-to-face debate.

  • Front facing staff
    • Need an overview so the language is familiar and they can refer queries on
  • People working in scholarly communication
    • Day to day practicalities of funder open access compliance
  • Specialist roles in scholarly communication
    • Specific areas
  • Senior managers
    • Very much need a refresher so they can help their staff.
    • Similar overview training, leadership is around the advocacy
    • Need conceptual framework for scholarly communication – how do the technical parts sit together for the infrastructure and governance of institutions
    • Stakeholder management skills.

Skill sets in scholarly communication

It was agreed that budgetary, presentation and negotiation skills are needed in this area as general skills. When it comes to specialist skills these include:

  • Research Integrity
  • Bibliometrics
    • Involved in providing specialist advice on metrics within a school discussion
    • Providing advice on impact
  • Pushing the open research agenda
  • Academic reward structure
  • Technical and infrastructure, e.g. integrating ORCID iDs

Considerations – Lack of perceived need?

There appears to be a problem with a lack of perceived need for training in this space. We are encountering people in libraries who say ‘I don’t think this is our job’. This raises the question of how we should be presenting librarianship: what kind of people do we want in the profession? The job of a ‘traditional librarian’ of 20 years ago is not the same now; the skills are different. Today much of an academic librarian’s job is about winning over people who don’t want to hear the message. It is possible there needs to be a different sort of person pushing an open access agenda.

There have been other innovations in library work that required different behaviours and tasks in the past. For example, is this move towards a scholarly communication future different from when discovery search was introduced? The eResources experience is similar in terms of the new competencies required in the profession. However, the difference in the scholarly communication environment is that there is an external driver: we need to understand the politics of how open access can move forward in the UK.

Considerations – budgets

There is a mismatch between what people would love to have, what can be designed and what people can afford. Anecdotally, the group heard that training budgets are really squeezed, so priority and focus might be heavily influenced by this, with geography and travel costs being central to decisions.

The group discussed the need to make training accessible to all. Even free events can be prohibitive in terms of travel, and hosting them in off-peak periods can help with costs. The blockage is not just money; it also includes time, in terms of the loss of a team member while they are away. This is particularly problematic if scholarly communication is only a part of their job. Most of the need comes from really small institutions where the work is part of a bigger role; however, that is also where there is little money. This also limits the time available for those people to self-educate.

UKSG run events in London, which is expensive for organisations north of London to attend. To increase participation, UKSG are now trying to put on regional events, and have shifted their training to a webinar programme rather than face to face.

SCONUL has done basic copyright training and this has thrown up price sensitivity. One solution is trying to keep it local, and members can volunteer staff in kind.

One option could be online training where participants log on at a certain time once a week for 10 weeks. Many of the people in scholarly communication work in universities, and have distance education software available to them. An alternative is having courses done in house, which could be part of a modular package (but how do you link this?). The course content needs to be agnostic enough to be useful (not discussing DSpace or PURE, for example) before delving into institutional specifics. Make it modular, with core principles and then options.

There was a suggestion that we create a not-for-profit shared collaborative service. The costs of developing this type of deliverable include the development of the training materials, infrastructure costs, room hire, catering etc. Can we make it all online and available? This could work if it were modular.

Next steps

We have not yet bottomed out the need: perception of needs at the practitioner level and at senior management level might be different. Cost is an issue here. Universities need to work out how much it costs to do in-house training. What is the opportunity cost of employing a staff member without experience or training and then getting them up to speed?

It would be useful to have an understanding of what training is happening within institutions. What subjects/topics are being taught, who is doing it, what language is being used, is there a dedicated staff member. Where else do people get information and support?

The general plan is to reconvene in September.

Useful Resources

Skill sets analyses

Here are links to work that has already been done on the required skill sets:

Organisations providing or coordinating training

Organisations are running similar events and then participants have to choose what to focus on. If we divvy it up across the sector it might help the situation.

The Society of College, National and University Libraries (SCONUL) does basic copyright training. There is more focus on the leadership end of the equation. The Collaboration Strategy Group is considering a shared service. People come from non-traditional groups, and this reflects the broader set of skills required in libraries than traditional library courses give you. SCONUL are about to scope out where those services might be and try to identify needs into the future. There are challenges in recruiting people, given the slightly moralistic nature of library culture and whether it is welcoming of people from different backgrounds. How do we promote, retain and incentivise people who may not come from this area?

Research Libraries UK (RLUK) don’t do direct training, but they do have programmes of work and networks around these issues. The RLUK board recently had a meeting to look at a new strategy, updating the existing 2014-2017 RLUK Strategy. They are looking at the bigger picture for scholarly communication: the infrastructure challenges, licensing and costs, and how to leverage members in the consortia. Their role is very much supporting and helping out.

UK Serials Group (UKSG) runs a conference programme. One-day events are a mix of standing repeated courses and one-off sessions. At conferences, the breakout sessions are often what people find really valuable. These include soft skills such as mindfulness in leadership. The audience tends to be practitioners, people in their mid-career. Traditional areas such as library have been focused around collection management because that is where publishers are. But it is not just about traditional publishing. They are our members and that is moving our agenda to meet those needs. UKSG cannot get anywhere in contributing to university publishing courses. Libraries are starting to employ people who have publishing backgrounds.

The Association of Research Managers and Administrators (ARMA) has special interest groups in open access. (Note: ARMA were invited to this meeting but unfortunately couldn’t attend.)

The Chartered Institute for Library and Information Professionals (CILIP) conducts training at a local level. It was agreed we can’t have the conversation without having CILIP in the room – they are wanting to offer more support for academic libraries and seem to be recognising that the library schools program for CILIP is not the be-all and end-all any more. This is partly why they have developed a recognised trainer programme. (Note: CILIP were invited to this meeting but unfortunately couldn’t attend.)

Representatives attending the discussion

  • Helen Dobson – Manchester University
  • Danny Kingsley – Cambridge University
  • Claire Sewell – Cambridge University
  • Anna Grigson – UKSG
  • Fiona Bradley – RLUK
  • Ann Rossiter – SCONUL
  • Katie Wheat – Vitae
  • Sarah Bull – UKSG
  • Stephanie Meece – UKCoRR
  • Frank Manista – Jisc
  • Helen Blanchett – Jisc (a member of the group coordinating the meeting, but was unable to attend on the day)

ARMA and CILIP were also invited but were not able to send a representative.

Published 15 August 2017
Written by Dr Danny Kingsley 

Sustaining open research resources – a funder perspective

This is the second in a series of three blog posts which set out the perspectives of researchers, funders and universities on support for open resources. The first was Open Resources, who should pay? In this post, David Carr from the Open Research team at the Wellcome Trust provides the view of a research funder on the challenges of developing and sustaining the key infrastructures needed to enable open research.

As a global research foundation, Wellcome is dedicated to ensuring that the outputs of the research we fund – including articles, data, software and materials – can be accessed and used in ways that maximise the benefits to health and society.  For many years, we have been a passionate advocate of open access to publications and data sharing.

I am part of a new team at Wellcome which is seeking to build upon the leadership role we have taken in enabling access to research outputs.  Our key priorities include:

  • developing novel platforms and tools to support researchers in sharing their research – such as the Wellcome Open Research publishing platform which we launched last year;
  • supporting pioneering projects, tools and experiments in open research, building on the Open Science Prize which we ran with the NIH and Howard Hughes Medical Institute;
  • developing our policies and practices as a funder to support and incentivise open research.

We are delighted to be working with the Office of Scholarly Communication on the Open Research Pilot Project, where we will work with four Wellcome-funded research groups at Cambridge to support them in making their research outputs open.  The pilot will explore the opportunities and challenges, and how platforms such as Wellcome Open Research can facilitate output sharing.

Realising the long-term value of research outputs will depend critically upon developing the infrastructures to preserve, access, combine and re-use outputs for as long as their value persists.  At present, many disciplines lack recognised community repositories and, where they do exist, many cannot rely on stable long-term funding.  How are we as a funder thinking about this issue?

Meeting the costs of output sharing

In July 2017, Wellcome published a new policy on managing and sharing data, software and materials.  This replaced our long-standing policy on data management and sharing – extending our requirements for research data to also cover original software and materials (such as antibodies, cell lines and reagents).  Rather than ask for a data management plan, applicants are now asked to provide an outputs management plan setting out how they will maximise the value of their research outputs more broadly.

Wellcome commits to meet the costs of these plans as an integral part of the grant, and provides guidance on the costs that funding applicants should consider.  We recognise, however, that many research outputs will continue to have value long after the funding period comes to an end.  Further, while it is not appropriate to make all research data open indefinitely, researchers are expected to retain data underlying publications for at least ten years (a requirement which was recently formalised in the UK Concordat on Open Research Data).  We must accept that preserving and making these outputs available into the future carries an ongoing cost.

Some disciplines have existing subject-area repositories which store, curate and provide access to data and other outputs on behalf of the communities they serve.  Our expectation, made more explicit in our new policy, is that researchers should deposit their outputs in these repositories wherever they exist.  If no recognised subject-area repository is available, we encourage researchers to consider using generalist repositories – such as Dryad, FigShare and Zenodo – or if not, to use institutional repositories.  Looking ahead, we may consider developing an orphan repository to house Wellcome-funded research data which has no other obvious home.

Recognising the key importance of this infrastructure, Wellcome provides significant grant funding to repositories, databases and other community resources.  As of July 2016, Wellcome had active grants totalling £80 million to support major data resources.  We have also invested many millions more in major cohort and longitudinal studies, such as UK Biobank and ALSPAC.  We provide such support through our Biomedical Resource and Technology Development scheme, and have provided additional major awards over the years to support key resources, such as PDB-Europe, Ensembl and the Open Microscopy Environment.

While our funding for these resources is not open-ended and is subject to review, we have been conscious for some time that the reliance of key community resources on grant funding (typically of three to five years’ duration) can create significant challenges, hindering their ability to plan for the long term and retain staff.  As we develop our work on Open Research, we are keen to explore ways in which we could adapt our approach to help put key infrastructures on a more sustainable footing, but this is a far from straightforward challenge.

Gaining the perspectives of resource providers

In order to better understand the issues, we did some initial work earlier this year to canvass the views of those we support.  We conducted semi-structured interviews with leaders of 10 resources in receipt of Wellcome funding (six database and software resources, three cohort resources and one materials stock centre) to explore their current funding, long-term sustainability plans and thoughts on the wider funding and policy landscape.

We gathered a wealth of insights through these conversations, and several key themes emerged:

  • All of the resources were clear that they would continue to be dependent on support from Wellcome and/or other funders for the long-term.
  • While cohort studies (which provide managed access to data) can operate cost recovery models to transfer some of the cost of accessing data onto users, such models were not appropriate for data and software resources who commit to open and unrestricted access.
  • Several resources had additional revenue-generation routes, including collaborations with commercial entities, and these had delivered benefits in enhancing their resources.  However, the level of income was usually relatively modest in terms of the total cost of sustaining the resource. Commitments to openness could also limit the extent to which such arrangements were feasible.
  • Diversification of funding sources can give greater assurance and reduce reliance on single funders, but can bring an additional burden.  There was felt to be a need for better coordination between funders where they co-fund resources.  Europe PMC, which has 27 partner funders but is managed through a single grant, is a model which could be considered.
  • Several of the resources were actively engaged in collaborations with other resources internationally that house related data – it was felt that funders could help further facilitate such partnerships.

We are considering how Wellcome might develop its funding approaches in light of these findings.  As an initial outcome, we plan to develop guidance for our funded researchers on key issues to consider in relation to sustainability.  We are already working actively with other funders to facilitate co-funding and make decisions as streamlined as possible, and wish to explore how we join forces in the future in developing our broader approaches for funding open resources.

Coordinating our efforts

There is growing recognition of the crucial need for funders and the wider research community to work together to develop and sustain research data infrastructure.  As the first blog in this series highlighted, the scientific enterprise is global and this is an issue which must be addressed at an international level.

In the life sciences, the ELIXIR and US BD2K initiatives have sought to develop coordinated approaches for supporting key resources and, more recently, the European Open Science Cloud initiative has developed a bold vision for a cloud-based infrastructure to store, share and re-use data across borders and disciplines.

Building on this momentum, the Human Frontier Science Program convened an international workshop last November to bring together data resources and major funders in the life sciences.  This resulted in a call for action (reported in Nature) to coordinate efforts to ensure the long-term sustainability of key resources, whilst supporting resources in providing access at no charge to users.  The group proposed an international mechanism to prioritise core data resources of global importance, building on the work undertaken by ELIXIR to define criteria for such resources.  It was proposed that national funders could then contribute a set proportion of their overall funding (with initial proposals suggesting around 1.5 to 2 per cent) to support these core data resources.

Grasping the nettle

Public and charitable funders are acutely aware that many of the core repositories and resources needed to make research outputs discoverable and useable will continue to rely on our long-term funding support.  There is clear realisation that a reliance on traditional competitive grant funding is not the ideal route through which to support these key resources in a sustainable manner.

But no one yet has a perfect solution and no funder will take on this burden alone.  Aligning global funders and developing joint funding models of the type described above will be far from straightforward, but hopefully we can work towards a more coordinated international approach.  If we are to realise the incredible potential of open research, it’s a challenge we must address.

Published 26 July 2017
Written by David Carr, Wellcome Trust (d.carr@wellcome.ac.uk)

Creative Commons License