Cambridge Data Champions – reflections on an expanding community and strategies for 2019

The Cambridge Data Champions (DCs) advocate good Research Data Management (RDM) and Open Data practices to researchers locally in their departments, within Cambridge University in general, and sometimes further afield. They network with one another, exchange good methods of RDM, share ideas and, as a collective, reflect on current issues surrounding RDM, Open Data and researcher engagement, where a major shared goal is to establish best practices when it comes to research data. By attending bi-monthly forums facilitated by the Research Data Team, the DCs convene as a community, hear speakers presenting on relevant topics, and engage in workshops that will help them in their ‘championing’ activities. Following on from our latest blog post, which summarised how a workshop led to the creation of cartoon postcards as a new tool to add to the DCs’ resource kit for RDM advocacy, we are now reflecting on initiatives that sprang from workshops during the past year and are considering the challenges and opportunities that this programme brings as it approaches the end of its third year.

Growing 

The programme started in Autumn 2016, comprising researchers who volunteered to become local community experts and advocate on research data management and sharing. Our first call welcomed 43 DCs (September 2016), our second call 20 DCs (March 2018) and the third call 40 DCs (January 2019). For simplicity, this year we also added to our statistics the “affiliate” DCs, who are colleagues who contribute to the DC community in other ways (as interested members of Cambridge’s RDM Project Group) and not necessarily through channelling their RDM efforts for the benefit of a specific department.

We are now a community of 87 active DCs.

Graph showing number of Data Champions (current and alumni) per year between 2016 and 2019.
Total number of Data Champions who joined in each year (orange column indicates Champions who are still active; blue column indicates Champions who are now alumni).

Communities within a community 

Over the last year we caught ourselves using words such as the ‘old DCs’ and the ‘new DCs’ and what we really meant was ‘established DCs’ and ‘new DCs’, with the latter group being those joining the programme each year. In September we celebrate the programme’s third birthday and it is reasonable to expect that there will be more experienced DCs who have already built their networks and have, more or less, a stable offering of RDM support and an enhanced understanding of the needs of their department. On the other hand, there are those who are being welcomed into the group who seek, to differing degrees, initial support from both the RDM team and their fellow colleagues in order to become successful DCs. It is easy to imagine that different layers are being developed with different needs, both in terms of support and engagement.  

Through various activities and feedback from DCs, we now have a good quantity of raw data to analyse their needs for being, as we called it, ‘a good Data Champion’. We have brainstormed ideas which we are putting into action to respond to the challenges of an ever-growing Data Champions group. 

Planning  

DC Welcome Pack 

Word cloud image of "welcome" in different languages  - front page of the Data Champion Welcome pack.

Every year we circulate the Data Champions Welcome Pack to coincide with the inductions we organise to welcome new DCs into the group. This year we included in the pack what is expected of a DC when they join the programme, so that expectations are clearly communicated from the beginning and are the same for everybody.

Document describing what Data Champions are expected to do as part of the Programme.
Page from the Cambridge Data Champions Welcome Pack

Bi-monthly forums 

Lightning talks have been introduced as a standard item in each forum. These have provided DCs with the opportunity to discuss aspects of RDM they are working on (e.g. new tools and techniques), or to feed back to the group on DC activities undertaken in their departments and data-related events they have attended so that the whole group can benefit. Importantly, the lightning talks have been used by DCs to problem solve, where the collective knowledge and experience of DCs attending a forum has been harnessed to address particular challenges faced by individual DCs. This is where the community aspect of the programme truly shines. 

It is always a priority for us to invite speakers to forums who are external to the programme, reflecting the needs of both the new and established DCs. For example, Hannah Clements from Cambridge University’s Researcher Development Programme (RDP) spoke to the DCs at the January forum about mentoring, providing guidance on how support can be best delivered within the DC community. In the May forum, we had talks and discussions from a panel of experts working on different aspects of data archiving. The panellists came from across the University bringing a diversity of experience, grounded in clinical governance, computing, and more traditional archiving. These examples are just a couple of the themes that we have covered so far in the forums, which have been derived predominantly from information provided by (and the needs of) the DCs themselves. Additional topics that we plan to cover in future forums include issues surrounding reproducibility, IP and commercialisation, publishing and the impact of research data.  

The key aims of these forums are not only to facilitate networking between DCs but also to act as an arena for the transfer of knowledge along the ‘researcher pipeline’, from forum to DCs and from DCs to researchers in their departments.

DC specialisation group 

As a community, we need to be able to map expertise internally and understand the make-up of such an organic group at any given moment. This makes it easier to support each other and create collaborations, and also improves how we promote the programme externally.

Table showing specialisation categories and sub-categories for Data Champions
Areas of expertise amongst our Data Champions

This led to the formation of the DC specialisation group, consisting of one of us and six of the DCs, which determined how to categorise expertise within the group. As a result, a spreadsheet was created where all DCs can chart their specialist areas and update or amend when necessary (and at least annually). We have top level categories for simple statistical analysis and second level categories that offer more specific details for the benefit of the DC community. 

The next stage is to include the wider research community and improve how various stakeholders can reach the appropriate Data Champions for initial advice and support in RDM issues. One way to do this is by presenting more coherent and consistent specialisations on the Data Champions’ website, using the categories which we have already created for internal use within the group. This stage is due to begin this month and we hope to report on our efforts next year.  

Branding group 

A growing community is inevitably going to bring to the forefront various identity discussions. With this in mind, we formed a branding group to examine if a DC logo should be created to enhance the Data Champions’ visibility and raise their profile amongst their peers when advocating for RDM. A logo has been created and is going through various stages of approval before it will be released later this year. 

Pilot programme – Mentoring  

In February 2019, we initiated a pilot mentoring project as part of the induction process for the new DCs. The mentors are established DCs who have volunteered to support those new DCs wishing to take part in this pilot exercise. This followed on from our January forum where the benefits of mentoring for both mentees and mentors were outlined by Hannah Clements of RDP. At this forum, which preceded the University-wide call for new DCs, we also held a workshop where DCs were divided into three groups and asked three questions: what do you wish you knew when you first became a DC that you know now; what could you offer as mentors to the new DCs; how do you think the mentor-mentee system could work? The responses from DCs in the three groups informed the implementation, structure and aims of the mentoring pilot.  

Our aim is to learn from this project in close consultation with both mentors and mentees. We want to see if this process helps new DCs to establish themselves within their departments/institutes. Will it be effective? The findings will inform our steps for the following year. Watch this space! 

Fostering clusters within departments 

We have excellent examples of departments that promote their DCs within their institutions. A good example is the Chemistry department, which has a cluster of five DCs who work together in their advocacy. During this year’s call for new DCs, and with help from the Department Librarian, we took a targeted approach to advertising the DC Programme within the Department of Engineering. This was highly successful, resulting in ten new Data Champions from Engineering, drawn from various roles and Academic Divisions. They represent a hub with the local knowledge, experience and skills to assess their department’s needs and explore the best approaches to supporting good RDM practices and Open Research, ones that are tailored to the discipline.

Alumni community 

Heading toward the programme’s third birthday means not only that we are growing but also that we are developing an alumni community. This is a different kettle of fish, but it is on our radar to investigate how we can foster this distinct group and build a network that is not only Cambridge-based but has a more national and even international outlook.

Funding  

Let’s not forget that the DC programme consists of volunteers. We are in the process of seeking more funds to support this ever-increasing community, to run expanding bi-monthly forums, and to be able to offer grants to assist DCs in their endeavours. As an example, we supported one of the DCs, James Savage, to bring the programme to the international stage in November at SciDataCon 2018 in Botswana. He talked about the programme as well as his experience of being a DC. This resulted in James writing a paper together with Lauren Cadwallader, to be published soon in Data Science Journal (the accepted manuscript and associated data are available now in Apollo, the Cambridge University institutional repository).

An exciting year so far! 

During this third year of the DC programme the number of active DCs across the University of Cambridge has doubled. We can only anticipate it growing further each year, yet balanced by an expanding community of alumni DCs as, for example, DCs leave Cambridge. The DC community is inherently dynamic, as is the programme. Because of this, we always seek to respond and adapt to changing conditions in novel and beneficial ways while maintaining the programme’s core structure to provide strong foundations. This has been a period of reflection, organisation and anticipation, all required to drive the Data Champion programme forward and tackle current challenges effectively, as well as those that lie ahead – more on this to come soon!  

Written by Maria Angelaki and Dr Sacha Jones

Published 20 June 2019

License logo CCBY

Engagement, infrastructure and roles: themes at #ScholComm19

Dr Beatrice Gini, the Office of Scholarly Communication’s new Training Coordinator, recently attended the inaugural Scholarly Communication Conference at the University of Kent. In this post she reviews the main themes and discussions from the event.

ScholComm19 – a brand new conference, a supportive community, an inclusive space: what a treat for a newcomer to scholarly communication! Having recently started a job within the Office of Scholarly Communication, I had high expectations for this conference as an opportunity to learn a lot from fellow practitioners, and I was not disappointed. Sarah Slowe and the team at the University of Kent should be congratulated for their drive in starting up a new gathering that draws together all the different strands of Scholarly Communications, giving those working at the coalface a chance to get together and share best practice.

With the whole of Friday given over to lightning talks, there were too many speakers for me to do them justice individually, so instead I will attempt to summarise the major themes, as I understood them. The full conference programme can be found here.

Engaging researchers

Many of the speakers focused on the way we work with researchers. Hardly surprising, perhaps, as our jobs tend to involve as much advocacy and training as they do practical support. While at times this is a challenge, many have found ways to deliver our messages more effectively:

 

  • A personal touch – Cassie Bowman from London South Bank University was faced with a lack of researcher engagement, due to the limitations of the technological platform, the complex terminology, the conflicting demands of policies, the difficulties in correcting initial misunderstandings, and the researchers’ fear of getting it wrong. She overcame these not by commissioning large scale change, but through her own personal touch. Her one-to-one sessions are carefully tailored to each researcher and produce long-lasting changes in attitudes. She reaches people through posters and infographics, sprinkling on a little competition (for the highest download figures) to boost interest. Lucy Lambe also spoke on the benefits of one-to-one sessions, alongside workshops and advice on the web, for her publishing advice service for researchers at LSE.
  • A bit of fun – The Publishing Trap game is now well-known in ScholComm circles, but it was new to me, and I was blown away. It takes players through a cleverly-crafted path from PhD student to retired researcher and beyond – all the way to gravestone, in fact – replicating the emotional highs and lows of a research career. Most importantly, though, it asks players to make crucial decisions that spark discussions on Open Access, copyright, skills, and more. Why not organise a fun session to surprise those who may (crazily!) believe that copyright is boring?
  • Useful information – We need to deliver information that is trustworthy and useful. Kirsty Wallis (University of Greenwich) stressed the importance of over-preparing and tailoring sessions to the needs of the people in the room. Her talk gave a useful blueprint of how we could teach academics to ‘speak social media’ through a flexible and hands-on workshop. ‘We need to be a credible source of information’ – this was one aspect of Julie Baldwin’s (University of Nottingham) exploration of why academics ‘get copyright so copywrong’. Engaging researchers with copyright issues is more important than ever now, at a time of change in the law. The University of Kent’s Chris Morrison gave a whistle-stop tour of the history of copyright law, followed by a sneak preview of the way the law may change once the new EU directive is implemented (yes, Brexit did flash briefly on the screen at this point, but it should not have a significant impact on copyright decisions).

Compliance vs culture change

Ian Carter’s talk on the study he ran with JISC on Research Data Management and Sharing raised a strong theme, which was echoed in many of the discussions I had during breaks. His interviews with representatives from 34 institutions revealed that there is a tension in the way we attempt to engage researchers with RDM and open data: on the one hand we say ‘you must do this to receive money/progression/recognition’, on the other we say ‘doing this benefits science and the wider world’. My belief is that the former is likely to generate small, short-term wins on compliance rates, but may also generate resentment. The latter requires more advocacy, but it is likely to generate true buy-in from researchers. Dr Carter argues that the second approach, which aims for culture change, is indeed the most likely to succeed in the long term. He throws down a challenge to all of us when he reports that researcher engagement is variable, RDM leadership is often fragile, responsible staff can be isolated, and few institutions consider all important aspects in their strategies. There is hope, however. As repositories develop better functionality and we find better ways to evidence the benefits of RDM and open data, we may see this area of research support grow into new strengths.

Infrastructural headaches

Repositories are the bread-and-butter of any Open Access support team: they are wonderful digital treasure troves, opening up our university’s invaluable research to the world and preserving it in perpetuity… but at times they can cause tremendous headaches too! A number of speakers shared the challenges they faced, as well as their solutions, saving the rest of us a lot of time and paracetamol. While there is still a split between institutions on the issue of whether depositing in a repository is done by researchers or mediated by support staff, it looked to me as though the trend is towards self-deposit by academics, which will mean more and more of us require automated systems for checking and updating records.

  • Nicola Barnett focused on how staff at the University of Leeds deal with the need to update repository records after they are officially published, for instance to set the correct embargo deadlines. She shared a useful set of instructions to automatically generate a list of recently published publications using Excel and a CrossRef API.
  • The diversity of publishers’ policies was arguably the most time-consuming hurdle in Suzanne Atkins’ work on making more monographs Open Access at the University of Birmingham. She ran a very successful pilot project to open up book chapters from one department, which had a glut of materials that could be made instantly OA, if the authors consented. While this work was very worthwhile and likely puts the team ahead when it comes to the next REF, it was hindered by the need to check every single policy and by the publishers’ insistence on relying on case-by-case decisions, rather than applying blanket policies.
  • If your current system is just not up to requirements, switching to a new one can be a good time investment in the long run, but it can come with its own demands. Catherine Parker and her team at the University of Huddersfield found this out when they had to manually migrate all previous records – a great feat that really brought out their community spirit and was accomplished in (only?) two and a half months of intensive work. Stuart Bentley from the University of Hull highlighted some of the challenges of switching to Worktribe, as well as considering the improved functionality in the new system.
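As a rough illustration of the kind of check described in the first bullet (finding out whether a deposited paper has since been published), here is a minimal Python sketch. It is not the Excel-based method Nicola Barnett shared, which the post does not reproduce. It assumes the public Crossref REST API, which returns a work's metadata at `https://api.crossref.org/works/<DOI>` with dates stored as `date-parts` lists; the function names are hypothetical, and the code only builds the URL and reads an already-downloaded record rather than making any network call.

```python
from datetime import date
from typing import Optional
from urllib.parse import quote

# Base URL of the Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the URL from which a work's metadata can be fetched."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def earliest_publication_date(message: dict) -> Optional[date]:
    """Return the earliest online or print publication date in a Crossref
    work record (the "message" part of the JSON response), or None if the
    record carries no publication date yet."""
    candidates = []
    for field in ("published-online", "published-print"):
        parts = message.get(field, {}).get("date-parts", [[]])[0]
        if parts and parts[0] is not None:
            # Crossref may give only a year, or a year and month;
            # pad the missing month or day with 1.
            year, month, day = (list(parts) + [1, 1])[:3]
            candidates.append(date(year, month, day))
    return min(candidates) if candidates else None
```

A repository team could run something like this over the DOIs of embargoed records and flag those where a publication date has appeared, so that the embargo end date can be corrected.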

Roles and time

Finally, several speakers examined the way teams are structured, often in the context of the age-old question of how to get it all done in the time we have.

  • Surveys run by Catherine Parker and Ian Carter revealed a great disparity in the size of the research support and data management staff between institutions, with teams varying in size from one to well over a dozen. Even the areas where they are employed vary, with most being in libraries, but some belonging to research strategy offices. Lone workers have the blessing and the curse of having to take on all aspects of the work, from maintaining the repository to liaising with faculty members and running training, while large teams can specialise their staff.
  • Jane Belger and Anne Lawson talked about their experience of sharing the role of Research and Open Access Librarian at the University of the West of England, Bristol. Having worked out the logistics of syncing schedules and the questions of when to divide up projects and when to collaborate, their main conclusion is that two people can be ‘more than the sum of their parts’.
  • The multiplicity of roles was evident both in the talks and in the chats during breaks. Almost every speaker gave an introduction to their institution, which was key to understanding their perspective. A case in point was Isabel Benton, from Leeds Arts University. She highlighted the peculiar challenges of working at a place where as many as 43% of outputs are in non-traditional formats such as art shows or exhibitions: how do you capture those in a repository? (Hint: with a creative mix of media; check out the repository to find out more.)

*****

There was lots to think about on the train home. The overwhelming feeling, though, was of a community that genuinely cares about doing our very best to support researchers, and is dedicated to helping each other, both within institutions and beyond.

Published 30 May 2019
Written by Dr Beatrice Gini
Creative Commons License

A Fast-Track Route to Open Access

In the last two years, since the REF 2021 open access policy came into force, the Open Access Team has received an ever increasing number of manuscript submissions for archiving in Apollo, Cambridge’s institutional open access repository.

We have been thinking long and hard about ways to cope with the workload, by scrutinising existing practices and streamlining workflows, because we want to provide the best possible service to our researchers, commensurate with the University’s world leading research.

This blog introduces what is perhaps the greatest overhaul of our workflows since the service began: a new ‘Fast Track’ deposit system.

Work it harder

Before the start of the REF OA policy (2014-2016), the Open Access Team would process and manually curate every manuscript submission we received. Authors could expect an initial response within 1-2 working days, after which (usually within a month) we would archive their manuscript in Apollo.

A simplified workflow for a typical manuscript was:

  1. Manuscript uploaded by submitter in Symplectic Elements.
  2. Item created in Apollo (DSpace) workflow.
  3. Helpdesk ticket created (Zendesk).
  4. Open Access Team reviews manuscript, advises submitter and makes a decision.
  5. Open Access Team archives the manuscript in Apollo and informs submitter.

Both the decision (4) and archive (5) steps take time. For each manuscript we would need to decide whether the files we received could be archived, what funder open access policies were at play and the open access options available from the publisher. We could then advise authors about their open access choices.

To archive a manuscript the process was broadly the following:

  1. Review the helpdesk ticket (Zendesk) for the open access decision.
  2. Enter as many publication details as possible in Symplectic Elements.
  3. Retrieve the submission from the Apollo (DSpace) deposit workflow.
  4. Add licence and metadata to the record.
  5. Review the submission and approve for archiving.
  6. Move the item to the relevant departmental collection and apply an appropriate embargo (if required).
  7. Finally, update the helpdesk ticket and send the original submitter a link to their Apollo record.

Each manuscript took on average 18 minutes to archive, which, besides being manually tedious and prone to error, was extremely time-consuming. Add to this the time required to make the initial decision and each manuscript submission could easily take 30 minutes for the Open Access Team to fully process from start to finish, especially if an open access fee had to be paid.

Fast-forward two years and with the rate of new manuscript submissions now peaking at over 1,300 per month, simply processing manuscripts for the REF would require more than four full-time staff members. Whilst these manual processes were viable for a handful of submissions a day, they became unwieldy at scale.
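The staffing estimate above can be checked with a quick back-of-the-envelope calculation. The sketch below uses the 30 minutes per manuscript and 1,300 submissions per month quoted in the text; the figure of 150 working hours per full-time post per month is our own assumption for illustration, not a number from the Open Access Team.

```python
# Figures taken from the text above.
MINUTES_PER_MANUSCRIPT = 30     # full processing time per submission
SUBMISSIONS_PER_MONTH = 1300    # peak monthly submission rate

# Assumption (not from the post): one full-time post provides about
# 150 working hours per month (roughly 37.5 hours x 4 weeks).
HOURS_PER_FTE_PER_MONTH = 150


def staff_required(submissions: int, minutes_each: float, hours_per_fte: float) -> float:
    """Return the number of full-time staff needed to clear a month's submissions."""
    total_hours = submissions * minutes_each / 60
    return total_hours / hours_per_fte


fte = staff_required(SUBMISSIONS_PER_MONTH, MINUTES_PER_MANUSCRIPT, HOURS_PER_FTE_PER_MONTH)
print(f"{fte:.1f} full-time staff")  # prints "4.3 full-time staff"
```

Under that assumption, 650 hours of processing per month works out at a little over four full-time posts, consistent with the estimate in the text.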

Make it better

Our first attempt at speeding up our open access system began in August 2017. To start we made a number of operational changes to reduce the time spent processing manuscript submissions:

  • We would rely entirely on the metadata present in Symplectic Elements to populate the Apollo records (i.e. we would not curate manual records).
  • The Open Access Team would no longer update the helpdesk records; instead, internal record keeping would be automated as much as possible.

Unfortunately, the number of steps in the Apollo workflow was still roughly the same as the previous process, but with one key difference: a new field to record what we call the ‘Fast Track’ decision. There were seven Fast Track options:

  • Submitted
  • Proof
  • Published (not open access)
  • Published (open access)
  • Accepted (published)
  • Accepted (not published)
  • Other

The first six options represent the vast bulk of all manuscripts received by the Open Access Team, and the ‘Other’ option simply acts as a catch-all for anything else. By simply knowing what sort of manuscript has been uploaded, much of the decision and archiving process can be automated. However, the agent still needed to retrieve the item from the Apollo workflow, check the version of the file and publication status of the paper, add some metadata fields, approve the item, and move it to an appropriate collection.

Figure 1. The Apollo workflow page of a typical manuscript submission, with the addition of the new ‘Fast Track’ field.

The choice of Fast Track decision leads to four possible outcomes which would ‘trigger’ actions in our Zendesk helpdesk:

  • Submitted, proof, published (not open access)
    • Email submitter, ask for accepted manuscript
  • Published (open access)
    • Archive in Apollo (no embargo) ⇒ Email submitter Apollo link
  • Accepted (published), accepted (not published)
    • Archive in Apollo (embargoed) ⇒ Email submitter Apollo link
  • Other
    • Refer to Open Access Team

Despite being a much faster process, it was still manually tedious. It could also require up to 33 actions from agents (29 mouse clicks) and 14 web pages to be loaded, which was still not very user friendly. However, the time to archive had decreased from 18 to 9 minutes – a 50% reduction from the previous fully manual system.

Do it faster

So what if all the steps involved in processing a manuscript submission could be reduced to the absolute minimum, and be actionable within a single webpage? After a short development sprint, the Open Access Team launched the ‘Fast Track Deposits’ interface last September. A snapshot of the user interface is shown below.

Figure 2. The Fast Track interface. Choosing one of the options in blue is enough to fully archive a manuscript, or process it for further action by the submitter or the Open Access Team.

At the top of the page, the agent can see a ‘publication summary’ including the item title, the journal title, and publisher DOI if available. Both the item title and publisher DOI are hyperlinked, so that the agent can Google-search the item or land on the publisher’s webpage with a single mouse click.

The agent must first inspect the file and check that it is a suitable version (i.e. either the accepted version or the open access published version). If wrongly labelled, they must relabel the file via a dropdown menu, and add/delete files as appropriate. The agent then ‘describes’ the manuscript (i.e. decides whether it is the accepted, published, submitted or proof version) and submits their decision. The decision determines the trigger behaviour in the automatically populated helpdesk ticket. The agent is then free to move on to the next item.

If the decision is ‘accepted’ or ‘published open access’, the item is deposited and the submitter is automatically notified via email. For submitted, proof, and non-OA published versions, the author receives an automatic email asking for the accepted manuscript. Items are archived in the repository under a generic collection, and any forthcoming publication details are added to the record via external source information in Elements.

To see just how efficient Fast Track is we’ve prepared a short demonstration video which captures some of the key features:

Video 1. Real-time demonstration of the Fast Track system.

Makes us stronger

Agents therefore need only make one decision: identify the file version. But the real ingenuity of the Fast Track system is that embargoes can be set automatically by:

  1. Taking into account the decision made by the agent (e.g. no embargo if published open access);
  2. Detecting publication status and publication dates from Elements; and
  3. Retrieving journals’ embargo policies via Orpheus (you can learn more about Orpheus in our previous blog post).

In some cases, usually because we don’t know the publication date, we can’t determine the embargo length of an accepted manuscript. In such cases we apply a 36-month embargo from the date of the Fast Track decision. We know that this embargo won’t always be correct; however, we routinely check manuscripts in Apollo and update embargoes accordingly.
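The embargo rules described in this section can be summarised in a few lines of Python. This is a minimal sketch rather than the Fast Track code itself: the function names are hypothetical, the journal's embargo length is passed in as a plain number instead of being looked up in Orpheus, and we assume that a journal embargo runs from the publication date.

```python
import calendar
from datetime import date
from typing import Optional

FALLBACK_EMBARGO_MONTHS = 36  # fallback embargo length stated in the post


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`,
    clamping the day to the end of the target month if needed."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(
    published_open_access: bool,
    decision_date: date,
    publication_date: Optional[date],
    journal_embargo_months: Optional[int],
) -> Optional[date]:
    """Return the date the embargo lifts, or None if no embargo applies."""
    if published_open_access:
        return None  # open access published version: no embargo
    if publication_date is not None and journal_embargo_months is not None:
        # Assumption: the journal's embargo is counted from the publication date.
        return add_months(publication_date, journal_embargo_months)
    # Embargo length cannot be determined: 36 months from the decision date.
    return add_months(decision_date, FALLBACK_EMBARGO_MONTHS)
```

For example, an accepted manuscript with no known publication date and a Fast Track decision made on 23 April 2019 would be embargoed until 23 April 2022, pending a later correction.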

Figure 3. Simplified overview of the Fast Track process. The key decision is to determine the type of manuscript that has been submitted. Everything else is handled automatically.

Since launching Fast Track the average time to process a manuscript is 1-2 minutes. More than 8,000 items have been processed since launching the phase two Fast-Track interface. If items processed under the phase one effort are included, the number goes up to just over 14,000. And since a picture speaks a thousand words, Figure 4 below shows the effect produced by the new interface launched in September on our backlog of unprocessed submissions.

Figure 4. Historical change in the number of unprocessed open access manuscript submissions. The total number of outstanding manuscript submissions peaked at nearly 2,400 in September 2018. Immediately after launching the Fast Track website the backlog dropped dramatically and was completely eliminated by March 2019.

We will continue to develop Fast Track to further streamline our processing of manuscripts. We have already started to partner with librarians and administrators across the University to leverage the collective knowledge about open access which now exists within the University’s professional academic services.

Get in contact: If you are running a DSpace repository and would like to implement Fast Track to work alongside your existing workflows, email us at support@repository.cam.ac.uk

Published 23 April 2019
Written by Dr Mélodie Garnier and Dr Arthur Smith
Creative Commons License