Monthly Archives: August 2017

What I wish I’d known at the start – setting up an RDM service

August 24, 2017 | Uncategorized | funders, open access, Policies, RDM, research data management, Training | Office of Scholarly Communication

In August, Dr Marta Teperek began her new role at Delft University in the Netherlands. In her usual style of doing things properly and thoroughly, she has contributed this blog post reflecting on the lessons learned in the process of setting up Cambridge University’s highly successful Research Data Facility.

On 27-28 June 2017 I attended Jisc’s Research Data Network meeting at the University of York. I was one of several people invited to talk about experiences of setting up RDM services in a workshop organised by Stephen Grace from London South Bank University and Sarah Jones from the Digital Curation Centre. The purpose of the workshop was to share lessons learned and help those who were just starting to set up research data services within their institutions. Each of the presenters prepared three slides: 1. What went well, 2. What didn’t go so well, 3. What they would do differently. All slides from the session are now publicly available.

For me the session was extremely useful not only because of the exchange of practices and learning opportunity, but also because the whole exercise prompted me to critically reflect on Cambridge Research Data Management (RDM) services. This blog post is a recollection of my thoughts on what went well, what didn’t go so well and what could have been done differently, as inspired by the original workshop’s questions.

What went well

RDM services at Cambridge started in January 2015 – quite late compared to other UK institutions. The late start meant, however, that we were able to learn from others and to avoid some common mistakes when developing our RDM support. Jisc’s Research Data Management mailing list was particularly helpful, as it is a place used by professionals working with research data to look for help, ask questions, and share reflections and advice. In addition, the Research Data Management Fora organised by the Digital Curation Centre proved to be an excellent vehicle not only for exchanging knowledge and good practice, but also for building networks with colleagues in similar roles. Cambridge also joined the Jisc Research Data Shared Service (RDSS) pilot, which aimed to create a joint research repository and related infrastructure. Being part of the RDSS pilot not only helped us to further engage with the community, but also allowed us to better understand the RDM needs at the University of Cambridge by undertaking the Data Asset Framework exercise.

In exchange for all the useful advice received from others, we aimed to be transparent about our work as well. We therefore regularly published blog posts about research data management at Cambridge on the Unlocking Research blog. There were several additional advantages of the transparent approach: it allowed us to reflect on our activities, it provided an archival record of what was done and the rationale for it, and it facilitated more networking and exchange of comments with the wider RDM community.

Engaging Cambridge community with RDM

Our initial attempts to engage the research community at Cambridge with RDM were compliance-based: we were telling our researchers that they must manage and share their research data because this was what their funders required. Unsurprisingly, however, this approach was rather unsuccessful – researchers were not prepared to devote time to RDM if they did not see the benefits of doing so. We therefore quickly revised the approach and changed the focus of our outreach to the (selfish) benefits of good data management and of effective data sharing. This allowed us to build an engaged RDM community, in particular among early career researchers. As a result, we were able to launch two dedicated programmes, further strengthening our community involvement in RDM: the Data Champions programme and the Open Research Pilot Project. Data Champions are (mostly) researchers who volunteered their time to act as local experts on research data management and sharing, providing advice and specialised training within their departments. The Open Research Pilot Project is looking at the benefits and barriers to conducting Open Research.

In addition, ensuring that a wide range of stakeholders from across the University were part of the RDM Project Group and had oversight of the development and delivery of RDM services allowed us to develop our services quite quickly. As a result, the services developed were endorsed by a wide range of stakeholders at Cambridge and were also developed in a relatively coherent fashion. As an example, effective collaboration between the Office of Scholarly Communication, the Library, the Research Office and the University Information Services allowed integration between the Cambridge research repository, Apollo, and the research information system, Symplectic Elements.

What didn’t go so well

One of the aspects of our RDM service development that did not go so well was the business case development. We started developing the RDM business case in early 2015. The business case went through numerous iterations, and at the time of writing of this blog post (August 2017), financial sustainability for the RDM services has not yet been achieved.

One of the strongest factors which contributed to the lack of success in business case development was insufficient engagement of senior leadership with RDM. We invested a substantial amount of time and effort in engaging researchers with RDM and in moving away from compliance arguments, to the extent that we seem to have forgotten that compliance- and research integrity-based advocacy is necessary to ensure the buy-in of senior leadership.

In addition, while trying to move quickly with service development, and at the same time trying to gain trust and engagement in RDM service development from the various stakeholder groups at Cambridge, we ended up taking part in various projects and undertakings which were sometimes only loosely connected to RDM. As a result, some of the activities lacked strategic focus, and a lot of time was needed to re-define what the RDM service is and what it is not in order to ensure that the expectations of the various stakeholder groups could be properly managed.

What could have been done differently

There are a number of things which could have been done differently and more effectively. Firstly, and to address the main problem of insufficient engagement with senior leadership, one could have introduced dedicated, short sessions for principal investigators on ensuring effective research data management and research reproducibility across their research teams. Senior researchers are ultimately those who make decisions at research-intensive institutions, and therefore their buy-in and their awareness of the value of good RDM practice is necessary for achieving financial sustainability of RDM services.

In addition, it would have been valuable to set aside time for strategic thinking and for defining (and re-defining, as necessary) the scope of RDM services. This is also related to the overall branding of the service. In Cambridge a lot of initial harm was done due to a negative association between Open Access to publications and RDM. Due to overarching funders’ and government’s requirements for Open Access to publications, many researchers started perceiving Open Access to publications merely as a necessary compliance condition. The advocacy for RDM at Cambridge started as ‘Open Data’ requirements, which led many researchers to believe that RDM was yet another requirement to comply with and that it was only about open sharing of research data. It took us a long time to change the messages and to rebrand the service as one that supports researchers in their day-to-day research practice, and to convey that proper management of research data leads to efficiency savings. Finally, only research data which are managed properly from the very start of the research process can then be easily shared at the end of the project.

Finally, and also related to focusing and defining the service, it would have been useful to decide on a benchmarking strategy from the very beginning of the service creation. What is the goal(s) of the service? Is it to increase the number of shared datasets? Is it to improve day-to-day data management practice? Is it to ensure that researchers know how to use novel tools for data analysis? And, once the goal(s) is decided, design a strategy to benchmark the progress towards achieving this goal(s). Otherwise it can be challenging to decide which projects and undertakings are worth continuing and which ones are less successful and should be revised or discontinued. In order to address one aspect of benchmarking, Cambridge led the creation of an international group that aims to develop a benchmarking strategy for RDM training programmes and to create tools for improving RDM training provision.

Final reflections

My final reflection is to re-iterate that the questions asked of me by the workshop leaders at the Jisc RDN meeting really inspired me to think more holistically about the work done towards development of RDM services at Cambridge. Looking forward, I think asking oneself the very same three questions (what went well, what did not go so well and what you would do differently) might become a useful regular exercise for ensuring that RDM service development is well balanced and on track towards its intended goals.

Published 24 August 2017
Written by Dr Marta Teperek

Summer camp – the scholarly communication way

August 22, 2017 | Uncategorized | FORCE11, FSCI, governance, open access, risk management, scholarly communication, stakeholders | Office of Scholarly Communication

Growing up, a diet of B-grade movies gave the impression of American summer camps as places where teenagers undertake a series of slapstick events in the wilderness. That may indeed be the case sometimes, but at the University of California San Diego campus recently, a group of decidedly older people bunked in together for a completely different type of summer camp.

The inaugural FORCE11 Scholarly Communications Institute (FSCI) was held in the first week of August, bringing together librarians, researchers and administrators from around the world. The event was planned as a week-long intensive summer school on improving research communication. The activities were spread all over the campus, although not, unfortunately, in the mother of all spaceships for a library.

The event hashtag was #FSCI and the specific hashtag for the course that I ran with Sarah Shreeves from the University of Miami, “Building an Open and Information Rich Institution”, was #FSCIAM3. This blog post is a brief rundown of what we covered in the course.

Our course

We had a wonderful group of people, primarily from the library sector, and from around the world (although many were working in American universities).

From the delivery perspective this was an intense experience, requiring 14 hours of delivery plus the documentation and follow-up each day. It was further complicated by the fact that Sarah and I met for the first time in person half an hour before delivery on the Monday.

Working within open and F.A.I.R. principles, we have made all of our resources and information available, and links to all the Google documents are included in this blog post. The shared Google Drive has links to everything. These presentations will be uploaded to the FSCI Zenodo site when it is available. In addition, the group created a Zotero page which collects together relevant links and resources as they arose in discussion.

Monday – Problem definition

Using an established process the group worked together to define the problems we were looking to address in scholarly communication:

OA takes time and money – and the tools are annoying.
We need to reduce complexity – make it easy administratively
It is important to recognise difference – one size does not fit all, there are cultural and country norms in publishing and prestige
Motivation – what are the incentives? How can we demonstrate benefit?
There is a need for advocacy and training of various stakeholders, including within the library
We can demonstrate the repository as a free way of publishing with impact tracking – for both the author and the institution.
Whose responsibility is this?

The slides from the first day (including the workings of the group) are available.

Tuesday – Stakeholder mapping

On the second day we discussed the different stakeholders in this space, both within institutions and external to them. Each table created a pile of Post-it notes which were then classified on a large grid on the wall against ‘interest’ versus ‘influence’. We then discussed which stakeholders we needed to work with, and whether it is possible to move a stakeholder from one of the quadrants into another. We also discussed the value in using some stakeholders to reach others.
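For readers who would like to repeat the exercise away from a wall of sticky notes, the grid can be approximated in a few lines of Python. This is only an illustrative sketch and not something used in the course: the 1 to 10 scoring scale, the midpoint of 5, the quadrant labels and the example stakeholders are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Stakeholder:
    name: str
    interest: int   # assumed scale: 1 (low) to 10 (high)
    influence: int  # assumed scale: 1 (low) to 10 (high)


def quadrant(s: Stakeholder, midpoint: int = 5) -> str:
    """Return the quadrant of the interest/influence grid for a stakeholder."""
    high_interest = s.interest > midpoint
    high_influence = s.influence > midpoint
    if high_interest and high_influence:
        return "high interest / high influence"
    if high_interest:
        return "high interest / low influence"
    if high_influence:
        return "low interest / high influence"
    return "low interest / low influence"


def map_stakeholders(stakeholders: list[Stakeholder]) -> dict[str, list[str]]:
    """Group stakeholder names by the quadrant they fall into."""
    grid: dict[str, list[str]] = {}
    for s in stakeholders:
        grid.setdefault(quadrant(s), []).append(s.name)
    return grid


if __name__ == "__main__":
    # Hypothetical stakeholders and scores, purely for illustration.
    people = [
        Stakeholder("Pro-Vice-Chancellor for Research", interest=3, influence=9),
        Stakeholder("Early career researchers", interest=8, influence=3),
        Stakeholder("Library director", interest=9, influence=8),
    ]
    for label, names in map_stakeholders(people).items():
        print(f"{label}: {', '.join(names)}")
```

Re-scoring a stakeholder after an engagement activity and running the mapping again is one way to record whether they have moved between quadrants.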

A second exercise we ran was ‘responding to objections’ – where we gave the group a few minutes to create objections that different stakeholders may have to aspects of scholarly communication. These were then randomised and the group had only a couple of minutes to develop an ‘elevator pitch’ to respond to that objection. The slides from day 2 incorporate the comments, objections and counter arguments.

Wednesday – Communication

We started the day with a ‘gathering evidence’ exercise that consisted of a series of questions that were allocated to each table to discuss with a view to the kind of information held in an institution that might be helpful to answer it. Examples of the type of questions we asked the group to consider are: How do we better understand and communicate with the range of disciplines on campus? (with a goal of creating advocacy materials that support the range of disciplinary needs of the institution) or Who is doing collaborative research with others on campus and with others outside of the university? Is there interdisciplinary research? (with a goal of creating a map of collaborations on campus).

We then moved to an exercise to demonstrate the need for clear communication. People worked in pairs and had a pile of building bricks from which they were asked to build a shape. They then had five minutes to write instructions describing their shape. After this the instructions were swapped and the opposite pair tried to reproduce the shape from the instructions. The results were surprising – with fewer than 50% of shapes reproduced. However, looking at some of the instructions, things became clearer. Note the description ‘cute kitty’ in these instructions.

The final session on day three was a risk assessment exercise where we put up the proposal ‘that we will make all digitised older theses open access without obtaining permission from the authors’. The tables were asked to come up with potential risks that could arise from this proposal, and then asked to map these onto a grid that considered the likelihood and severity of each risk.

Then the group discussed what could be done to mitigate the risks they identified, and determined whether each risk could then be moved within the grid. Again, all discussions are captured in the slides.
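The same likelihood and severity grid can be kept as a simple risk register. The sketch below is an assumption-laden illustration rather than the method used in the session: the 1 to 5 scales, the use of likelihood multiplied by severity as a score, and the example risk are all hypothetical choices made here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed scoring rule: position on the grid expressed as a product.
        return self.likelihood * self.severity


def clamp(value: int, low: int = 1, high: int = 5) -> int:
    """Keep a rating inside the bounds of the grid."""
    return max(low, min(high, value))


def mitigate(risk: Risk, likelihood_change: int = 0, severity_change: int = 0) -> Risk:
    """Return a copy of the risk moved within the grid after mitigation."""
    return replace(
        risk,
        likelihood=clamp(risk.likelihood + likelihood_change),
        severity=clamp(risk.severity + severity_change),
    )


def rank(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical risk and ratings, purely for illustration.
    objection = Risk("An author objects to their thesis being online",
                     likelihood=4, severity=3)
    with_takedown = mitigate(objection, severity_change=-2)
    print(objection.score, "->", with_takedown.score)
```

Comparing the score before and after a proposed mitigation gives a quick record of how far the group believed a risk could be moved.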

Thursday – Governance

On the Thursday we considered matters of governance. Dominic Tate from Edinburgh talked the group through the management structure at his institution, and how they have managed to create a strong decision-making governance structure.

Using a system of mapping organisational structure to the decision structure, the group identified a goal they would like to achieve at their workplace and then considered the aspects that are Strategic, Tactical and Operational. They then identified the person, people or group that would need to agree at each of these stages to achieve the end goal, and whether this was something that could be managed within the immediate organisation or whether it involved the wider institution. We also discussed whether policies would need to be changed or created, and the level of consultation needed. The slides describe the process.

At the end of this day we broke into two groups for an unconference. One group discussed the UK Scholarly Communication Licence; the other continued the governance discussion by identifying stakeholders and working out how to approach them.

Friday – The future

On the last day we discussed the best way to share stories with the relevant stakeholders – what is the best way to present the information? How do you get it to the person?

We then looked to the future, first by considering the big disruptor technologies of the last 20 years. We asked people to share their work experiences before these technologies existed, to give us an idea of how much things will potentially change in the future with the next big disruptors. We then asked individuals to identify future issues that they will need to address at their institution, which they then sorted at the table level before we did a group consolidation to identify what the issues will be.

Each group chose one of these issues/futures, and in a mini overview of the work we had done throughout the week, they undertook a stakeholder assessment – who would they have to engage to make this happen? They also identified the governance structures in place, and the type of information they would want in place to make decisions about moving in this area. Some of the discussions are captured in the slides.

Assessment of the course

When developing the course we articulated what we hoped the participants would get out of the week. These included the ability to:

Think strategically and comprehensively about openness and their institution
Articulate the ‘why’ of openness for a variety of stakeholders within an institution
Articulate how information related to research and outputs flows through an institution and understand challenges to this flow of information
Understand the practicalities of delivering open access to research outputs and research data management within an institution
Consider the technology, expertise, and resources required to support open research

So how did we do? Well according to the feedback at the beginning and end of the week we certainly hit all the targets the participants identified.

The responses at the beginning of the week were:

And the feedback at the end of the week was:

Interestingly, the Governance session was the least popular session we ran, but it rated extremely highly in the areas the participants self identified as learning about.

Several people went out of their way to tell Sarah and me that ‘this was the best training/workshop I have ever done’, which is very high praise.

On the Friday afternoon all of the participants for FSCI got back together to provide feedback about what happened in their courses. These ranged from an explanation of what people did, to participants describing what they knew, to poems. There was no expressionist dance unfortunately (perhaps next year). Sarah and I chose to describe our week in pictures.

Wrap of the week

While it was slightly disorienting spending a week in student accommodation, overall this was a valuable and rewarding experience – if extremely intense. Our group of just over 120 people was only one of several ‘camps’ happening at the same time, including electronic music and programming groups. We all converged on the dining hall at each meal, a big hodgepodge of people.

The largest and most intrusive group was the teenagers at the San Diego District Police camp. This is a paramilitary organisation, we discovered. This did go some way to explaining the line-ups at 6am and also at 9pm, the groups shouting their responses in unison, and the instructors wandering around with guns on their hips.

On a much more peaceful note, San Diego is where Dr Seuss lived, and looking at the vegetation and landscape it is easy to see where his inspiration originated.

Published 22 August 2017
Written by Dr Danny Kingsley

Next steps for Text & Data Mining

August 17, 2017 | Uncategorized | ChemDataExtractor, ContentMine, FutureTDM Project, NaCTeM, PLOS, publishers, TDM, text and data mining, Wikimedia | Office of Scholarly Communication

Sometimes the best way to find a solution is to just get the different stakeholders talking to each other – and this is what happened at a recent Text and Data Mining symposium held in the Engineering Department at Cambridge.

The attendees were primarily postgraduate students and early career researchers, but senior researchers, administrative staff, librarians and publishers were also represented in the audience.

Background

This symposium grew out of a discussion held earlier this year at Cambridge to consider the issue of TDM and what a TDM library service might look like at Cambridge. The general outcome of that meeting of library staff was that people wanted to know more. Librarians at Cambridge have developed a Text and Data Mining libguide to assist.

So this year the OSC has been doing some work around TDM, including running a workshop at the Research Libraries UK annual conference in March. This was a discussion about developing a research library position statement on Text and Data Mining in the UK. The slides from that event are available and we published a blog post about the discussion.

We have also had discussions with different groups about this issue, including the FutureTDM Project, which has been looking to increase the amount of TDM happening across Europe. This project is now finishing up. The impression we have around the sector is that ‘everyone wants to know what everyone else is doing’.

Symposium structure

With this general level of understanding of TDM as our base point, we structured the day to provide as much information as possible to the attendees. The Twitter hashtag for the event is #osctdm, and the presentations from the event are online.

The keynote presentation was by Kiera McNeice from the FutureTDM Project, who gave an overview of what TDM is, how it can be achieved and what the barriers are. There is a video of her presentation (note there were some audio issues at the beginning of the recording).

The event broke into two parallel sessions after this. The main room was treated to a presentation about Wikimedia from Cambridge’s Wikimedian in Residence, Charles Matthews. Then Alison O’Mara-Eves discussed Managing the ‘information deluge’: How text mining and machine learning are changing systematic review methods. A video of Alison’s presentation is available.

In the breakout room, Dr Ben Outhwaite discussed Marriage, cheese and pirates: Text-mining the Cairo Genizah before Peter Murray Rust spoke about ContentMine: mining the scientific literature.

After lunch, Rosemary Dickin from PLOS talked about Facilitating Text and Data Mining: how an open access publisher supports TDM. PhD candidate Callum Court presented ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature. This presentation was filmed.

In the breakout room, a discussion about how librarians support TDM was led by Yvonne Nobis and Georgina Cronin. In addition there was a presentation from John McNaught, the Deputy Director of the National Centre for Text Mining (NaCTeM), who presented Text mining: The view from NaCTeM.

Round table discussion

The day concluded with the group reconvening for a roundtable (which was filmed) to discuss the broader issue of why there is not more TDM happening in the UK.

We kicked off by asking each of the people who had presented during the event to describe what they saw as the major barrier for TDM. The answers ranged from the issue of recruiting and training staff to the legal challenges and policies needed at institutional level to support TDM and the failure of institutions and government to show leadership on the issue. We then opened up the floor to the discussion.

A librarian described what happens when a publisher cuts off access, including the process the library has to go through with various areas of the University to reinstate access. (Note this was the reason why the RLUK workshop concluded with the refrain: ‘Don’t cut us off!’). There was some surprise in the group that this process was so convoluted.

However, the suggestion that researchers let the library know that they want to do TDM and the library will organise permissions was rejected by the group, on both the grounds that it is impractical for researchers to do this, and that the effort associated with obtaining permission would take too long.

A representative from Taylor and Francis suggested that researchers contact the publishers directly and let them know. Again this was rejected as ‘totally impractical’ because of the assumption it made about the nature of research. Far from being a linear and planned activity, research is iterative: requesting access for a period of three months and then having to go back to extend this permission if the work took an unexpected turn would be impractical, particularly across multiple publishers.

One attendee in her blog about the event noted: “The naivety of the publisher, concerning research methodology, in this instance was actually quite staggering and one hopes that this publisher standpoint isn’t repeated across the board.”

Some researchers described the threats they had received from publishers about downloading material. There was anger about the inherent message that the researcher had done something criminal.

There was also some concern raised that TDM will drive price increases as publishers see ‘extra value’ to be extracted from their resources. This sparked off a discussion about how people will experiment if anything is made digitally available.

During the hour-long session the conversation moved from high-level problems to workflows. How do we actually do this? As is the way with these types of events, it was really only in the last 10 minutes that the real issues emerged. What was clear was something I have repeatedly observed over the past few years – that the players in this space, including librarians, researchers and publishers, have very little idea of how the others work and what their needs are. I have actually heard people say: ‘If only they understood…’

Perhaps it is time we started having more open conversations?

Next steps

Two things have come out of this event. The first is that people have very much asked for some hands-on sessions. We will have to look at how we will deliver this, as it is likely to be quite discipline-specific.

The second is that there is clearly a very real need for publishers, researchers and librarians to get into a room together to discuss the practicalities of how we move forward in TDM. One of the comments on Twitter was that we need to have legal expertise in the room for this discussion. We will start planning this ‘stakeholder’ event after the summer break.

Feedback

The items that people identified as the ‘one most important thing’ they learnt were instructive. The answers reflect how unaware people are of the tools and services available, and of how access to information works. Many of the responses listed specific tools or services they had found out about; others commented on the opportunities for TDM.

There were many comments about publishers, both the bad:

Just how much impact the chilling effect of being cut off by publishers has on researchers
That researchers have received threats from publishers
Very interesting about publishers and ways of working with them to ensure not cut off
Lots can be done but it is being hindered by publishers

and the good:

That PLOS is an open access journal
That there are reasonable publishing companies in the UK
That journals make available big data for meta analysis

Commentary about the event

There has been some online discussion and blog posts on the event:

Georgina Cronin’s blog post on the talk prepared by Georgina and her colleague Yvonne Nobis: How librarians support TDM in the Research Environment.
The librarianerrant wrote Text and Data Mining Symposium
Text + Data Mining: A Next-Gen Library Service? by Moorpheus
Online notes made by attendee Laurence Horton
Peter Murray Rust has written several blog posts on Text and Data Mining and created a poster.

Published 17 August 2017
Written by Dr Danny Kingsley