Tag Archives: Open Research

Cambridge Data Champions – reflections on an expanding community and strategies for 2019

The Cambridge Data Champions (DCs) advocate good Research Data Management (RDM) and Open Data practices to researchers locally in their departments, within Cambridge University in general, and sometimes further afield. They network with one another, exchange good methods of RDM, share ideas and, as a collective, reflect on current issues surrounding RDM, Open Data and researcher engagement; a major shared goal is to establish best practices for research data. By attending bi-monthly forums facilitated by the Research Data Team, the DCs convene as a community, hear speakers presenting on relevant topics, and engage in workshops that will help them in their ‘championing’ activities. Following on from our latest blog post, which summarised how a workshop led to the creation of cartoon postcards as a new tool to add to the DCs’ resource kit for RDM advocacy, we are now reflecting on initiatives that sprang from workshops during the past year and are considering the challenges and opportunities that this programme brings as it approaches the end of its third year.

Growing 

The programme started in Autumn 2016, comprising researchers who volunteered to become local community experts and advocate on research data management and sharing. Our first call welcomed 43 DCs (September 2016), our second call 20 DCs (March 2018) and the third call 40 DCs (January 2019). For simplicity, this year we also added to our statistics the “affiliate” DCs, who are colleagues who contribute to the DC community in other ways (as interested members of Cambridge’s RDM Project Group) and not necessarily through channelling their RDM efforts for the benefit of a specific department.

We are now a community of 87 active DCs.

Graph showing number of Data Champions (current and alumni) per year between 2016 and 2019.
Total number of Data Champions who joined in each year (orange column indicates Champions who are still active; blue column indicates Champions who are now alumni).

Communities within a community 

Over the last year we caught ourselves using phrases such as the ‘old DCs’ and the ‘new DCs’, when what we really meant was ‘established DCs’ and ‘new DCs’, with the latter group being those joining the programme each year. In September we celebrate the programme’s third birthday, and it is reasonable to expect that there will be more experienced DCs who have already built their networks and have, more or less, a stable offering of RDM support and an enhanced understanding of the needs of their department. On the other hand, there are those being welcomed into the group who seek, to differing degrees, initial support from both the RDM team and their colleagues in order to become successful DCs. It is easy to imagine different layers developing with different needs, both in terms of support and engagement.

Through various activities and feedback from DCs, we now have a good quantity of raw data to analyse their needs for being, as we called it, ‘a good Data Champion’. We have brainstormed ideas which we are putting into action to respond to the challenges of an ever-growing Data Champions group. 

Planning  

DC Welcome Pack 

Word cloud image of "welcome" in different languages  - front page of the Data Champion Welcome pack.

Every year we circulate the Data Champions Welcome Pack to coincide with the inductions we organise to welcome new DCs into the group. This year we included in the pack what is expected of a DC when they join the programme, so that expectations are clearly communicated from the beginning and are the same for everybody.

Document describing what Data Champions are expected to do as part of the Programme.
Page from the Cambridge Data Champions Welcome Pack

Bi-monthly forums 

Lightning talks have been introduced as a standard item in each forum. These have provided DCs with the opportunity to discuss aspects of RDM they are working on (e.g. new tools and techniques), or to feed back to the group on DC activities undertaken in their departments and data-related events they have attended so that the whole group can benefit. Importantly, the lightning talks have been used by DCs to problem solve, where the collective knowledge and experience of DCs attending a forum has been harnessed to address particular challenges faced by individual DCs. This is where the community aspect of the programme truly shines. 

It is always a priority for us to invite speakers to forums who are external to the programme, reflecting the needs of both the new and established DCs. For example, Hannah Clements from Cambridge University’s Researcher Development Programme (RDP) spoke to the DCs at the January forum about mentoring, providing guidance on how support can be best delivered within the DC community. In the May forum, we had talks and discussions from a panel of experts working on different aspects of data archiving. The panellists came from across the University bringing a diversity of experience, grounded in clinical governance, computing, and more traditional archiving. These examples are just a couple of the themes that we have covered so far in the forums, which have been derived predominantly from information provided by (and the needs of) the DCs themselves. Additional topics that we plan to cover in future forums include issues surrounding reproducibility, IP and commercialisation, publishing and the impact of research data.  

Key aims of these forums are not only to facilitate networking between DCs but also to act as an arena for the transfer of knowledge along the ‘researcher pipeline’, from forum to DCs and from DCs to researchers in their departments.

DC specialisation group 

As a community, we need to be able to map expertise internally and understand the make-up of such an organic group at any given moment. This makes it easier to support each other and create collaborations, and also improves how we promote the programme externally.

Table showing specialisation categories and sub-categories for Data Champions
Areas of expertise amongst our Data Champions

This led to the formation of the DC specialisation group, consisting of one of us and six of the DCs, which determined how to categorise expertise within the group. As a result, a spreadsheet was created where all DCs can chart their specialist areas and update or amend when necessary (and at least annually). We have top level categories for simple statistical analysis and second level categories that offer more specific details for the benefit of the DC community. 

The next stage is to include the wider research community and improve how various stakeholders can reach the appropriate Data Champions for initial advice and support in RDM issues. One way to do this is by presenting more coherent and consistent specialisations on the Data Champions’ website, using the categories which we have already created for internal use within the group. This stage is due to begin this month and we hope to report on our efforts next year.  

Branding group 

A growing community is inevitably going to bring to the forefront various identity discussions. With this in mind, we formed a branding group to examine if a DC logo should be created to enhance the Data Champions’ visibility and raise their profile amongst their peers when advocating for RDM. A logo has been created and is going through various stages of approval before it will be released later this year. 

Pilot programme – Mentoring  

In February 2019, we initiated a pilot mentoring project as part of the induction process for the new DCs. The mentors are established DCs who have volunteered to support those new DCs wishing to take part in this pilot exercise. This followed on from our January forum where the benefits of mentoring for both mentees and mentors were outlined by Hannah Clements of RDP. At this forum, which preceded the University-wide call for new DCs, we also held a workshop where DCs were divided into three groups and asked three questions: what do you wish you knew when you first became a DC that you know now; what could you offer as mentors to the new DCs; how do you think the mentor-mentee system could work? The responses from DCs in the three groups informed the implementation, structure and aims of the mentoring pilot.  

Our aim is to learn from this project in close consultation with both mentors and mentees. We want to see if this process helps new DCs to establish themselves within their departments/institutes. Will it be effective? The findings will inform our steps for the following year. Watch this space! 

Fostering clusters within departments 

We have excellent examples of departments that promote their DCs within their institutions. A good example is the Chemistry department, which has a cluster of five DCs who work together in their advocacy. During this year’s call for new DCs, and with help from the Department Librarian, we took a targeted approach to advertising the DC Programme within the Department of Engineering. This was highly successful, resulting in ten new Data Champions from Engineering, drawn from various roles and Academic Divisions. They represent a hub with the local knowledge, experience and skills to assess their department’s needs and explore the best approaches to supporting good RDM practices and Open Research, ones that are tailored to the discipline.

Alumni community 

Heading toward the programme’s third birthday means not only that we are growing bigger but also that we are developing an alumni community. This is a different kettle of fish, but it is on our radar to investigate how we can foster this distinct group and build a network that is not only Cambridge based but has a more national and even international outlook.

Funding  

Let’s not forget that the DC programme consists of volunteers. We are in the process of seeking more funds to support this ever-increasing community, to run expanding bi-monthly forums, and to be able to offer grants to assist DCs in their endeavours. As an example, we supported one of the DCs, James Savage, to bring the programme to the international stage in November at SciDataCon 2018 in Botswana. He talked about the programme as well as his experience of being a DC. This resulted in James writing a paper together with Lauren Cadwallader, to be published soon in Data Science Journal (the accepted manuscript and associated data are available now in Apollo, the Cambridge University institutional repository).

An exciting year so far! 

During this third year of the DC programme the number of active DCs across the University of Cambridge has doubled. We can only anticipate it growing further each year, yet balanced by an expanding community of alumni DCs as, for example, DCs leave Cambridge. The DC community is inherently dynamic, as is the programme. Because of this, we always seek to respond and adapt to changing conditions in novel and beneficial ways while maintaining the programme’s core structure to provide strong foundations. This has been a period of reflection, organisation and anticipation, all required to drive the Data Champion programme forward and tackle current challenges effectively, as well as those that lie ahead – more on this to come soon!  

Written by Maria Angelaki and Dr Sacha Jones

Published 20 June 2019

License: CC BY

Perspectives on the Open future

‘More cash, more clarity and don’t make this compulsory’ is the take-home message from a recent workshop held with Cambridge researchers on the question of Open Research.

The recent session, called “An Open Future? How Cambridge is Responding to Challenges in the Open Landscape”, was held with a group of new Cambridge lecturers at a seminar organised by Pathways in Higher Education Practice. This event offered us an opportunity to go beyond the usual information we provide in our training workshops*.

This session provided a unique opportunity to speak with researchers from various disciplines who were further along in their careers and already had a basic knowledge of Open Access and Research Data sharing requirements. This meant we were able to have an informed discussion rather than give a lecture, and we wanted to hear what they thought about Open Research.

(* The OSC is often asked to provide training on all things Open Research. Generally our training is focused on PhD students and early career researchers. We create PowerPoint slides that explain the benefits of Open Access, the necessity of a good Data Management Plan or how to promote your research through social media (all of which are freely available here). We try to make these sessions as interactive as possible.)

Quiz Time

The session started by laying out how the current academic publishing model works. Basically, researchers submit their latest findings to a journal for FREE, peer reviewers review the paper for FREE, editors oversee the journal for FREE and the publishers format the article then turn around and charge libraries exorbitant subscription fees (yep, that about sums it up). This got a good laugh from the audience.

So our first activity was a short quiz. We were interested to know if researchers knew how much things cost. We asked them a set of questions:

  1. How much do you think we pay in subscription costs every year?
  2. What’s the average APC?
  3. How many papers with at least one Cambridge author were made gold OA in 2016?

There was a lot of debate among the groups. Some of the answers were wildly overestimated (one researcher suggested £50 million for subscriptions per year), others were quite low.

What are people sharing?

For our next activity, we wanted to know what they were already sharing and what tools they were using to share. We presented each table with a Venn diagram (with circles for ‘Publication’, ‘Data’ and ‘Other’) and a bunch of post-its.

Unsurprisingly, the ‘Publication’ circle had the most post-its. Answers included tools such as arXiv, ResearchGate, and Academia.edu as well as personal websites and Facebook. There were also mentions of Cambridge Open Access and the Departmental Libraries. Interestingly, a few noted that they made their work available to researchers through personal contact such as email requests.

There were a few post-its in the ‘Data’ circle describing what tools they used to deposit, such as university repositories and Zenodo.

The ‘Other’ category was mostly about sharing code and software through GitHub, although one lecturer noted the free workshops they offered. Only one post-it made it into the centre, and that was for “webpage”. For the future, it may be interesting to know which discipline the researchers were from when they were posting, because this theme came up quite a few times during the discussions.

When are people prepared to share?

The next activity involved lots of sticky dots and large pieces of paper. The participants were asked if they were comfortable sharing different aspects of their research at different stages in the research lifecycle. Each sheet was laid out as a grid.

All of the researchers were asked to stick dots in the grid. The results were interesting. Most researchers were happy to share the published version of their paper, but a large number were uncomfortable sharing their pre-print or submitted version. There were only two dots in the “yes” square to share pre-prints. During the discussion it became apparent that this was probably down to the culture of the discipline: one physics researcher said that sharing pre-prints was part of the process, whereas one of the lecturers from English disliked having more than one version of her paper available to read. The results for Book Chapters were similar.

Data and Data Management Plans were all over the place. There were quite a few dots in the ‘Not sure’ squares. Most were happy to share data at the time of publication or at the end of the project. For the Data Management Plans it was evenly split between ‘yes’ to sharing at the end of the project versus ‘not sure’. No one wanted to share their DMP at the start of the project. There was some confusion among researchers (mostly from the humanities) who felt they didn’t have any data and therefore there was nothing to share.

The majority of the researchers were unenthusiastic about sharing their Grant Applications or Grey Literature at any stage. For Grant Applications the overall feeling was that if the grant was successful then researchers didn’t want to share their methodology. If the grant was unsuccessful, they were reluctant to share their failures or they planned to submit to another granting agency. Most lecturers in the room agreed that they were fine sharing an abstract of their grant awards (which many funders post on their website).

As for Grey Literature, which we defined as working papers or opinion papers, no one wanted to share anything that could be considered unfinished or not well thought out. One member of the law faculty said that if they had produced any grey literature worth sharing, then they would publish it in a journal. Moreover, it could be detrimental to their career if they shared anything that wasn’t well researched and presented.

More money please

To finish up the session, we asked researchers what more the University could be doing to promote Open Research. Not surprisingly, most people were resistant to any University mandate telling them what to do. In addition, they were strongly against any Open Research requirements being tied in with HR practices like promotions. The researchers supported discipline-specific requirements for Open Research.

Researchers also wanted clearer instructions from the University and from funders on what is required of them. Having a myriad of policies is quite confusing and burdensome for researchers who already feel pressured to publish. In the end, most said that if the University would pay, then they would be happy to share their published work.

Published 4 April 2018
Written by Katie Hughes
Creative Commons License

Sustaining open research resources – a funder perspective

This is the second in a series of three blog posts which set out the perspectives of researchers, funders and universities on support for open resources. The first was Open Resources, who should pay? In this post, David Carr from the Open Research team at the Wellcome Trust provides the view of a research funder on the challenges of developing and sustaining the key infrastructures needed to enable open research.

As a global research foundation, Wellcome is dedicated to ensuring that the outputs of the research we fund – including articles, data, software and materials – can be accessed and used in ways that maximise the benefits to health and society.  For many years, we have been a passionate advocate of open access to publications and data sharing.

I am part of a new team at Wellcome which is seeking to build upon the leadership role we have taken in enabling access to research outputs.  Our key priorities include:

  • developing novel platforms and tools to support researchers in sharing their research – such as the Wellcome Open Research publishing platform which we launched last year;
  • supporting pioneering projects, tools and experiments in open research, building on the Open Science Prize which we ran with the NIH and Howard Hughes Medical Institute;
  • developing our policies and practices as a funder to support and incentivise open research.

We are delighted to be working with the Office of Scholarly Communication on the Open Research Pilot Project, where we will work with four Wellcome-funded research groups at Cambridge to support them in making their research outputs open.  The pilot will explore the opportunities and challenges, and how platforms such as Wellcome Open Research can facilitate output sharing.

Realising the long-term value of research outputs will depend critically upon developing the infrastructures to preserve, access, combine and re-use outputs for as long as their value persists.  At present, many disciplines lack recognised community repositories and, where they do exist, many cannot rely on stable long-term funding.  How are we as a funder thinking about this issue?

Meeting the costs of outputs sharing

In July 2017, Wellcome published a new policy on managing and sharing data, software and materials.  This replaced our long-standing policy on data management and sharing – extending our requirements for research data to also cover original software and materials (such as antibodies, cell lines and reagents).  Rather than ask for a data management plan, applicants are now asked to provide an outputs management plan setting out how they will maximise the value of their research outputs more broadly.

Wellcome commits to meet the costs of these plans as an integral part of the grant, and provides guidance on the costs that funding applicants should consider.  We recognise, however, that many research outputs will continue to have value long after the funding period comes to an end.  Further, while it is not appropriate to make all research data open indefinitely, researchers are expected to retain data underlying publications for at least ten years (a requirement which was recently formalised in the UK Concordat on Open Research Data).  We must accept that preserving and making these outputs available into the future carries an ongoing cost.

Some disciplines have existing subject-area repositories which store, curate and provide access to data and other outputs on behalf of the communities they serve.  Our expectation, made more explicit in our new policy, is that researchers should deposit their outputs in these repositories wherever they exist.  If no recognised subject-area repository is available, we encourage researchers to consider using generalist repositories – such as Dryad, FigShare and Zenodo – or if not, to use institutional repositories.  Looking ahead, we may consider developing an orphan repository to house Wellcome-funded research data which has no other obvious home.

Recognising the key importance of this infrastructure, Wellcome provides significant grant funding to repositories, databases and other community resources.  As of July 2016, Wellcome had active grants totalling £80 million to support major data resources.  We have also invested many millions more in major cohort and longitudinal studies, such as UK Biobank and ALSPAC.  We provide such support through our Biomedical Resource and Technology Development scheme, and have provided additional major awards over the years to support key resources, such as PDB-Europe, Ensembl and the Open Microscopy Environment.

While our funding for these resources is not open-ended and is subject to review, we have been conscious for some time that the reliance of key community resources on grant funding (typically of three to five years’ duration) can create significant challenges, hindering their ability to plan for the long term and retain staff.  As we develop our work on Open Research, we are keen to explore ways in which we can adapt our approach to help put key infrastructures on a more sustainable footing, but this is a far from straightforward challenge.

Gaining the perspectives of resource providers

In order to better understand the issues, we did some initial work earlier this year to canvass the views of those we support.  We conducted semi-structured interviews with leaders of 10 resources in receipt of Wellcome funding – six database and software resources, three cohort resources and one materials stock centre – to explore their current funding, long-term sustainability plans and thoughts on the wider funding and policy landscape.

We gathered a wealth of insights through these conversations, and several key themes emerged:

  • All of the resources were clear that they would continue to be dependent on support from Wellcome and/or other funders for the long term.
  • While cohort studies (which provide managed access to data) can operate cost recovery models to transfer some of the cost of accessing data onto users, such models were not appropriate for data and software resources that commit to open and unrestricted access.
  • Several resources had additional revenue-generation routes – including collaborations with commercial entities – and these had delivered benefits in enhancing their resources.  However, the level of income was usually relatively modest in terms of the total cost of sustaining the resource. Commitments to openness could also limit the extent to which such arrangements were feasible.
  • Diversification of funding sources can give greater assurance and reduce reliance on single funders, but can bring an additional burden.  There was felt to be a need for better coordination between funders where they co-fund resources.  Europe PMC, which has 27 partner funders but is managed through a single grant, is a model which could be considered.
  • Several of the resources were actively engaged in collaborations with other resources internationally that house related data – it was felt that funders could help further facilitate such partnerships.

We are considering how Wellcome might develop its funding approaches in light of these findings.  As an initial outcome, we plan to develop guidance for our funded researchers on key issues to consider in relation to sustainability.  We are already working actively with other funders to facilitate co-funding and make decisions as streamlined as possible, and wish to explore how we can join forces in the future in developing our broader approaches for funding open resources.

Coordinating our efforts

There is growing recognition of the crucial need for funders and the wider research community to work together to develop and sustain research data infrastructure.  As the first blog in this series highlighted, the scientific enterprise is global and this is an issue which must be addressed at an international level.

In the life sciences, the ELIXIR and US BD2K initiatives have sought to develop coordinated approaches for supporting key resources and, more recently, the European Open Science Cloud initiative has developed a bold vision for a cloud-based infrastructure to store, share and re-use data across borders and disciplines.

Building on this momentum, the Human Frontiers Science Programme convened an international workshop last November to bring together data resources and major funders in the life sciences.  This resulted in a call for action (reported in Nature) to coordinate efforts to ensure long-term sustainability of key resources, whilst supporting resources in providing access at no charge to users.  The group proposed an international mechanism to prioritise core data resources of global importance, building on the work undertaken by ELIXIR to define criteria for such resources.  It was proposed that national funders could then contribute a set proportion of their overall funding (with initial proposals suggesting around 1.5 to 2 per cent) to support these core data resources.

Grasping the nettle

Public and charitable funders are acutely aware that many of the core repositories and resources needed to make research outputs discoverable and useable will continue to rely on our long-term funding support.  There is a clear realisation that a reliance on traditional competitive grant funding is not the ideal route through which to support these key resources in a sustainable manner.

But no one yet has a perfect solution and no funder will take on this burden alone.  Aligning global funders and developing joint funding models of the type described above will be far from straightforward, but hopefully we can work towards a more coordinated international approach.  If we are to realise the incredible potential of open research, it’s a challenge we must address.

Published 26 July 2017
Written by David Carr, Wellcome Trust (d.carr@wellcome.ac.uk)

Creative Commons License