Charities’ perspective on research data management and sharing

In 2015 the Cambridge Research Data Team organised several discussions between funders and researchers. In May 2015 we hosted Ben Ryan from EPSRC, which was followed by a discussion with Michael Ball from BBSRC in August. Now we have invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers.

David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK (CRUK) met with our academics on Friday 22 January at the Gurdon Institute. The Gurdon Institute was founded jointly by the Wellcome Trust and CRUK to promote research in the areas of developmental biology and cancer biology, and to foster a collaborative environment for independent research groups with diverse but complementary interests.

This blog summarises the presentations and discusses the data sharing expectations from Wellcome Trust and CRUK. A second related blog, ‘In conversation with Wellcome Trust and CRUK’, summarises the question and answer session that was held with a group of researchers on the same day.

Wellcome Trust’s requirements for data management and sharing

Sharing research data is key for Wellcome’s goal of improving health

David Carr started his presentation by explaining that the Wellcome Trust’s mission is to support research with the goal of improving health. Therefore, the Trust is committed to ensuring research outputs (including research data) can be accessed and used in ways that will maximise health and societal benefits. David reminded the audience of the benefits of data sharing. Data which is shared has the potential to:

  • Enable validity and reproducibility of research findings to be assessed
  • Increase the visibility and use of research findings
  • Enable research outputs to be used to answer new questions
  • Reduce duplication and waste
  • Enable access to data for other key communities – public, policymakers, healthcare professionals etc.

Data sharing goes mainstream

David gave an overview of data sharing expectations from various angles. He started by referring to the Royal Society’s 2012 report, Science as an open enterprise, which sets sharing as the standard for doing science. He then mentioned other initiatives: the G8 Science Ministers’ statement; the joint report from the Academy of Medical Sciences, BBSRC, MRC and Wellcome Trust on reproducibility and reliability of biomedical research; and the UK Concordat on Open Research Data. The take-home message was that sharing data and other research outputs is increasingly becoming a global expectation and a core element of good research practice.

Wellcome Trust’s policy for open data

The next aspect of David’s presentation was Wellcome Trust’s policy on data management and sharing. The policy was first published almost a decade ago (2007) with subsequent modifications in 2010. The principle of the policy is simple: research data should be shared and preserved in a manner which maximises its value to advance research and improve health. Wellcome Trust also requires data management plans as a compulsory part of grant applications, where the proposed research is likely to generate a dataset that will have significant value to researchers and other users. This is to ensure that researchers understand the importance of data management and sharing and plan for it from the start of their projects.

Cost of data sharing

Planning for data management and sharing involves costing for these activities in the grant proposal. The Wellcome Trust’s FAQ guidance on data sharing policy says that: “The Trust considers that timely and appropriate data management and sharing should represent an integral component of the research process. Applicants may therefore include any costs associated with their proposed approach as part of their proposal.” David then outlined the types of costs that can be included in grant applications (including for dedicated staff, hardware and software, and data access costs). He noted that, in the current draft guidance on costing for data management, estimated costs for long-term preservation that extend beyond the lifetime of the grant are not eligible, although costs associated with the deposition of data in recognised data repositories can be requested.

Key priorities and emerging areas in data management and sharing

Infrastructure

The Wellcome Trust also identified key priorities and emerging areas where work needs to be done to better support data management and sharing. The first one was to provide resources and platforms for data sharing and access. David pointed out that wherever available, discipline-specific data repositories are the best home for research data, as they provide rich metadata standards, community curation and better discoverability of datasets.

However, the sustainability of discipline-specific repositories is sometimes uncertain. Discipline-specific resources are often perceived as ‘free’. However, research data submitted to ‘free’ data repositories has to be stored somewhere and the amount of data produced and shared is growing exponentially – someone has to pay for the cost of storage and long-term curation in discipline-specific data repositories. An additional point for consideration is that many disciplines do not have their own repositories and therefore need to heavily rely on institutional support.

Access

Wellcome Trust funds a large number of projects in clinical areas. Dealing with patient data requires careful ethical considerations and planning from the very start of the project to ensure that data can be successfully shared at the end of the project. To support researchers in dealing with patient data, the Expert Advisory Group on Data Access (a cross-funder advisory body established by MRC, ESRC, Cancer Research UK and the Wellcome Trust) has developed guidance documents and practice papers about handling of sensitive data: how to ask for informed consent, how to anonymise data and the procedures that need to be in place when granting access to data. David stressed that a balance needs to be struck between maximising the use of data and the need to safeguard research participants.

Incentives for sharing

Finally, if sharing is to become the normal thing to do, researchers need incentives to do so. Wellcome Trust is keen to work with others to ensure that researchers who generate and share datasets of value receive appropriate recognition for their efforts. A recent report from the Expert Advisory Group on Data Access proposed several recommendations to incentivise data sharing, with specific roles for funders, research leaders, institutions and publishers. Additionally, in order to promote data re-use, the Wellcome Trust joined forces with the National Institutes of Health and the Howard Hughes Medical Institute and launched the Open Science Prize competition to encourage prototyping and development of services, tools or platforms that enable open content.

Cancer Research UK’s views on data sharing

The next talk was by Jamie Enoch from Cancer Research UK. Jamie started by saying that because Cancer Research UK (CRUK) is a charity funded by the public, it needs to ensure it makes the most of its funded research: sharing research data is fundamental to this. Making the most of the data generated through CRUK grants could help accelerate progress towards the charity’s aim in its research strategy, to see three quarters of people surviving cancer by 2034. Jamie explained that his post – Research Funding Manager (Data) – has been created as a reflection of data sharing being increasingly important for CRUK.

The policy

Jamie introduced the key principles of the CRUK data sharing policy by presenting the main issues around research data sharing and explaining CRUK’s position in relation to them:

  • What needs to be shared? All research data, including unpublished data, source code, databases etc, if it is feasible and safe to do so. CRUK is especially keen to ensure that data underpinning publications is made available for sharing.
  • Metadata: Researchers should adhere to community standards/minimum information guidelines where these exist.
  • Discoverability: Groups should be proactive in communicating the contents of their datasets and showcasing the data available for sharing

Jamie explained that CRUK really wants to increase the discoverability of data. For example, clinical trials units should ideally provide information on their websites about the data they generate and clear information about how it can be accessed.

  • Modes of sharing: Via community or generalist repositories, under the auspices of the PI or a combination of methods

Jamie explained that not all data can be/should be made openly available. Due to ethical considerations sometimes access to data will have to be restricted. Jamie explained that as long as restrictions are justified, it is entirely appropriate to use them. However, if access to data is restricted, the conditions on which access will be granted should be considered at the project outset, and these conditions will have to be clearly outlined in metadata descriptions to ensure fair governance of access.

  • Timeframes: Limited period of exclusive use permitted where justified

Jamie suggested adhering to community standards when thinking about any periods of exclusive use of generated research data. In some communities research data is made accessible at the time of publication. Other communities will expect data release at the time of generation (especially in collaborative genomics projects). Jamie further explained that particularly in cases where new data can affect policy development, it is key that research data is released as soon as possible.

  • Preservation: Data to be retained for at least 5 years after grant end
  • Acknowledgement: Secondary users of data should credit original researcher and CRUK
  • Costs: Appropriately justified costs can be included in grant proposals

As of late 2015, financial support for data management and sharing can be requested as a running cost in grant applications. Jamie explained that there are no particular guidelines in place explaining eligible and non-eligible costs and that the most important aspect is whether the costs are well justified or not, and reasonable in the context of the research envisaged.

Jamie stressed that the key point of the CRUK policy is to facilitate data sharing and to engage with the research community, recognising the challenges of data sharing for different projects and the need to work through these collaboratively, rather than enforce the policy in a top-down fashion.

Policy implementation

Subsequently, the presentation discussed ways in which CRUK policy is implemented. Jamie explained that the main tool for the policy implementation is the new requirement for data management plans as a compulsory part of grant applications.

Two of the three main response mode committees, the Science Committee and the Clinical Research Committee, have a two-step process of writing a data management plan. During the grant application stage researchers need to write a short, free-form description about how they plan to adhere to CRUK’s policy on data sharing. Only if the grant is accepted will the beneficiary be asked to write a more detailed data management plan, in consultation with CRUK representatives.

This approach serves two purposes as it:

  • ensures that all applicants are aware of CRUK’s expectations on data sharing (they all need to write a short paragraph about data sharing)
  • saves researchers’ time: only those applicants who were successful will have to provide a detailed data management plan, and it allows the CRUK office to engage with successful applicants on data sharing challenges and opportunities

In contrast, applicants for the other main CRUK response mode committee, the Population Research Committee, all fill out a detailed data management and sharing plan at application stage because of the critical importance of sharing data from cohort and epidemiological studies.

Outlooks for the future

Similarly to the Wellcome Trust, CRUK realised that cultural change is needed for sharing to become the norm. CRUK has initiated many national and international partnerships to help reward data sharing.

One of them is a collaboration with the YODA (Yale Open Data Access) project aiming to develop metrics to monitor and evaluate data sharing. Other areas of joint work with other funders include the development of guidelines on ethics of data management and sharing, platforms for data preservation and discoverability, and procedures for working with population and clinical data. Jamie stressed that the key thing for CRUK is to work closely with researchers and research managers – to understand the challenges and work through these collaboratively, and consider exciting new initiatives to move the data sharing field forwards.

Published 5 February 2016
Written by Dr Marta Teperek, verified by David Carr and Jamie Enoch
Creative Commons License

2015 – that was the year that was

This time last year, the Office of Scholarly Communication at Cambridge University had been in existence for one week. As the inaugural Head of the Office, I had landed in the UK from Australia on 1 January, and was still battling jet lag. What a difference a year makes. This blog is a short run down of what has happened in 2015 and a brief peek into our plans for 2016.

The OSC has three primary foci – managing compliance with funders, external engagement and working with the Cambridge community to ensure awareness of broader scholarly communication issues. In our spare time we have also taken on a few projects.

Managing funder compliance

Open Access

The University of Cambridge is engaging its research community with open access with a broad approach, both offering solutions for compliance management and determining ways in which the community can continue their normative communication behaviours while increasing access to their research.

As with all universities in the UK, the Open Access service is managing multiple and conflicting open access policies in a complex publishing landscape. The RCUK open access policy has been in effect since April 2013, and the COAF policy continues the longstanding Wellcome Trust open access policy. In all, the OSC manages annual funds of approximately £2 million from these funders to support open access compliance. HEFCE announced its upcoming open access REF policy in March 2014.

In October 2014 the University introduced a new compliance system, designed on the basis of user experience evidence, with the tag line “Accepted for publication? Send us your manuscript”. The system is designed to ensure that the researcher only has to act once in order to comply with multiple policies. Researchers use an attractive and simple interface where they are asked to upload their manuscript, complete a short form and submit. Our OA team then checks funder and publisher policies, deposits the work in the repository for HEFCE compliance and, using a decision tree, determines the payment options required and the funds available for the article. The team manages the article payment processes and contacts the author once the work is complete. From the author perspective this is a simple and much liked system.

Outreach has included contacting departmental administrators, speaking to research communities, attending Committee meetings and so on to spread the word. Despite this, the team processes an average of only 240 unique HEFCE-eligible papers per month, representing approximately 30% of research output. While this may be cause for concern in relation to future REF compliance, a brief analysis of the open access publication activities of Cambridge researchers indicates that 60% of Cambridge research is being made available – including through our system.

We continue to have challenges relating to publishers not making articles open access under the correct licence (or even at all) despite our payment of Article Processing Charges. The checking and chasing up of these publishers is extremely time consuming. In an attempt to ensure the publishers did what we were paying for, we brought in Purchase Orders for the first half of the reporting period. This caused serious issues when matching the articles listed in the Open Access systems against the financial systems of the University for reporting to the RCUK. As it was not making any difference to publisher behaviour we abandoned this approach. The only issues we have encountered have been for articles that are hybrid – Cambridge University (across both the RCUK and COAF funds) spends approximately 74% on hybrid journals as opposed to fully OA journals.

There has been a constant reporting requirement throughout 2015, first to Jisc, then the RCUK, the Wellcome Trust and Jisc a second time. This has been a huge drain on personnel as none of the reporting periods align, requiring several months of FTE-equivalent work. This is due to several issues, of which the Purchase Order problem mentioned above is a minor factor. Reporting in detail on a large number of articles on an individual basis is a complex task.

Research Data Management

2015 has been a big year for Research Data Management, with the EPSRC announcing they would start checking to ensure researchers are making their underlying data available. The Research Data Facility has spent the year focused on increasing awareness, providing support and resources, and managing data with huge success. There have been face to face meetings with over 1300 researchers, and data submissions have risen exponentially (see here for a graphic of the numbers in July 2015). The team provides Research Data Management Plan support, and the data website has had over 16,500 visits.

We have spent a huge amount of time talking to the Cambridge research community. One outcome of these discussions is a deep understanding of the concerns and challenges for researchers in relation to data sharing. To address these we have provided fora for our researchers to meet with the funders to find solutions.  Our meetings with EPSRC and BBSRC resolved many concerns and resulted in an endorsed set of FAQs about research data sharing.

We have contributed to policy development by working with our contemporaries at many institutions to provide a coordinated response to the proposed UK Concordat on Research Data.

Systems management

A perennial issue with open access is the integration of systems within the institution to achieve the holy grail of ‘deposit once, use many times’. We are not there yet, although we have made good inroads. Cambridge University was one of the testbed institutions for DSpace, and the repository has been in place since 2005. The repository had suffered from a lack of attention and by the beginning of 2015 was not functioning properly and contained a large amount of bespoke coding.

The upgrade of DSpace from Version 3.4 to Version 4.3 took many months because it involved an associated standardisation of the base code to ensure future upgrades will be smooth. We also needed to create a new server platform for the repository to sit on, which has stabilised our operations. The repository policy has been revisited and the agreements and licences associated with minting DOIs are now in place. The next step is to look at integration with other University systems.

We held a repository naming competition during the year, with the winning name being ‘Apollo’ – the god of logic.  The new name and logo will be launched when the repository interface is upgraded in early 2016. The repository now holds 13,269 articles and manuscripts, 359 datasets and 713 working papers. In total there are more than 200,000 items held in the repository – 175,429 of these are chemical structures.

Engagement and awareness

Within Cambridge

Cambridge University is a large and complex many-headed beast. Engaging this community is extremely challenging. The Office of Scholarly Communication runs a large number of electronic communication channels to ensure researchers are able to stay up to date and informed about open access and research data management, including the Research Data Management website, the Office of Scholarly Communication website and the Open Access website.

We send out monthly newsletters on Research Data Management to over 1000 subscribers, and at the end of 2015 launched a monthly Open Access newsletter – you can sign up here.  We use Twitter extensively (see @CamOpenData, @CamOpenAccess and @dannykay68). In addition the OSC has produced a series of advocacy materials to support their work.

But it is not all electronic – we have also presented to over 1600 researchers and administrative staff during 2015 through events, presentations and workshops. Highlights have included workshops on software licensing, an Open Access week joint event with Cambridge University Press addressing the question: ‘Can society afford open access?’ (see a video summary here), and an Open Data panel discussion ‘Open Data – moving science forward or a waste of money and time?’. The video of this event is here.

More broadly

This Unlocking Research blog provides information and analysis on issues relating to Scholarly Communication, Open Access, Research Data Management and Library matters. The blog  is well used, with over 16,000 visits since launching.

The post with the greatest impact was Dutch boycott of Elsevier – a game changer? with over 3,500 visits in the first week before it was reblogged by the London School of Economics. [Late news added 22 Jan 2016: This blog was listed as one of the Top Ten Posts for 2015: Open Access. It was also listed as one of the blogs that had an average minute per page measurement of over 6 minutes and 30 seconds.]

Members of the OSC are increasingly being invited to speak at conferences both within the UK and beyond. Topics have included:

We are also active participants in the discussions held amongst our communities within and outside of the UK. There is a high level of cooperation amongst those working in the area of scholarly communication and open access. The OSC contributes to meetings and initiatives organised by the League of European Research Universities, SPARC Europe and the UK Council of Research Repositories amongst others.

Training and support

Supporting Researchers in the 21st century

The OSC launched the ‘Supporting Researchers in the 21st century’ programme – aimed at library and other administrative staff – with three introductory workshops held over six weeks from May to early July. 103 people attended. Working from feedback obtained at these events the programme began offering training and workshops from late July.

Topics covered to date include Research Data Management for Librarians, a Primer on Open Access, Information Security in a Research Environment, Introduction to Metrics, a Day in the Life of a Researcher, and Meet an Open Access Publisher. In addition there have been several opportunities to hear from visiting international experts including:

Research Support Ambassadors

The Research Support Ambassador programme began as an idea of a ‘crack team’ of people who could be deployed across the University to present workshops on Scholarly Communication issues. The general philosophy was that this was a way to encourage staff across the library community and across the grade range to step up.

We have had 18 brave souls volunteer to be the first group in what has frankly been a rather ‘organic’ process, given we had no idea how this was going to play out. The reasons members of the group gave for participating included the opportunity to learn more and gain skills, the ability to support researchers better and, for several people, the wish for more face to face interactions. We ran two sets of intensive training sessions where we decided to focus on four areas:

  • Researcher Support in Cambridge
  • Managing your online presence
  • Making your thesis open access
  • The Research Lifecycle

We have taken a constructivist approach to learning – where learners take charge of their own learning. The group has worked with a mixture of self education and team work to try and develop ‘modular’ outputs that can be presented by others. There is a blog listing the progress on these topics to date here.

There have been significant challenges to the process with a mixture of new material and technologies, working in teams with new colleagues and limited time. In addition they have had to self direct as the recruitment process for a Research Skills Coordinator took eight months. To the Ambassadors’ credit they have stuck through a confusing process with very little direction. There is a blog post on an insider’s view of the programme here.

Other projects

Unlocking Theses project

This project is the first step to dramatically increase the number of open access theses in the repository, which stood at about 600 at the beginning of 2015. On average one in ten PhD students deposit their thesis to make it available. The repository currently does not allow any other type of thesis to be deposited.

This system has meant that when a researcher requests a copy of a thesis for research purposes, the bound version needs to be scanned. In 2015 the Library held over 1,200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository, Apollo, which now holds 2,176 theses, of which 1,021 are openly accessible. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them. We are in the final process of opening these theses. The remaining 1,155 theses are currently held in a Restricted Theses Collection but the biographical information about these theses is searchable.

Managing Cambridge Journals project

Cambridge University Libraries are interested in supporting new forms of open access publishing.  In 2015 a search revealed that at least seven research and 13 student self-published journals and magazines currently circulate within the Cambridge community. These range widely in quality from almost professional publications to literally photocopied pages. The Managing Cambridge Journals project is working with Cambridge University Press to offer support to Cambridge researchers who are publishing outside of the traditional channels.  Three areas of potential support have been identified – a publishing platform, information and support and possibly an internal Cambridge publishing ‘brand’.  Work is already underway to ingest the full decade of articles published in the Cambridge Journal of China Studies into the repository from their currently unstable home on a website.

The team

To achieve all of this has required a huge effort on many people’s behalf. In January 2015 the OSC had three staff plus the Head – two Open Access Research Advisors and a part-time Repository Manager. Now the team sits at 12 people and this number is relatively fluid.

This sounds like a huge group – which it is. But with only two exceptions – of which the Head is one – all staff are either temporary staff or on extremely short term contracts. This is primarily related to (a lack of) funding and has two effects. First, a disproportionate amount of time is spent on managing recruitment, writing job descriptions, advertising, interviewing and so on. Almost all HR requirements are still enforced regardless of the brevity of contracts – including monthly probation interviews.

The second effect is the constant need to lobby for financial support which requires creating business cases, new organisational charts and many, many meetings. The Library has been nothing but supportive throughout this process, but there is a need for the broader institution to recognise that much of the work done in the OSC falls in the University rather than Library camp.

Looking forward to 2016

This upcoming year is shaping up to be as busy and productive as the first year of operation. Some of the planned activities include:

  • Negotiation with Research Councils UK funders on possible funding options for the Research Data Facility.
  • The Communication across the Research Lifecycle project aims to join up communication with researchers by Cambridge administrative departments. This requires scoping the current communication channels and developing advocacy materials across the University administrative departments. There is currently no financial support for this project.
  • Participating in the Jisc Research Data Management Shared Services pilot.
  • Increase the collaboration with Cambridge University Press on the Managing Cambridge Journals project to develop this project to operational level.
  • The second tranche of upgrades to DSpace is underway. This will involve an upgrade to V5, implementing ‘request a copy’ buttons, minting DOIs, registering the repository with wider aggregation systems and updating the look and feel of the interface. This work is expected to be completed by Easter 2016.
  • A Repository Integration Manager will start work on the interoperability of DSpace with Symplectic and other systems in the University. New forms and simple deposit processes will be developed.
  • Increase theses deposit by developing a new form and amending the policy to allow all thesis types to be deposited.
  • Pilot with selected departments to require the deposit of a digital thesis at the same time as the printed and bound version, with the option of making the work available.
  • Complete the first round of the Research Support Ambassador programme with some skills training and finalisation of training products before the group is released into the wild.
  • Negotiate with arXiv and other open access providers to allow researchers to meet funder requirements within their usual communication norms.
  • Develop a comprehensive Research Data Management training program for PhD students.
  • Build on the Supporting Researchers in the 21st century programme.
  • Present at conferences in the UK and abroad.

So, watch this space!

Published 11 January 2016
Written by Dr Danny Kingsley
Creative Commons License

Open Data – moving science forward or a waste of money & time?

On 4 November the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?”. Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders, bright people with curious minds and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge; @RafaCarazoSalas
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?”, followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration, new discoveries and, most importantly, sharing data kept it alive. If data was not shared and instead simply kept on the computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited, and built upon years after the publication. Frances added that research data originating from publicly funded research was funded by taxpayers. Therefore, the value of research data should be maximised. Data sharing is important for research integrity and reproducibility and for ensuring better quality of science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what made the wheels of science turn. He inspired further discussions by strong statements: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the problem of sharing big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck in progress was to bridge the gap between the capacity to generate the data, and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think that one day it would be possible just to stream videos instead of buying DVDs. Nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time – the need developed the tools. More and more people are looking at new ways of chunking and parallelisation of data downloads. Additionally, there is a change in the way in which the analysis is done – more and more of it is done remotely on central servers, and this eliminates the technical barriers of access to data.

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was not only important who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift – that questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “shut”. Tim explained that anything that gets research data off the laptop to a shared environment, even if it was shared only with a certain group, was already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” And this is yet another point where I personally agreed with Tim that institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to make small steps at a time, with the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started explaining that the way traditional publishing works had to change, as data was not two-dimensional anymore and in the digital era it could no longer be shared on a piece of paper. Ideally, researchers should be allowed to continue re-analysing data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable – alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having the formal peer review of data, it would be better to have an evaluation structure based on the re-use of data – for example, valuing data which was downloadable, well-labelled, re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first one to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that there was an important role for funders and institutions to incentivise data sharing. If funders/institutions wished to mandate sharing, they also needed to reward it. Funders could do so when assessing grant proposals; institutions could do it when looking at academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. And maybe due to the fact that many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. The current academic career progression is almost solely based on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, whether you will be able to continue your career as a researcher. This, connected with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you be able to still publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish results of your five-year-long study? Will you be able to find a new position? As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research, making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about evaluation structure based on the re-use of research data. In fact, more and more researchers go to the ‘open’ and use blog posts and social media to talk about their research and to discuss the work of their peers. With the use of persistent links research data can now be easily cited, and impact can be built directly on data citation and re-use, but one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is to help researchers to really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and institutionalisation of the agenda. Instead of a stick, which results in the minimal compliance attitude, researchers need to see the opportunities and benefits of open data to sign up for the agenda. Therefore, the Institution needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to “open”. And the community needs to be willing and convinced to do so.

Further resources

  • Click here to see the full recording of the Open Data Panel Discussion.
  • And here you can find a storified version of the event prepared by Kennedy Ikpe from the Open Data Team.

Thank you

We would also like to say a special ‘thank you’ to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and who made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License