Tag Archives: funders

‘It is all a bit of a mess’ – observations from Researcher to Reader conference

“It is all a bit of a mess. It used to be simple. Now it is complicated.” This was the conclusion of Mark Carden, the coordinator of the Researcher to Reader conference, after two days of discussion, debate and workshops about scholarly publication.

The conference bills itself as: ‘The premier forum for discussion of the international scholarly content supply chain – bringing knowledge from the Researcher to the Reader.’ It was unusual because it mixed ‘tribes’ who usually go to separate conferences. Publishers made up 47% of the group, Libraries were next with 17%, Technology 14%, Distributors were 9% and there were a small number of academics and others.

In addition to talks and panel discussions there were workshop groups that used the format of smaller groups that met three times and were asked to come up with proposals. In order to keep this blog to a manageable length it does not include the discussions from the workshops.

The talks were filmed and will be available. There was also a very active Twitter discussion at #R2RConf. This blog is my attempt to summarise the points that emerged from the conference.

Suggestions, ideas and salient points that came up

  • Journals are dead – the publishing future is the platform
  • Journals are not dead – but we don’t need issues any more as they are entirely redundant in an online environment
  • Publishing in a journal benefits the author not the reader
  • Dissemination is no longer the value added offered by publishers. Anyone can have a blog. The value-add is branding
  • The drivers for choosing research areas are what has been recently published, not what is needed by society
  • All research is generated from what was published the year before – and we can prove it
  • Why don’t we disaggregate the APC model and charge for sections of the service separately?
  • You need to provide good service to the free users if you want to build a premium product
  • The most valuable commodity as an editor is your reviewer time
  • Peer review is inconsistent and systematically biased.
  • The greater the novelty of the work the more likely it is to receive a negative review
  • Poor academic writing is rewarded

Life After the Death of Science Journals – How the article is the future of scholarly communication

Vitek Tracz, the Chairman of the Science Navigation Group, which produces the F1000Research series of publishing platforms, was the keynote speaker. He argued that we are coming to the end of journals. One of the issues is that the essence of journals is selection. The referee system is secret – the editors won’t usually tell the author who the referee is because the referee is working for the editor not the author. The main task of peer review is to accept or reject the work – there may be some suggestions to improve the paper. But that decision is not taken by the referees, but by the editor who has the Impact Factor to consider.

This system allows for information to be published that should not be published – eventually all publications will find somewhere to publish. Even in high level journals many papers cannot be replicated. A survey by PubMed found there was no correlation between impact factor and likelihood of an abstract being looked at on PubMed.

Readers can now get the papers they want by themselves and create their own collections that interest them. But authors need journals because the Impact Factor is so deeply embedded. Placement in a prestigious journal doesn’t increase readership, but it does increase the likelihood of getting tenure. So authors need journals, readers don’t.

Vitek noted F1000Research “are not publishers – because we do not own any titles and don’t want to”. Instead they offer tools and services. It is not publishing in the traditional sense because there is no decision to publish or not publish something – that process is completely driven by authors. He predicted that the future of science publishing will shift from journals to services (there will be more tools and publishing directly on funder platforms).

In response to a question about impact factor and author motivation change, Vitek said “the only way of stopping impact factors as a thing is to bring the end of journals”. This aligns with the conclusions in a paper I co-authored some years ago, ‘The publishing imperative: the pervasive influence of publication metrics’.

Author Behaviours

Vicky Williams, the CEO of research communications company Research Media, discussed “Maximising the visibility and impact of research” and talked about the need to translate complex ideas in research into understandable language.

She noted that the public does want to engage with research. A large percentage of the public want to know about research while it is happening. However, they see communication about research as poor. There is low trust in science journalism.

Vicky noted the different funding drivers – now funding is very heavily distributed. Research institutions have to look at alternative funding options. Now we have students as consumers – they are mobile and create demand. Traditional content formats are being challenged.

As a result institutions are needing to compete for talent. They need to build relationships with industry – and promotion is a way of achieving that. Most universities have a strong emphasis on outreach and engagement.

This means we need a different language, different tone and a different medium. However academic outputs are written for other academics. Most research is impenetrable for other audiences. This has long been a bugbear of mine (see ‘Express yourself scientists, speaking plainly isn’t beneath you’).

Vicky outlined some steps to showcase research – having a communications plan, networking with colleagues, creating a lay summary, using visual aids, engaging. She argued that this acts as a research CV.

Rick Anderson, the Associate Dean of the University of Utah, talked about the Deeply Weird Ecosystem of publishing. Rick noted that publication is deeply weird, with many different players – authors (send papers out), publishers (send out publications), readers (demand subscriptions), libraries (subscribe or cancel). All players send signals out into the scholarly communications ecosystem; when we send signals out we get partial and distorted signals back.

An example is that publishers set prices without knowing the value of the content. The content they control is unique – there are no substitutable products.

He also noted there is a growing prevalence of funding with strings attached. Now funders are imposing conditions on how you publish, covering not just the narrative of the research but the underlying data. In addition the institution you work for might have rules requiring you to publish in particular ways.

Rick urged authors to answer the question ‘what is my main reason for publishing’ – not for writing. In reality it is primarily to have high impact publishing. By choosing to publish in a particular journal an author is casting a vote for their future. ‘Who has power over my future – do they care about where I publish? I should take notice of that’. He said that ‘If I publish with Elsevier I turn control over to them, publishing in PLOS turns control over to the world’.

Rick mentioned some journal selection tools. JANE is a system (oriented to biological sciences) where authors can paste an abstract into a search box; it analyses the language and comes up with a suggested list of journals. The Committee on Publication Ethics (COPE) member list provides a ‘white list’ of publishers. Journal Guide helps researchers select an appropriate journal for publication.

A tweet noted that “Librarians and researchers are overwhelmed by the range of tools available – we need a curator to help pick out the best”.

Peer review

Alice Ellingham, who is Director of Editorial Office Ltd, which runs online journal editorial services for publishers and societies, discussed ‘Why peer review can never be free (even if your paper is perfect)’. Alice discussed the different processes associated with securing and chasing peer review.

She said the unseen cost of peer review is communication, as the editorial office provides assistance to all participants. She estimated that it takes about 45-50 minutes per submission to manage the peer review.

Editorial Office tasks include looking for scope of a paper, the submission policy, checking ethics, checking declarations like competing interests and funding requests. Then they organise the review, assist the editors to make a decision, do the copy editing and technical editing.

Alice used an animal analogy – the cheetah representing the speed of peer review that authors would like to see, but a tortoise representing what they experience. This was very interesting given the Nature news piece that was published on 10 February, “Does it take too long to publish research?”

Will Frass is a Research Executive at Taylor & Francis and discussed the findings of a T&F study “Peer review in 2015 – A global view”. This is a substantial report and I won’t be able to do his talk justice here; there is some information about the report here, and a news report about it here.

One of the comments that struck me was that researchers in the sciences are generally more comfortable with single blind review than in the humanities. Will noted that because there are small niches in STM, double blind often becomes single blind anyway as they all know each other.

A question from the floor noted that reviewers spend eight hours on a paper and that their time is more important than publishers’. It asked what publishers can do to support peer review. While this was not really answered on the floor* it did cause a bit of a flurry on Twitter with a discussion about whether the time spent is indeed five hours or eight hours – quoting different studies.

*As a general observation, given that half of the participants at the conference were publishers, they were very underrepresented in the comment and discussion. This included the numerous times when a query or challenge was put out to the publishers in the room. As someone who works collaboratively and openly, this was somewhat frustrating.

The Sociology of Research

Professor James Evans, who is a sociologist looking at the science of science at the University of Chicago, spoke about ‘How research scientists actually behave as individuals and in groups’.

His work focuses on the idea of using data from the publication process to tell rich stories about the process of science. James spoke about some recent research results relating to the reading and writing of science including peer reviews and the publication of science, research and rewarding science.

James compared the effect of writing styles to see what is effective in terms of reward (citations). He pitted ‘clarity’ – using few words and sentences, the present tense, and maintaining the message on point against ‘promotion’ – where the author claims novelty, uses superlatives and active words.

The research found writing with clarity is associated with fewer citations and writing in promotional style is associated with greater citations. So redundancy and length of clauses and mixed metaphors end up enhancing a paper’s search ability. This harks back to the conversation about poor academic writing the day before – bad writing is rewarded.

Scientists write to influence reviewers and editors in the process. Scientists strategically understand the class of people who will review their work and know they will be flattered when they see their own research. They use strategic citation practices.

James noted that even though peer review is the gold standard for evaluating the scientific record, in terms of determining the importance or significance of scientific works his research shows peer review is inconsistent and systematically biased. Greater reviewer distance results in more positive reviews. This is possibly because if a person is reviewing work close to their speciality, they can see all the criticism. The greater the novelty of the work the more likely it is to receive a negative review. It is possible to ‘game’ this by driving the peer review panels. James expressed his dislike of the institution of suggesting reviewers. Suggested reviewers provide more positive, influential and worse reviews (according to the editors).

Scientists understand the novelty bias so they downplay the new elements relative to the old elements. James discussed Thomas Kuhn’s concept of the ‘essential tension’ between the classes of ‘career considerations’ – which result in job security, publication, tenure (following the crowd) and ‘fame’ – which results in Nature papers, and hopefully a Nobel Prize.

This is a challenge because the optimal question for science becomes a problem for the optimal question for a scientific career. We are sacrificing pursuing a diffuse range of research areas for hubs of research areas because of the career issue.

The centre of the research cycle is publication rather than the ‘problems in the world’ that need addressing. Publications bear the seeds of discovery and represent how science as a system thinks. Data from the publication process can be used to tune, critique and reimagine that process.

James demonstrated his research that clearly shows that research today is driven by last year’s publications. Literally. The work takes a given paper and extracts the authors, the diseases, the chemicals etc and then uses a ‘random walk’ program. The result ends up predicting 95% of the combinations of authors and diseases and chemicals in the following year.
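To make the idea of a ‘random walk’ over last year’s literature more concrete, here is a minimal toy sketch. It is not James Evans’ actual method or data: the three ‘papers’, the entity names and the function names are all hypothetical, and the scoring is a simple count of how often short walks connect two entities that no paper has yet combined.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each "paper" is the set of entities it mentions.
papers = [
    {"author:A", "disease:asthma", "chemical:X"},
    {"author:B", "disease:asthma", "chemical:Y"},
    {"author:A", "disease:eczema", "chemical:Y"},
]

def build_graph(papers):
    """Link every pair of entities that co-occur in the same paper."""
    graph = {}
    for entities in papers:
        for a, b in combinations(sorted(entities), 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def random_walk_pairs(graph, walks=2000, length=3, seed=0):
    """Count how often short random walks connect each pair of entities."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    nodes = sorted(graph)
    counts = Counter()
    for _ in range(walks):
        start = rng.choice(nodes)
        node = start
        for _ in range(length):
            node = rng.choice(sorted(graph[node]))
        if node != start:
            counts[frozenset((start, node))] += 1
    return counts

def predict_new_pairs(papers, top=3, **kwargs):
    """Rank pairs that walks reach but that no paper has combined yet."""
    graph = build_graph(papers)
    seen = {frozenset(p) for e in papers for p in combinations(e, 2)}
    counts = random_walk_pairs(graph, **kwargs)
    unseen = [(pair, n) for pair, n in counts.most_common() if pair not in seen]
    return unseen[:top]

predictions = predict_new_pairs(papers)
```

Each entry in `predictions` is a candidate combination for ‘next year’, together with the number of walks that linked the two entities. A real analysis would work on a far larger corpus and would be checked against the combinations that actually appear in the following year.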

However scientists think they are getting their ideas, the actual origin is traceable in the literature. This means that research directions are not driven by global or local health needs for example.

Panel: Show me the Money

I sat on this panel discussion about ‘The financial implications of open access for researchers, intermediaries and readers’ which made it challenging to take notes (!) but two things that struck me in the discussions were:

Rick Anderson suggested that when people talk about ‘percentages’ in terms of research budgets they don’t want you to think about the absolute number, noting that 1% of the Wellcome Trust research budget is $7 million and 1% of the NIH research budget is $350 million.

Toby Green, the Head of Publishing for the OECD, put out a challenge to the publishers in the audience. He noted that airlines have split up the cost of travel into different components (you pay for food or luggage etc, or can choose not to), and suggested that publishers split APCs to pay for different aspects of the service they offer and allow people to choose different elements. The OECD has moved to a Freemium model where the payment comes from a small number of premium users – that funds the free side.
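The point about percentages hiding absolute numbers can be checked with a few lines of arithmetic. This is only a sketch using the two 1% figures quoted in the panel; the function name is my own and the implied totals are simply those figures multiplied out, not independently sourced budget numbers.

```python
def implied_budget(amount_for_share, share_percent):
    """Scale a quoted share of a budget up to the whole budget."""
    return amount_for_share * 100 / share_percent

# The 1% figures quoted in the panel discussion, in US dollars.
one_percent = {"Wellcome Trust": 7_000_000, "NIH": 350_000_000}

budgets = {funder: implied_budget(amount, 1) for funder, amount in one_percent.items()}
ratio = one_percent["NIH"] / one_percent["Wellcome Trust"]
```

On these figures the same ‘1%’ differs by a factor of 50 between the two funders, which is why quoting the percentage alone can be misleading.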

As – rather depressingly – is common in these kinds of discussions, the general feeling was that open access is all about compliance and is too expensive. While I am on the record as saying that the way the UK is approaching open access is not financially sustainable, I do tire of the ‘open access is code for compliance’ conversation. This is one of the unexpected consequences of the current UK open access policy landscape. I was forced to yet again remind the group that open access is not about compliance, it is about providing public access to publicly funded research so people who are not in well resourced institutions can also see this research.

Research in Institutions

Graham Stone, the Information Resources Manager, University of Huddersfield talked about work he has done on the life cycle of open access for publishers, researchers and libraries. His slides are available.

Graham discussed how to get open access to work to our advantage, saying we need to get it embedded. OAWAL is trying to get librarians who have had nothing to do with OA into OA.

Graham talked the group through the UK Open Access Life Cycle which maps the research lifecycle for librarians and repository managers, research managers, for authors (who think magic happens) and publishers.

My talk was titled ‘Getting an Octopus into a String Bag’. This discussed the complexity of communicating with the research community across a higher education institution. The slides are available.

The talk discussed the complex policy landscape, the tribal nature of the academic community, the complexity of the structure in Cambridge and then looked at some of the ways we are trying to reach out to our community.

While there was nothing really new from my perspective – it is well known in research management circles that communicating with the research community – as an independent and autonomous group – is challenging. This is of course further complicated by the structure of Cambridge. But in preliminary discussions about the conference, Mark Carden, the conference organiser, assured me that this would be news to the large number of publishers and others who are not in a higher education institution in the audience.

Summary: What does everybody want?

Mark Carden summarised the conference by talking about the different things different stakeholders in the publishing game want.

Researchers/Authors – mostly they want to be left alone to get on with their research. They want to get promoted and get tenure. They don’t want to follow rules.

Readers – want content to be free or cheap (or really expensive as long as someone else is paying). Authors (who are readers) do care about the journals being cancelled if it is one they are published in. They want a nice clear easy interface because they are accessing research on different publishers’ webpages. They don’t think about ‘you get what you pay for.’

Institutions – don’t want to be in trouble with the regulators, want to look good in league tables, don’t want to get into arguments with faculty, don’t want to spend any money on this stuff.

Libraries – Hark back to the good old days. They wanted manageable journal subscriptions, wanted free stuff, expensive subscriptions that justified ERM. Now libraries are reaching out for new roles and asking should we be publishers, or taking over the Office of Research, or a repository or managing APCs?

Politicians – want free public access to publicly funded research. They love free stuff to give away (especially other people’s free stuff).

Funders – want to be confusing, want to be bossy or directive. They want to mandate the output medium and mandate copyright rules. They want possibly to become publishers. Mark noted there are some state controlled issues here.

Publishers – “want to give huge piles of cash to their shareholders and want to be evil” (a joke). Want to keep their business model – there is a conservatism in there. They like to be able to pay their staff. Publishers would like to realise their brand value, attract paying subscribers, and go on doing most of the things they do. They want to avoid Freemium. Publishers could be a platform or a mega journal. They should focus on articles and forget about issues and embrace continuous publishing. They need to manage versioning.

Reviewers – apparently want to do less copy editing, but this is a lot of what they do. Reviewers are conflicted. They want openness and anonymity, slick processes and flexibility, fast turnaround and lax timetables. Mark noted that while reviewers want credit or points or money or something, you would need to pay peer reviewers a lot for it to be worthwhile.

Conference organisers – want the debate to continue. They need publishers and suppliers to stay in business.

Published 18 February 2016
Written by Dr Danny Kingsley
Creative Commons License

In conversation with Wellcome Trust and CRUK

On Friday 22 January Cambridge University invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers. David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK came to the University to talk to our researchers.

The related blog ‘Charities’ perspective on research data management and sharing’ summarises the presentations Jamie and David gave. After this event, a group of researchers from the School of Biological Sciences and from the School of Clinical Medicine at the University of Cambridge were invited to ask questions about the Wellcome Trust data management and sharing policy and CRUK data sharing and preservation policy directly of David and Jamie.

This blog is a summary of the discussion, with questions thematically grouped. These questions will be added to the list of Frequently Asked Questions on the University’s Research Data Management Website.

In summary:

  • It is not recommended that researchers simply share a link and release the data when requested. Research data should be available, accessible and discoverable.
  • The first responsibility is to protect the study participants. The funders provide guidance documents on sharing of patient data. Ethics committees also provide advice and guidance on what data can be shared. In principle, patient data should be safeguarded, but this should not preclude sharing. There are models for managed access to data that allow personal/sensitive data to be shared for legitimate purposes in a safe and secure manner.
  • The funders do not want to prevent new collaborations. When sharing data they recommend data generators provide a statement in the description of the data that they are willing to collaborate
  • It is recognised that it is often appropriate for researchers to have a defined period of exclusive access to the data they generate, but this should be determined by disciplinary norms. Any exemptions or delays have to be justified on a case by case basis, ideally at the outset of the project.
  • The funders expect research data that supports publications to be made accessible and publications should have a clear statement explaining how to access the underlying research data.
  • However researchers need to decide what is useful to be shared considering the effort of preparing the data for deposit and of sharing the data. If nobody is going to use the data, sharing is not a good use of researcher’s time.
  • Discipline-specific data repositories, where these exist, are recommended preferentially over general purpose or institutional repositories
  • Biosharing is an excellent resource with references to discipline-specific metadata schemas.
  • A staff member whose role is to manage data is an eligible cost on a grant
  • There are no funds for sharing data from old projects, although there are exceptions on a case by case basis
  • The funders are considering monitoring data management plans but their current primary goal is to encourage people to think about data management and sharing from the very start of the project

Access to research data

Q: Are funders benefiting from the expertise of organisations such as UK Data Service when providing advice on data access? UK Data Service has been managing controlled access to research data for a long time and it would be advantageous to benefit from their expertise.

A: Yes, we are in discussion with the UK Data Service. We are also working with the UK Data Service to consider whether it might be appropriate for hosting data from other disciplines beyond social science. We also believe there is significant scope to share lessons and best practices for data sharing between the social and biomedical sciences.

Q: Could we just share research data only when asked for it?

A: This is not a recommended solution: research data should be available, accessible and discoverable. Data access controls, and the criteria that must be met for access to be granted, have to be made clear in the metadata description.
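As an illustration of what ‘made clear in the metadata description’ could look like in practice, here is a minimal sketch of a dataset record with a check that the access information is present. All field names, values and the contact address are hypothetical and are not taken from either funder’s policy or from any repository’s schema.

```python
# All field names and values below are hypothetical, for illustration only.
REQUIRED_ACCESS_FIELDS = ("access_level", "access_conditions", "contact")

record = {
    "title": "Example dataset supporting a publication",
    "description": "Imaging data underpinning the figures in the paper.",
    "access_level": "managed",  # for example "open" or "managed"
    "access_conditions": "Requests are reviewed by a data access committee.",
    "contact": "data-requests@example.org",
}

def missing_access_fields(record):
    """Return the access-related fields that are absent or empty."""
    return [field for field in REQUIRED_ACCESS_FIELDS if not record.get(field)]

missing = missing_access_fields(record)
```

An empty `missing` list means a potential re-user can see from the record alone whether the data is open or managed, what the conditions are, and whom to ask.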

Q: I have patient data which has to be stored in a secure space. I always say in my data management plan that I cannot share my data. I would like to get ethical guidance which will explain to me how to share these data. It is very easy to say that data cannot be shared. I would like to share my data, but I would like to do it properly. With patient data it is extremely difficult, especially with genomics data, where there is a risk that patients can be identified.

A: Sharing of clinical data is not easy. Both Wellcome Trust and Cancer Research UK are helping to drive a great deal of work which is considering access and governance models through which sensitive patient data can be made available for research in a safe, secure and trusted manner. They provide guidance documents on sharing of patient data. Safety of patients and patients’ data is important. Ethics committees also provide advice and guidance on what data can be shared.

Q: What about sharing of physical materials? I have received a request to share a culture derived from a patient material, but the Ethics Committee did not approve sharing of this material. What shall I do?

A (Peter Hedges, Head of Research Office): If your ethical approval says that you cannot share that material, you cannot share it. Your first responsibility is to protect your study participants.

Q: If I share my data via a repository and people can simply download my data, I can no longer collaborate with them to work on the data and I have lost the possibility of getting credit for my data.

A: Nobody wants to prevent new collaborations from happening. A solution might be to add a statement that you are willing to collaborate in the description of your data. Your data requestor might be interested in collaborating, simply because you know your data the best. Funders also expect that the data re-used by others is appropriately acknowledged/cited, and they want to ensure that due credit results from the secondary use of data.

Quality control of research data

Q: If researchers start sharing unpublished research data via data repositories there is a risk that these data will not be of good quality as they will not be peer-reviewed.

A: Authors of unpublished data can simply state in the data description that the item was not peer-reviewed. If applicable, funders also encourage reciprocal links between publications and supporting research data.

What data needs to be shared and when?

Q: If researchers start to share everything there will be a lot of useless data available in data repositories. How to prevent a flood of useless data on the internet?

A: We would like researchers to decide what data is useful to be shared. If nobody is likely to use the data, sharing is not a good use of researcher’s time. Repositories also need to make decisions over what is worth keeping over time.

Comment (Peter Hedges, Head of Research Office): Research Councils UK focuses on research data supporting publications and this is what we recommend to researchers: share research data which underpins publications.

Q: Are we expected to share large datasets resulting from bigger projects (databases, long-term datasets) or data supporting individual publications?

A: We expect research data that supports individual publications to be made available with a hyperlink to the data. We also want researchers to consider and plan more broadly how they can make data assets of value resulting from our funded research available to others in a timely and appropriate manner.

Q: What about images? Is it useful to share them? It takes a lot of time to organise images. Besides, a single confocal picture with multiple layers is 1 GB. In theory it is possible to share all raw data and all raw images, but who would want to look at them? 10 figures of 10 images is already 100 GB of data. Where would I store all these images, who is going to use these data and how am I going to pay for this?

A: The effort of preparing the data for deposit and of sharing the data should be proportionate to the potential benefits of data sharing. Researchers need to decide what is useful to be shared, following disciplinary best practices and norms (recognising that disciplines are in very different places in terms of defining these).
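The storage arithmetic in the question above is easy to reproduce when weighing effort against benefit. The sketch below uses the sizes quoted by the researcher; the price per GB and the retention period are placeholder assumptions, not quotes from any repository or funder.

```python
def dataset_size_gb(figures, images_per_figure, gb_per_image):
    """Total raw image volume for one publication, in GB."""
    return figures * images_per_figure * gb_per_image

def storage_cost(size_gb, price_per_gb_year, years):
    """Simple linear cost estimate; real pricing models will differ."""
    return size_gb * price_per_gb_year * years

# Sizes quoted in the question: 10 figures of 10 images at 1 GB each.
size = dataset_size_gb(figures=10, images_per_figure=10, gb_per_image=1)

# Placeholder assumptions: 0.5 currency units per GB per year, kept for 10 years.
cost = storage_cost(size, price_per_gb_year=0.5, years=10)
```

With these inputs `size` is 100 GB, matching the figure in the question. The cost only becomes meaningful once a real price and retention period are substituted.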

Q: Is there a set amount of time for exclusive use of research data?

A: Researchers should adhere to disciplinary norms. For example, in genomics research data is frequently shared before publication (sometimes under a publication moratorium which protects the data generator’s right to first publication). Any exemptions or delays have to be justified on a case by case basis.

Comment (Peter Hedges, Head of Research Office): Research is competitive. Sometimes it might be useful for researchers to know who wants to get access to the data and what they need them for.

Cost of data sharing

Q: Can I ask in my grant for a staff member to help me with data management?

A: Yes, this is an eligible cost on grant applications: you can request a salary to support a research data manager for your research project, as long as it is justified.

Q: According to CRUK policy, costs for data sharing can be budgeted in grant applications only from August 2015. What about research data from older projects, when these costs were not eligible in grant applications? Is there any transition fund available to pay for this?

A: Unfortunately, there are no additional funds to pay for these costs. Researchers who have older datasets that might be of significant value to the community should contact CRUK – all requests for support will be considered on a case by case basis.

Q: Wellcome Trust encourages data sharing and data re-use, but does not allow for costs of long-term data preservation to be budgeted in grant applications. This does not make sense to me.

A: We are still reviewing our policy on costs of data management and sharing and we might be revisiting this issue – however, it is problematic for us to consider estimated costs for preservation that extend beyond the life-time of the grant. Our understanding is that costs of long-term data preservation are often less significant than costs of initial data ingestion by the repository (and we will cover ingestion costs).

Q: Who is then going to pay for the long-term data storage?

A: Wellcome Trust funds some discipline-specific repositories, but this is done jointly with other funders. We support bigger undertakings and we are also working with partners to develop platforms for data sharing and discoverability in some priority areas (notably clinical trials). Cancer Research UK pays for some long-term storage options, if these are justified for particular needs of the project. These decisions are made on a case by case basis, depending on how the costs are justified and whether these are directly related to the scientific value of the project.

Metadata standards

Q: At the moment there are many general purpose and institutional repositories, which are not well structured. To support efficient re-use of data it is important to use structured data repositories and adhere to metadata standards. What are funders’ opinions about this?

A: Wherever possible, discipline-specific data repositories should be used preferentially over general purpose or institutional repositories. Adherence to discipline-specific metadata standards is also encouraged. It has to be acknowledged that development of well-structured data repositories is very resource-intensive and not all disciplines have good quality repositories to support them. For example, it took over 30 years to adopt unified metadata standards at Cambridge Crystallographic Data Centre. The time needed to properly solve problems should never be underestimated.

Q: Are funders planning to provide researchers with a list of recommended schemas for metadata?

A: Biosharing is an excellent resource with references to discipline-specific metadata schemas. It is a useful suggestion to include a reference to Biosharing on our website.

Policy implementation

Q: Are you planning to monitor researchers’ adherence to data management plans? For example, the BBSRC does not have the manpower to check all data management plans manually, but they are planning to create a system to check automatically whether data has been uploaded.

A: We are considering this. At the moment we require data management plans with the primary goal to encourage people to think about data management and sharing from the very start of the project.

Published 5 February 2016
Written by Dr Marta Teperek, verified by David Carr and Jamie Enoch
Creative Commons License

Charities’ perspective on research data management and sharing

In 2015 the Cambridge Research Data Team organised several discussions between funders and researchers. In May 2015 we hosted Ben Ryan from EPSRC, which was followed by a discussion with Michael Ball from BBSRC in August. Now we have invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers.

David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK (CRUK) met with our academics on Friday 22 January at the Gurdon Institute. The Gurdon Institute was founded jointly by the Wellcome Trust and CRUK to promote research in the areas of developmental biology and cancer biology, and to foster a collaborative environment for independent research groups with diverse but complementary interests.

This blog summarises the presentations and discusses the data sharing expectations from Wellcome Trust and CRUK. A second related blog ‘In conversation with Wellcome Trust and CRUK’ summarises the question and answer session that was held with a group of researchers on the same day.

Wellcome Trust’s requirements for data management and sharing

Sharing research data is key for Wellcome’s goal of improving health

David Carr started his presentation explaining that the Wellcome Trust’s mission is to support research with the goal of improving health. Therefore, the Trust is committed to ensuring research outputs (including research data) can be accessed and used in ways that will maximise health and societal benefits. David reminded the audience of benefits of data sharing. Data which is shared has the potential to:

  • Enable validity and reproducibility of research findings to be assessed
  • Increase the visibility and use of research findings
  • Enable research outputs to be used to answer new questions
  • Reduce duplication and waste
  • Enable access to data to other key communities – public, policymakers, healthcare professionals etc.

Data sharing goes mainstream

David gave an overview of data sharing expectations from various angles. He started by referring to the Royal Society’s report from 2012: Science as an open enterprise, which sets sharing as the standard for doing science. He then also mentioned other initiatives like the G8 Science Ministers’ statement, the joint report from the Academy of Medical Sciences, BBSRC, MRC and Wellcome Trust on reproducibility and reliability of biomedical research, and the UK Concordat on Open Research Data, with a take-home message that sharing data and other research outputs is increasingly becoming a global expectation, and a core element of good research practice.

Wellcome Trust’s policy for open data

The next aspect of David’s presentation was Wellcome Trust’s policy on data management and sharing. The policy was first published almost a decade ago (2007) with subsequent modifications in 2010. The principle of the policy is simple: research data should be shared and preserved in a manner which maximises its value to advance research and improve health. Wellcome Trust also requires data management plans as a compulsory part of grant applications, where the proposed research is likely to generate a dataset that will have significant value to researchers and other users. This is to ensure that researchers understand the importance of data management and sharing and plan for it from the start of their projects.

Cost of data sharing

Planning for data management and sharing involves costing for these activities in the grant proposal. The Wellcome Trust’s FAQ guidance on data sharing policy says that: “The Trust considers that timely and appropriate data management and sharing should represent an integral component of the research process. Applicants may therefore include any costs associated with their proposed approach as part of their proposal.” David then outlined the types of costs that can be included in grant applications (including for dedicated staff, hardware and software, and data access costs). He noted that in the current draft guidance on costing for data management estimated costs for long-term preservation that extend beyond the lifetime of the grant are not eligible, although costs associated with the deposition of data in recognised data repositories can be requested.

Key priorities and emerging areas in data management and sharing

Infrastructure

The Wellcome Trust also identified key priorities and emerging areas where work needs to be done to better support data management and sharing. The first one was to provide resources and platforms for data sharing and access. David pointed out that wherever available, discipline-specific data repositories are the best home for research data, as they provide rich metadata standards, community curation and better discoverability of datasets.

However, the sustainability of discipline-specific repositories is sometimes uncertain. Discipline-specific resources are often perceived as ‘free’. However, research data submitted to ‘free’ data repositories has to be stored somewhere and the amount of data produced and shared is growing exponentially – someone has to pay for the cost of storage and long-term curation in discipline-specific data repositories. An additional point for consideration is that many disciplines do not have their own repositories and therefore need to heavily rely on institutional support.

Access

Wellcome Trust funds a large number of projects in clinical areas. Dealing with patient data requires careful ethical considerations and planning from the very start of the project to ensure that data can be successfully shared at the end of the project. To support researchers in dealing with patient data, the Expert Advisory Group on Data Access (a cross-funder advisory body established by MRC, ESRC, Cancer Research UK and the Wellcome Trust) has developed guidance documents and practice papers about handling of sensitive data: how to ask for informed consent, how to anonymise data and the procedures that need to be in place when granting access to data. David stressed that a balance needs to be struck between maximising the use of data and the need to safeguard research participants.

Incentives for sharing

Finally, if sharing is to become the normal thing to do, researchers need incentives to do so. Wellcome Trust is keen to work with others to ensure that researchers who generate and share datasets of value receive appropriate recognition for their efforts. A recent report from the Expert Advisory Group on Data Access proposed several recommendations to incentivise data sharing, with specific roles for funders, research leaders, institutions and publishers. Additionally, in order to promote data re-use, the Wellcome Trust joined forces with the National Institutes of Health and the Howard Hughes Medical Institute and launched the Open Science Prize competition to encourage prototyping and development of services, tools or platforms that enable open content.

Cancer Research UK’s views on data sharing

The next talk was by Jamie Enoch from Cancer Research UK. Jamie started by saying that because Cancer Research UK (CRUK) is a charity funded by the public, it needs to ensure it makes the most of its funded research: sharing research data is elemental to this. Making the most of the data generated through CRUK grants could help accelerate progress towards the charity’s aim in its research strategy, to see three quarters of people surviving cancer by 2034. Jamie explained that his post – Research Funding Manager (Data) – has been created as a reflection of data sharing being increasingly important for CRUK.

The policy

Jamie started talking about the key principles of CRUK’s data sharing policy by presenting the main issues around research data sharing and explaining CRUK’s position in relation to them:

  • What needs to be shared? All research data, including unpublished data, source code, databases etc, if it is feasible and safe to do so. CRUK is especially keen to ensure that data underpinning publications is made available for sharing.
  • Metadata: Researchers should adhere to community standards/minimum information guidelines where these exist.
  • Discoverability: Groups should be proactive in communicating the contents of their datasets and showcasing the data available for sharing

Jamie explained that CRUK really wants to increase the discoverability of data. For example, clinical trials units should ideally provide information on their websites about the data they generate and clear information about how it can be accessed.

  • Modes of sharing: Via community or generalist repositories, under the auspices of the PI or a combination of methods

Jamie explained that not all data can be/should be made openly available. Due to ethical considerations sometimes access to data will have to be restricted. Jamie explained that as long as restrictions are justified, it is entirely appropriate to use them. However, if access to data is restricted, the conditions on which access will be granted should be considered at the project outset, and these conditions will have to be clearly outlined in metadata descriptions to ensure fair governance of access.

  • Timeframes: Limited period of exclusive use permitted where justified

Jamie suggested adhering to community standards when thinking about any periods of exclusive use of generated research data. In some communities research data is made accessible at the time of publication. Other communities will expect data release at the time of generation (especially in collaborative genomics projects). Jamie further explained that particularly in cases where new data can affect policy development, it is key that research data is released as soon as possible.

  • Preservation: Data to be retained for at least 5 years after grant end
  • Acknowledgement: Secondary users of data should credit original researcher and CRUK
  • Costs: Appropriately justified costs can be included in grant proposals

As of late 2015, financial support for data management and sharing can be requested as a running cost in grant applications. Jamie explained that there are no particular guidelines in place explaining eligible and non-eligible costs and that the most important aspect is whether the costs are well justified or not, and reasonable in the context of the research envisaged.

Jamie stressed that the key point of the CRUK policy is to facilitate data sharing and to engage with the research community, recognising the challenges of data sharing for different projects and the need to work through these collaboratively, rather than enforce the policy in a top-down fashion.

Policy implementation

Subsequently, the presentation discussed ways in which CRUK policy is implemented. Jamie explained that the main tool for the policy implementation is the new requirement for data management plans as compulsory part of grant applications.

Two of the three main response mode committees, the Science Committee and the Clinical Research Committee, have a two-step process for writing a data management plan. During the grant application stage researchers need to write a short, free-form description of how they plan to adhere to CRUK’s policy on data sharing. Only if the grant is accepted will the beneficiary be asked to write a more detailed data management plan, in consultation with CRUK representatives.

This approach serves two purposes as it:

  • ensures that all applicants are aware of CRUK’s expectations on data sharing (they all need to write a short paragraph about data sharing)
  • saves researchers’ time: only those applicants who were successful will have to provide a detailed data management plan, and it allows the CRUK office to engage with successful applicants on data sharing challenges and opportunities

In contrast, applicants for the other main CRUK response mode committee, the Population Research Committee, all fill out a detailed data management and sharing plan at application stage because of the critical importance of sharing data from cohort and epidemiological studies.

Outlooks for the future

Similarly to the Wellcome Trust, CRUK realised that cultural change is needed for sharing to become the norm. CRUK have initiated many national and international partnerships to help reward data sharing.

One of them is a collaboration with the YODA (Yale Open Data Access) project aiming to develop metrics to monitor and evaluate data sharing. Other areas of collaborative work include collaboration with other funders on development of guidelines on ethics of data management and sharing, platforms for data preservation and discoverability, procedures for working with population and clinical data. Jamie stressed that the key thing for CRUK is to work closely with researchers and research managers – to understand the challenges and work through these collaboratively, and consider exciting new initiatives to move the data sharing field forwards.

Published 5 February 2016
Written by Dr Marta Teperek, verified by David Carr and Jamie Enoch
Creative Commons License

2015 – that was the year that was

This time last year, the Office of Scholarly Communication at Cambridge University had been in existence for one week. As the inaugural Head of the Office, I had landed in the UK from Australia on 1 January, and was still battling jet lag. What a difference a year makes. This blog is a short run down of what has happened in 2015 and a brief peek into our plans for 2016.

The OSC has three primary foci – managing compliance with funders, external engagement and working with the Cambridge community to ensure awareness of broader scholarly communication issues. In our spare time we have also taken on a few projects.

Managing funder compliance

Open Access

The University of Cambridge is engaging its research community with open access with a broad approach, both offering solutions for compliance management and determining ways in which the community can continue their normative communication behaviours while increasing access to their research.

As with all universities in the UK, the Open Access service is managing multiple and conflicting open access policies in a complex publishing landscape. The RCUK open access policy has been in effect since April 2013, and the COAF policy continues the longstanding Wellcome Trust open access policy. In all, the OSC manages approximately £2 million a year from these funders to support open access compliance. HEFCE announced its upcoming open access REF policy in March 2014.

In October 2014 the University introduced a new system for compliance, based on user experience evidence, with the tag line “Accepted for publication? Send us your manuscript”. This is a system designed to ensure that the researcher only has to act once in order to comply with multiple policies. Researchers use an attractive and simple interface where they are asked to upload their manuscript, complete a short form and submit. Our OA team then check funder and publisher policies, deposit the work in the repository for HEFCE compliance and, using a decision tree, determine the payment options required and funds available for the article. The team manage the article payment processes and contact the author once the work is complete. From the author perspective this is a simple and much liked system.

Outreach has included contacting departmental administrators, speaking to research communities, attending Committee meetings and so on to spread the word. Despite this, the team processes an average of 240 unique HEFCE eligible papers per month, representing approximately 30% of research output. While this may be cause for concern in relation to future REF compliance, a brief analysis of the open access publication activities of Cambridge researchers indicates that 60% of Cambridge research is being made available – including through our system.

We continue to have challenges relating to publishers not making articles open access under the correct licence (or even at all) despite our payment of Article Processing Charges. The checking and chasing up of these publishers is extremely time consuming. In an attempt to ensure the publishers did what we were paying for we brought in Purchase Orders for the first half of the reporting period. This has caused serious issues when it came to reporting in terms of matching the articles listed in the Open Access systems against the financial systems of the University for reporting purposes to the RCUK. As it was not making any difference to publisher behaviour we abandoned this approach. The only issues we have encountered have been for articles that are hybrid – Cambridge University (across both the RCUK and COAF funds) spends approximately 74% on hybrid journals as opposed to fully OA journals.

There has been a constant reporting requirement throughout 2015, first to Jisc, then the RCUK, the Wellcome Trust and Jisc a second time. This has been a huge drain on personnel as none of the reporting periods align, requiring several months’ worth of FTE work. This is due to several issues, of which the Purchase Order problem mentioned above is a minor factor. Reporting in detail on a large number of articles on an individual basis is a complex task.

Research Data Management

2015 has been a big year for Research Data Management, with the EPSRC announcing they would start checking to ensure researchers are making their underlying data available. The Research Data Facility has spent the year focused on increasing awareness, providing support and resources, and managing data with huge success. There have been face to face meetings with over 1300 researchers, and data submissions have risen exponentially (see here for a graphic of the numbers in July 2015). The team provides Research Data Management Plan support, and the data website has had over 16,500 visits.

We have spent a huge amount of time talking to the Cambridge research community. One outcome of these discussions is a deep understanding of the concerns and challenges for researchers in relation to data sharing. To address these we have provided fora for our researchers to meet with the funders to find solutions.  Our meetings with EPSRC and BBSRC resolved many concerns and resulted in an endorsed set of FAQs about research data sharing.

We have contributed to policy development by working with our contemporaries at many institutions to provide a coordinated response to the proposed UK Concordat on Research Data.

Systems management

A perennial issue with open access is the integration of systems within the institution to achieve the holy grail of ‘deposit once, use many times’. We are not there yet, although we have made good inroads. Cambridge University was one of the testbed institutions for DSpace, and the repository has been in place since 2005. The repository had suffered from a lack of attention and by the beginning of 2015 was not functioning properly and contained a large amount of bespoke coding.

The upgrade of DSpace from Version 3.4 to Version 4.3 took many months because it involved an associated standardisation of the base code to ensure future upgrades will be smooth. We also needed to create a new server platform for the repository to sit in which has stabilised our operations. The repository policy has been revisited and the agreements and licenses associated with minting DOIs are now in place, and the next step is to look at integration with other University systems.

We held a repository naming competition during the year, with the winning name being ‘Apollo’ – the god of logic.  The new name and logo will be launched when the repository interface is upgraded in early 2016. The repository now holds 13,269 articles and manuscripts, 359 datasets and 713 working papers. In total there are more than 200,000 items held in the repository – 175,429 of these are chemical structures.

Engagement and awareness

Within Cambridge

Cambridge University is a large and complex many-headed beast. Engaging this community is extremely challenging. The Office of Scholarly Communication runs a large number of electronic communication channels to ensure researchers are able to stay up to date and informed about open access and research data management, including the Research Data Management website, the Office of Scholarly Communication website and the Open Access website.

We send out monthly newsletters on Research Data Management to over 1000 subscribers, and at the end of 2015 launched a monthly Open Access newsletter – you can sign up here.  We use Twitter extensively (see @CamOpenData, @CamOpenAccess and @dannykay68). In addition the OSC has produced a series of advocacy materials to support their work.

But it is not all electronic – we have also presented to over 1600 researchers and administrative staff during 2015 through events, presentations and workshops. Highlights have included workshops on software licensing, an Open Access week joint event with Cambridge University Press addressing the question: ‘Can society afford open access?’ (see a video summary here), and an Open Data panel discussion ‘Open Data – moving science forward or a waste of money and time?’. The video of this event is here.

More broadly

This Unlocking Research blog provides information and analysis on issues relating to Scholarly Communication, Open Access, Research Data Management and Library matters. The blog  is well used, with over 16,000 visits since launching.

The post with the greatest impact was Dutch boycott of Elsevier – a game changer? with over 3,500 visits in the first week before it was reblogged by the London School of Economics. [Late news added 22 Jan 2016: This blog was listed as one of the Top Ten Posts for 2015: Open Access. It was also listed as one of the blogs that had an average minute per page measurement of over 6 minutes and 30 seconds.]

Members of the OSC are increasingly being invited to speak at conferences both within the UK and beyond. Topics have included:

We are also active participants in the discussions held amongst our communities within and outside of the UK. There is a high level of cooperation amongst those working in the area of scholarly communication and open access. The OSC contributes to meetings and initiatives organised by the League of European Research Universities, SPARC Europe and the UK Council of Research Repositories amongst others.

Training and support

Supporting Researchers in the 21st century

The OSC launched the ‘Supporting Researchers in the 21st century’ programme – aimed at library and other administrative staff – with three introductory workshops held over six weeks from May to early July. 103 people attended. Working from feedback obtained at these events the programme began offering training and workshops from late July.

Topics covered to date include Research Data Management for Librarians, a Primer on Open Access, Information Security in a Research Environment, Introduction to Metrics, a Day in the Life of a Researcher and Meet an Open Access Publisher. In addition there have been several opportunities to hear from visiting international experts including:

Research Support Ambassadors

The Research Support Ambassador programme began as an idea of a ‘crack team’ of people who could be deployed across the University to present workshops on Scholarly Communication issues. The general philosophy was that this was a way to encourage staff across the library community and across the grade range to step up.

We have had 18 brave souls volunteer to be the first group in what has frankly been a rather ‘organic’ process given we had no idea how this was going to play out. The reasons members of the group gave for participating included the opportunity to learn more and gain skills, being able to support researchers better and, for several people, wanting more face to face interactions. We ran two sets of intensive training sessions where we decided to focus on four areas:

  • Researcher Support in Cambridge
  • Managing your online presence
  • Making your thesis open access
  • The Research Lifecycle

We have taken a constructivist approach to learning – where learners take charge of their own learning. The group has worked with a mixture of self education and team work to try and develop ‘modular’ outputs that can be presented by others. There is a blog listing the progress on these topics to date here.

There have been significant challenges to the process with a mixture of new material and technologies, working in teams with new colleagues and limited time. In addition they have had to self direct as the recruitment process for a Research Skills Coordinator took eight months. To the Ambassadors’ credit they have stuck through a confusing process with very little direction. There is a blog post on an insider’s view of the programme here.

Other projects

Unlocking Theses project

This project is the first step to dramatically increase the number of open access theses in the repository, which stood at about 600 at the beginning of 2015. On average one in ten PhD students deposit their thesis to make it available. The repository currently does not allow any other type of thesis to be deposited.

This system has meant that when a researcher requests a copy of a thesis for research purposes, the bound version needs to be scanned. In 2015 the Library held over 1200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository, Apollo, which now holds 2176 theses, of which 1,021 are openly accessible. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them. We are in the final process of opening these theses. The remaining 1155 theses are currently held in a Restricted Theses Collection but the biographical information about these theses is searchable.

Managing Cambridge Journals project

Cambridge University Libraries are interested in supporting new forms of open access publishing.  In 2015 a search revealed that at least seven research and 13 student self-published journals and magazines currently circulate within the Cambridge community. These range widely in quality from almost professional publications to literally photocopied pages. The Managing Cambridge Journals project is working with Cambridge University Press to offer support to Cambridge researchers who are publishing outside of the traditional channels.  Three areas of potential support have been identified – a publishing platform, information and support and possibly an internal Cambridge publishing ‘brand’.  Work is already underway to ingest the full decade of articles published in the Cambridge Journal of China Studies into the repository from their currently unstable home on a website.

The team

To achieve all of this has required a huge effort on many people’s behalf. In January 2015 the OSC had three staff plus the Head – two Open Access Research Advisors and a part-time Repository Manager. Now the team sits at 12 people and this number is relatively fluid.

This sounds like a huge group – which it is. But with only two exceptions – of which the Head is one – all staff are either temporary staff or on extremely short term contracts. This is primarily related to (a lack of) funding and has two effects. First, a disproportionate amount of time is spent on managing recruitment, writing job descriptions, advertising, interviewing and so on. Almost all HR requirements are still enforced regardless of the brevity of contracts – including monthly probation interviews.

The second effect is the constant need to lobby for financial support which requires creating business cases, new organisational charts and many, many meetings. The Library has been nothing but supportive throughout this process, but there is a need for the broader institution to recognise that much of the work done in the OSC falls in the University rather than Library camp.

Looking forward to 2016

This upcoming year is shaping up to be as busy and productive as the first year of operation. Some of the planned activities include:

  • Negotiation with Research Council UK funders on possible funding options for the Research Data Facility.
  • The Communication across the Research Lifecycle project aims to join up communication with researchers by Cambridge administrative departments. This requires scoping the current communication channels and developing advocacy materials across the University administrative departments. There is currently no financial support for this project.
  • Participating in the Jisc Research Data Management Shared Services pilot
  • Increase the collaboration with Cambridge University Press on the Managing Cambridge Journals project to develop this project to operational level.
  • The second tranche of upgrades to DSpace is underway. This will involve an upgrade to V5 and will implement ‘request a copy’ buttons, minting DOIs, registering the repository with wider aggregation systems and updating the look and feel of the interface. This work is expected to be completed by Easter 2016.
  • A Repository Integration Manager will start work on the interoperability of DSpace with Symplectic and other systems in the University. New forms and simple deposit processes will be developed.
  • Increase theses deposit by developing a new form and amending the policy to allow all thesis types to be deposited.
  • Pilot with selected departments to require the deposit of a digital thesis at the same time as the printed and bound version, with the option of making the work available.
  • Complete the first round of the Research Support Ambassador programme with some skills training and finalisation of training products before the group is released into the wild.
  • Negotiate with arXiv and other open access providers to allow researchers to meet funder requirements within their usual communication norms.
  • Develop a comprehensive Research Data Management training program for PhD students.
  • Build on the Supporting Researchers in the 21st century programme.
  • Present at conferences in the UK and abroad.

So, watch this space!

Published 11 January 2016
Written by Dr Danny Kingsley
Creative Commons License

Open Data – moving science forward or a waste of money & time?

On 4 November the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?”. Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders, bright people with curious minds and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge; @RafaCarazoSalas
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?” followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration, new discoveries and, most importantly, sharing data kept it alive. If data was not shared and instead simply kept on the computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited, and built upon years after the publication. Frances added that research data originating from publicly funded research was funded by taxpayers. Therefore, the value of research data should be maximised. Data sharing is important for research integrity and reproducibility and for ensuring better quality of science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what made the wheels of science turn. He inspired further discussion with strong statements: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the problem of sharing big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck in progress was to bridge the gap between the capacity to generate the data, and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think that one day it would be possible just to stream videos instead of buying DVDs. Nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time – the need developed the tools. More and more people are looking at new ways of chunking and parallelisation of data downloads. Additionally, there is a change in the way in which the analysis is done – more and more of it is done remotely on central servers, and this eliminates the technical barriers of access to data.

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was not only important who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift – that questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “shut”. Tim explained that anything that gets research data off the laptop to a shared environment, even if it was shared only with a certain group, was already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” And this is yet another point where I personally agreed with Tim that institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to make small steps at a time, with the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started by explaining that the way traditional publishing works had to change, as data was not two-dimensional anymore and in the digital era it could no longer be shared on a piece of paper. Ideally, researchers should be allowed to continue re-analysing data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable – alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having the formal peer review of data, it would be better to have an evaluation structure based on the re-use of data – for example, valuing data which was downloadable, well-labelled, re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first one to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that there was an important role for funders and institutions to incentivise data sharing. If funders/institutions wished to mandate sharing, they also needed to reward it. Funders could do so when assessing grant proposals; institutions could do it when looking at academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. And maybe due to the fact that many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. The current academic career progression is almost solely based on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, whether you will be able to continue your career as a researcher. This, connected with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you still be able to publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish the results of your five-year study? Will you be able to find a new position? As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research, making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about evaluation structure based on the re-use of research data. In fact, more and more researchers go to the ‘open’ and use blog posts and social media to talk about their research and to discuss the work of their peers. With the use of persistent links research data can now be easily cited, and impact can be built directly on data citation and re-use, but one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is to help researchers to really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and institutionalisation of the agenda. Instead of a stick, which results in the minimal compliance attitude, researchers need to see the opportunities and benefits of open data to sign up for the agenda. Therefore, the Institution needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to “open”. And the community needs to be willing and convinced to do so.

Further resources

  • Click here to see the full recording of the Open Data Panel Discussion.
  • And here you can find a storified version of the event prepared by Kennedy Ikpe from the Open Data Team.

Thank you

We would also like to say a special ‘thank you’ to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and who made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License

A Day in the Life of an Open Access Research Adviser

As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Monday is a piece by Dr Philip Boyes reflecting on the variety of challenges of working in the Open Access team.

As anyone working in it knows all too well, Open Access can be a complicated field, with multiple policies from funders, institutions and publishers which can be complex, sometimes obscure and sometimes mutually contradictory. While we’re keen to raise awareness of and engagement with Open Access issues, the University of Cambridge’s view is that expecting academics to get to grips with all this themselves would represent an unreasonable demand on their time and likely lead to errors and resentment.

Instead, Cambridge’s policy is that authors should simply send us their Accepted Manuscript at acceptance through our simple upload system and our team of Research Advisers will check out exactly what they need to do to comply with all the relevant funder and journal policies and get back to them with individually-tailored advice. The same system also allows us to take care of deposit into the repository for HEFCE and to manage payments from the block grants we’ve received from the UK Research Councils (RCUK) and the Charities Open Access Fund (COAF – seven biomedical charities, including the Wellcome Trust).

The idea is that from the academic’s point of view the process feels smooth and seamless. But the reality is that very little of the process is automated. Behind the scenes there’s a lot of (thankfully metaphorical) running around by our team of three Open Access Research Advisers to provide this service, as well as working on broader issues of communication, processing APCs and improving our systems.

So what does a Cambridge Open Access Research Adviser do all day? Here’s a typical day in the life…

8.45am – Getting started

Arriving in the office, I check my emails and look at the Open Access Helpdesk. Overnight we’ve received around 15 new tickets, as well as some further correspondence on existing ones. Fairly typical. It’s split between manuscript uploads that need advice, general queries and invoicing correspondence from publishers. I start working through these on a first-come, first-served basis.

They’re a real mixed bag. If a submitted article is straightforward we can deal with it in a few minutes – we check the journal site for their green and gold options and then advise the author on which is appropriate in each case. We also flag the manuscript for deposit into our repository – at the moment that’s a manual process and is mostly handled by temps.

Today things aren’t straightforward. A lot of the submissions are conference proceedings and there’s very little information on the conference websites. It’s not even clear whether some of these are being formally published (does private distribution on memory stick count? Do they have ISBNs or ISSNs?) It’s going to be a slow morning of chasing up authors and conference organisers for any information they have.

10.00am – Complexity

I’m more or less through the conference proceedings, but we’re not through with complex cases. One of the invoices we’ve received is for an article we’ve not heard about before. It’s from a senior professor but he’s never submitted it to the open access service so we weren’t able to advise him on policy or eligibility for block grant funds. He selected the gold option for a Wellcome-funded correspondence article and now wants us to pay the $5000 + VAT bill. The trouble is, letters aren’t covered by the Wellcome policy so technically it isn’t eligible. I contact the author and break the news that he might have to pay this large bill himself and that this is why we like people to contact us first.

11.00am – Clarity

The professor has got back to us. Although the journal’s classed it as a letter, the paper’s actually a very short research article, he says. I decide to contact Wellcome for guidance and let them decide whether they want this to be paid for from the COAF block grant.

11.30am – Déjà vu

For the moment the backlog on the helpdesk has been cleared and our temps are busy adding manuscripts to the repository and updating previously-added articles with citation details and embargo end-dates. I have a bit of free time to move on to something else so begin to tackle the stack of publisher APC invoices that need processing.

They’re mostly correct, but some publishers and invoicing companies are better than others. Inevitably there are a few errors that need chasing up or publishers who have invoiced us repeatedly for the same thing. Among the stack is an overdue notice from a major publisher for a familiar article. It’s one we’ve repeatedly confirmed was paid fully almost two years ago but every few months ever since the publisher has told us it’s outstanding. I send them back the payment reference and details yet again and ask them to mark the issue as resolved. I somehow suspect we’ll be seeing it again.

2.00pm – Presentation

Today offers a welcome opportunity to get out of the office. We’re holding a joint Open Access/Open Data presentation to researchers in one of the University’s departments to try and increase awareness of the policies. Our stats show that this department has particularly low engagement with the Open Access service so we’re keen to work out why. It’s a fractious crowd. One or two people are keen Open Access advocates and speak up to say how simple the system is, but some others are vocal about their view that it’s an unwarranted burden and tell us they don’t see why they should bother.

We try to explain the benefits and funder mandates, as well as how we’ve tried to make the system as simple as possible. When we get back to the office we find that one of those present has sent us their back-catalogue of thirty articles stretching back to 2007 to put into the repository.

4.00pm – Compliance

While my colleagues work on the helpdesk I need to turn my attention to compliance and reporting. All too often when we’ve paid an APC the publisher hasn’t delivered Open Access with the correct licence, or in some cases at all. I generally try to do a weekly check of the articles for which we’d paid APCs to see whether they’ve been published correctly but it’s time-consuming and things have been busy lately. It’s been around three weeks since the last check so it really needs doing.

But the deadline is also fast approaching for annual reports to RCUK and COAF. These are both large and complex, and cover slightly different periods (and different again from the Jisc report a couple of months ago). It’s proving a major challenge to get the information together from our various systems and to match it to the relevant figures from the University Finance System. I decide to let the compliance checking wait a bit longer and work on trying to move things along on the reports. I make a bit of progress, but there’s still a huge amount left to do – information on thousands of articles that needs to be manually collated. With luck in the future we’ll have integrated systems that can do much of this automatically, but for now each report represents weeks of work.

Wrap up

There is, then, a huge variety and amount of work that goes into the Open Access service. The Helpdesk and the reporting alone would be more than enough to keep us busy, but we also have to make time for outreach and communications, managing the finances, improving our systems and more. We’re finding that as our team grows, we’re starting to specialise more into particular areas, but we’re still basically all generalists, working on all areas of the job. This balance between specialisation for the purposes of efficiency and the need for individuals to be able to move effectively from one task to another – not least to keep our jobs interesting and varied – is one that’s likely to become ever more challenging as the volume of articles we handle increases.

Published 19 October 2015
Written by Dr Philip Boyes
Creative Commons License

In conversation with Michael Ball from BBSRC

The Biotechnology and Biological Sciences Research Council (BBSRC) Data Sharing Policy states that research data that supports publications must be stored for 10 years and adherence to data management plans will be monitored and built into the Final Report score, which may be taken into account for future proposals.

Recently Michael Ball, the Strategy and Policy Manager at BBSRC, accepted an invitation to Cambridge University to discuss the BBSRC policy on opening up access to data. Senior members of the University, the School of Biological Sciences, the Research Office and the Office of Scholarly Communication attended. These notes have been verified by Michael as an accurate reflection of the discussion.

The take home messages from the meeting were the importance of:

  • Disciplines themselves establishing ways of dealing with data
  • Thinking about how to deal with data from the beginning of a research project

The meeting began with a discussion about the support we provide Cambridge University researchers through the Research Data Service, the resources provided on the data website and the enthusiastic uptake of the service since the beginning of the year.

The conversation then moved into issues around the policy, focusing on several aspects – clarification of what needs to be shared, how this will be supported financially, questions about auditing, a discussion about the best place to keep the data and issues with data sharing in the biological sciences.

What data are we expected to share?

What is ‘supporting data’ in the biological sciences?

One of the biggest concerns biological researchers have about data sharing is what is meant by ‘data’. Biology has the most diverse group of data, which makes it hard to talk about biology because the issues are project and problem specific.

Michael confirmed the policy broadly refers to all data ‘but the devil is in the detail, there are lots of caveats’. He echoed Ben Ryan’s answer to a similar question about the EPSRC policy by saying the key points are:

  • What would you expect to see?
  • What do you think is important?

The interpretation of the BBSRC policy depends heavily on the types of data being produced.  Much is dependent on the expected norms, what a researcher would expect to see if they were trying to interpret the paper. What are the underlying supporting data for the paper?

The biological sciences throw up a particular challenge in the range and disparity in disciplinary norms. For example a great deal of data arises from genomics and some time ago they made the decision to share, including making decisions about what to share and what not to share. However, there are vast areas of experimental science where the paper itself is data.

The policy is going one step further back from the published paper towards the lab. In the future these data policies might go further back, if there was greater automation of the research process.

Michael confirmed that if the BBSRC has funded a PhD student they would expect them to make supporting data available.

What do we need to share in the Biological Sciences?

There is no expectation to share lab books unless they are the only place the data exists. Michael noted that when the BBSRC wrote the policy it excluded lab books and organisms.

However there is an expectation to share instrumental output. This is with the caveat that if it is output from an instrument that goes through some sort of amendment then you don’t need to share the original.

An example: A researcher is counting bacteria on a plate and scrupulously making notes in lab books before entering this information into a computer spreadsheet to crunch the numbers. The expectation would be to share the spreadsheet not the lab book.

Some research requires the construction of a piece of technology where there might not be a great deal of associated data around it. In these instances it is the process of construction or the protocol or the methodology that is important to share.

Michael noted that in some disciplines, given the materials and input parameters and the same instruments, the output data will be the same each time. In these circumstances it is most sensible to share or describe the inputs so that the experiments can be repeated. The question is about what would be the most useful to share.

Show me the money

A stitch in time

Michael confirmed that researchers can ask for the money they need (and can justify) for research data management in grant applications. He did say however that the BBSRC does not ‘generally see a lot of these requests’. He noted that this is because often people haven’t thought about the data they will generate at the start of the project. One of the researchers pointed out it was difficult to know how to fund it because ‘we are not sure what we need’. However, this should not be a reason to ask for nothing.

It may be that some of the discipline specific repositories will have to change their business models in the future to cope with larger data sets.

Michael said that it is worth thinking about data sharing at the project planning stage because different types of data have different requirements. Researchers might need to allow for the cost of getting the data in the right format and metadata. It is advisable to think about where the data will be published so the research team can prepare the data in the first instance.

Michael said that the data management plan should hopefully prompt how much data a research project will produce. It is advisable to consider the maximum amount of data the project may produce. The ideal situation will be to have an ongoing data management plan because in some ways it is useful at the end.

Longer term financial support

Raised in the meeting was the option of charging a flat fee up front regardless of the data being generated. The question arose about whether there was any danger in auditing with this approach. The problem with an up front fee is that it becomes more difficult to track an output from a specific grant against what we put into the database. There is a directly incurred and directly allocated component to the cost.

Michael confirmed that any money allocated to data management won’t survive past the end of the grant. He noted this was something that he was ‘not sure how to unpick’. This raises the issue of the cost of longer term data sharing. The BBSRC provides funding to a certain point in time. There can be a secondary experiment funded by someone else and the works are published together. But the researcher can only share the data from the funded part. The BBSRC does not ask researchers to share data that they haven’t funded.

Auditing questions

Who is in charge here?

The academics raised the concern that there could be ‘mission creep’ where the funders expect people to do things that are a waste of time. They mentioned that an ideal situation would be where the research community decide what they want to share and what they don’t wish to share.

Michael noted that the BBSRC has to be guided by the community on their own community norms for data sharing, and this is why aspects of the data sharing policy are quite open. He noted that this meeting represented the first part of the process – where the funder comes together with communities to decide what is essential.

In addition, many journals are now requiring open data. It is the funders, the researchers and the journals who are asking for it. To some extent the BBSRC policy is guided by what the journals are asking for.

The policing process

The group expressed interest in how the BBSRC policy is policed and what would be the focus of that policing. Michael stated that BBSRC are investigating options of how to monitor compliance, but that it does not currently appear feasible to check all of the submissions. BBSRC will monitor compliance, but will probably start with dipstick testing. They will look at historical projects and see where the process goes from there. In practice, this is likely to initially involve examining the degree of adherence to the submitted data management plans. If a researcher has acted reasonably and justified their mechanisms of data sharing, then it is unlikely that there would be any actions beyond noting where difficulties had occurred.

Note, however, that if a researcher has submitted a grant application with a data sharing statement there is a reasonable expectation to share the data.

Ultimately the data release will be policed. In areas where data sharing is prevalent, communities police themselves because researchers ask and expect the data to be available. In some cases you can’t publish without an accession number.

Michael noted there are places researchers can put information about published data into ResearchFish. ResearchFish is currently the only mechanism to capture information regarding post-award activities.

Where do we put the data?

The question arose about how other universities are managing the policy. Michael responded that many have started institutional repositories. The institutional response depends on where the majority of their research sits.

A possible solution for ensuring the data is discoverable would be a catalogue of what is stored in an institutional repository, with metadata about the data. That metadata would itself need to be discoverable. If the data is being held in a centralised repository it is possible to pay the cost upfront before the end of the grant.

The group noted there was a publishing preference for discipline specific repositories over institutional repositories because the community knows how to look after the work. These repositories are hosted by ‘people who know what they are doing’. They are discoverable, where the community can decide on the metadata and the required standards.

Michael agreed that the ideal was open discoverability. The question is what will be practically possible.

A way of considering the question is asking how would another researcher find the information? If the data is available from a researcher by request this should be noted in the paper. If it is available in a repository then the paper should state that. If the journal has told readers where the data is, then it should be self-evident.

Issues with obsolescence

Michael noted that there is an ongoing issue of obsolete data formats and disks. Given there are ideals and reality, it becomes a question of how to store and handle the information.

When data exists in a proprietary format, the researcher needs to think about how to access it in the longer term. What if the organisation goes out of business? Or the technology upgrades so you can’t get hold of the data in an earlier format? If data exists in a physical format then it is possible to go back and read it. However, if not then it is quite important to think about issues relating to long-term access. Lots of data will be obsolete.

There are some solutions for this issue. The Open Microscopy Environment is a joint project between universities, research establishments, industry and the software development community. It develops open-source software and data format standards for the storage and manipulation of biological microscopy data. This is a community-generated solution to a recognised problem. It has a database to which you can upload any file format.

Issues with data sharing in the biological sciences

The BBSRC allows a reasonable embargo until the researcher has exploited the data for publication. If the researcher is planning on releasing further publications then they should consider carefully when to release the data. Michael noted that this is ‘not a forever thing’. The BBSRC do say there are reasonable limits, and some journals will expect data to be released alongside publications.

Commercial partners

Data emerging from BBSRC funded research needs to be shared unless there is a reason why not – and commercial partners who need to protect their intellectual property can be a good reason to delay data sharing. However once the Intellectual Property is protected, it is protected. The BBSRC allows researchers to embargo the data.

Michael also noted there are things that can be done with data, for example releasing it under licence. For example, if a researcher is working with a commercial partner who is concerned about other commercial competitors, it would be possible to require people to sign non-disclosure agreements. There are ways to deal with commercial data, as you would with other intellectual products.

It was noted by the researchers in the meeting that this type of arrangement is likely to mean the company doesn’t want to go through the process and won’t collaborate.

Exceptions

If data was generated before the policy was in place then the researcher has not submitted a grant application that requires them to share their data. The BBSRC is not expecting people to go back into history. Those researchers who wish to share historical research are not discouraged but this is not covered by the policy. The policy came into force in April 2007; realistically, however, it started in 2008.

In addition there are reasonable grounds for not sharing clearly incorrect or poor-quality data. Many disciplinary databases will contain an element of quality control. But Michael noted that the policy shouldn’t be a way for people to filter out inconvenient data and would expect the community to be self-policing.

Future policy direction

Michael noted that this type of policy is becoming more prevalent not less. Open science is one of the Horizon 2020 themes – see the 2013 Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Journals are getting involved as well. In the future sharing data will be more common – and driven by disciplinary norms. Anything that has been funded by RCUK will be required to share. It makes sense to government – the US National Institutes of Health and National Science Foundation have data sharing statements.

Continuing the dialogue

Michael indicated that he wants to talk to people about what the questions are so the BBSRC can refine issues in the policy.

Researchers who have questions about the policy can send them through to the Research Data Service team info@data.cam.ac.uk. If we are unable to answer them, we can ask BBSRC directly for clarification. We will then add the information to the University Research Data Management FAQ webpage.

Published 19 October 2015
Written by Dr Danny Kingsley, verified by Michael Ball, BBSRC
Creative Commons License

Data sharing – build it and they will come

If a tree falls in the forest and no one was there to hear it, did it happen? You could ask the same philosophical question of research – if no-one can see the research results, what was the point in the first place?

Moving science forward and increasing our knowledge of the world around us implies an exchange of findings. Society cannot benefit from research if there is no awareness of what has been done. Managing and sharing research data is a fundamentally important part of the research process. Yet researchers are often reluctant to share their data, and some are openly hostile to the idea.

This blog describes the research data services provided at Cambridge University, which are attempting to encourage and assist researchers to manage and share their data.

A tough start

The Data Management Facility project at Cambridge began operations in January 2015. At the time there was very little user support for data management in place. There was no advocacy, no training and no centralised tools to support researchers in research data management.

There had been a substantial body of work undertaken in 2010-2012 as part of the ‘Incremental’ project into research data management, but once the project money ended, the resources remained available but were not updated.

One of the initial challenges was an out of date institutional repository. Cambridge University was one of the original test-bed institutions for DSpace in 2005. While there had been considerable effort invested in the establishment of the repository, it had in recent years been somewhat neglected. The lack of both awareness of the repository and support for researchers was reflected in the numbers: during the first decade of the repository, only 72 datasets had been deposited.

In addition, the Engineering and Physical Sciences Research Council (EPSRC) had compliance expectations for funded research coming into effect in May 2015. This gave us five months to pull the Research Data Facility together. It was a tough start.

Understanding researchers’ needs

Tight deadlines often mean the temptation is to create short-term solutions. But we did not want to take this path. Solutions created without prior understanding of the need have no guarantee they will resolve the actual issues at hand.

So we started talking with researchers. We met and spoke with hundreds of researchers across all disciplines and fields of study – Principal Investigators, postdocs, students, and staff members. These were both group sessions and individual meetings. We told them about the importance of sharing research data, and in return we listened to what researchers told us about their worries and possible problems with data sharing.

To date, we have spoken with over 1000 researchers, and from each meeting we kept detailed notes of all the questions/comments received.

We have additionally conducted a questionnaire to better understand researchers’ needs for research data management support. Of the researchers surveyed, 83% indicated that it is ‘very useful’ for the University to provide both information about funders’ expectations for research data sharing and management, and support.

Solution 1 – Providing information

In March 2015 we launched the Research Data Management website which is a single location for solutions to all research data management needs. The website contains:

and much more.

The key idea behind the website is to provide an easy to navigate place with all necessary information. The website is being constantly updated, and new information is regularly added in response to feedback received from researchers.

Concurrently we have been conducting tailored information sessions about funders’ requirements for sharing data and the support available at the University of Cambridge. We run these sessions at multiple locations across the University, and for audiences of various types. The sessions range from open sessions in central locations to dedicated sessions hosted at individual departments, and conversations with individual research groups. Slides from information sessions are always made available for attendees to download.

Solution 2 – Assistance with data management plans and supporting data management

In the survey 82% of researchers said it would be very helpful if there were someone at the University available to help with data management plans. To address this, we have:

  • Added tailored information about data management plans to our information sessions.
  • Linked the DMPonline tool from our data website. This allows researchers to prepare funder-specific data management plans.
  • Organised data management plan clinic sessions (one-to-one appointments on demand).
  • Prepared guidelines on how to fill in a data management plan.

Additionally, 63% of researchers indicated that it would be ‘very useful’, and a further 31% indicated that it would be ‘useful’, to have workshops on research data management. We have therefore prepared a 1.5 hour interactive introductory workshop on research data management, which is now offered in various departments across the University. We are also developing the skill sets of library staff across the institution so that they can deliver research data management training to researchers from their field.

Solution 3 – Providing an institutional repository

Finally, 79% of researchers indicated that it would make data sharing easier if the University maintained its own, easy to use data repository. We therefore had to do something about our repository, which had not been updated for a long time. We have rolled out a series of updates to the repository, taking it to Version 4.3, which will allow DOIs to be minted for datasets.

In the meantime we also had to think of a strategy to make data sharing as easy as possible. The existing processes for uploading research data to the repository were very complicated and discouraging to researchers. We did not have any web-mediated facility that would allow researchers to easily get their data to us. In fact, most of the time we asked researchers to bring their data to us on external hard drives. This was not an acceptable solution in the 21st century!

Researchers like simple processes, Dropbox-like solutions, where one can easily drag and drop files. We have therefore created a simple webform, which asks researchers for the minimal necessary metadata information, and allows them to simply drag and drop their data files.

The outcomes

It turned out in the end that it was really worth the effort of understanding researchers’ needs before considering solutions. As of 24 August 2015, the Research Data Management website has been visited 10,992 times. Our training sessions on research data management and data planning have received extremely good feedback – 73% of respondents indicated that our workshops should be ‘essential’ to all PhD students.

And most importantly, since we launched our easy-to-upload website form for research data, we have received 122 research data submissions – in four months we have received more than 1.5 times as many research outputs as in the first ten years of our repository’s lifetime.
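As a quick check on that comparison, the ratio can be worked out from the two counts quoted in this post (72 datasets in the repository’s first decade, 122 submissions since the webform launched). This is only an illustrative sketch; the variable names are ours and not part of any University system.

```python
# Counts quoted in this post.
datasets_first_decade = 72      # datasets deposited in the repository's first ten years
submissions_four_months = 122   # submissions received since the webform launched

# How many times more submissions arrived in four months than in the first decade.
ratio = submissions_four_months / datasets_first_decade

print(f"Submissions in four months vs first decade: {ratio:.2f}x")
```

The result is a little under 1.7, which is consistent with the ‘more than 1.5 times’ figure given above.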

So our advice to anyone wishing to really support researchers is to truly listen to their needs, and address their problems. If you create useful services, there is no need to worry about the uptake.

This infographic demonstrates how successful the Research Data Facility has been. Prepared by Laura Waldoch from the University Library, it is available for download.

To know more about our activities, follow us on Twitter.

 

Published 24 August 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License

 

Dutch boycott of Elsevier – a game changer?

A long running dispute between Dutch universities and Elsevier has taken an interesting turn. Yesterday Koen Becking, chairman of the Executive Board of Tilburg University who has been negotiating with scientific publishers about an open access policy on behalf of Dutch universities with his colleague Gerard Meijer, announced a plan to start boycotting Elsevier.

As a first step in boycotting the publisher, the Association of Universities in the Netherlands (VSNU) has asked all scientists who are editor in chief of a journal published by Elsevier to give up their post. If this way of putting pressure on the publisher does not work, the next step would be to ask reviewers to stop working for Elsevier. After that, scientists could be asked to stop publishing in Elsevier journals.

The Netherlands has a clear position on Open Access. Sander Dekker, the State Secretary of Education, has taken a strong stance, stating at the opening of the 2014 academic year in Leiden that ‘Science is not a goal in itself. Just as art is only art once it is seen, knowledge only becomes knowledge once it is shared.’

Dekker has set two Open Access targets: 40% of scientific publications should be made available through Open Access by 2016, and 100% by 2024. The preferred route is through gold Open Access – where the work is ‘born Open Access’. This means there is no cost for readers – and no subscriptions.

However Gerard Meijer, who handles the negotiations with Elsevier, says that the parties have not been able to come close to an agreement.

Why is this boycott different?

It is true that boycotts have had different levels of success. In 2001, the Public Library of Science started as a non-profit organization of scientists ‘committed to making the world’s scientific and medical literature freely accessible to both scientists and to the public’. In 2001 PLoS (as it was then) published an open letter asking signatories to pledge to boycott toll-access publishers unless they became open-access publishers. The links to that original pledge are no longer available. Over 30,000 people signed, but did not act on their pledge. In response, PLOS became an open access publisher themselves, launching PLOS Biology in October 2003.

In 2012 a Cambridge academic Tim Gowers started the Cost of Knowledge boycott of Elsevier which now has over 15,000 signatures of researchers agreeing not to write for, review for, or edit for Elsevier. In 2014 Gowers used a series of Freedom of Information requests to find out how much Elsevier is charging different universities for licence subscriptions. Usually this information is a tightly held secret, as individual universities pay considerably different amounts for access to the same material.

The 2015 Dutch boycott is significant. Typically negotiations with publishers occur at an institutional level and with representatives from the university libraries. This makes sense as libraries have long standing relationships with publishers and understand the minutiae of the licensing processes. However the Dutch negotiations have been led by the Vice Chancellors of the universities. It is a country-wide negotiation at the highest level. And Vice Chancellors have the ability to request behaviour change of their research communities.

This boycott has the potential to be a significant game changer in the relationship between the research community and the world’s largest academic publisher. The remainder of this blog looks at some of the facts and figures relating to expenditure on Open Access in the UK. It underlines the importance of the Dutch position.

UK Open Access policies mean MORE publisher profit

There have also been difficulties in the UK in relation to negotiations over payment for Open Access. Elsevier has consistently resisted efforts by Jisc to negotiate an offsetting deal – where a publisher provides some sort of concession for the fact that universities in the UK are paying unprecedented amounts in Article Processing Charges on top of their subscriptions because of the RCUK open access policy.

Elsevier is the world’s largest academic publisher. According to their Annual Report the 2014 STM revenue was £2,048 million, with an operating profit of £762 million. This is a profit margin of 37%. That means if we pay an Article Processing Charge of $3000 then about $1,110 of that (taxpayers’) money goes directly to the shareholders of Elsevier.
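The arithmetic behind those figures can be sketched in a few lines. This is a rough illustration only: it uses the revenue and operating profit quoted from the Annual Report, treats the $3000 Article Processing Charge as a hypothetical example, and makes the simplifying assumption that the overall operating margin applies evenly to each charge. The function and variable names are ours.

```python
# Figures quoted from the 2014 Annual Report, in millions of pounds.
STM_REVENUE_M = 2048
OPERATING_PROFIT_M = 762


def profit_margin(profit: float, revenue: float) -> float:
    """Operating profit as a fraction of revenue."""
    return profit / revenue


def profit_share_of_apc(apc: float, margin: float) -> float:
    """Portion of a single APC that corresponds to profit.

    Assumes the overall margin applies evenly to every payment.
    """
    return apc * margin


margin = profit_margin(OPERATING_PROFIT_M, STM_REVENUE_M)
rounded_margin = round(margin, 2)  # the whole-percent figure used in the text
apc_usd = 3000                     # hypothetical APC, in US dollars

print(f"Operating margin: {margin:.1%}")
print(f"Profit share of a ${apc_usd} APC at {rounded_margin:.0%}: "
      f"${profit_share_of_apc(apc_usd, rounded_margin):,.0f}")
```

At the rounded 37% margin the profit share of a $3000 charge comes to $1,110; using the unrounded margin gives a figure a few dollars higher.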

The numbers involved in this space are staggering. The Wellcome Trust stated in their report on 3 March 2015 The Reckoning: An Analysis of Wellcome Trust Open Access Spend 2013 – 14: ‘The two traditional, subscription-based publishers (Elsevier and Wiley) represent some 40% of our total APC spend’.

And the RCUK has had similar results, as described in a Times Higher Education article on 16 April 2015 Publishers share £10m in APC payments: “Publishers Elsevier and Wiley have each received about £2 million in article processing charges from 55 institutions as a result of RCUK’s open access policy”.

Hybrid open access – more expensive and often not compliant

Another factor is the considerably higher cost of Article Processing Charges for making an individual article Open Access within an otherwise subscription journal (called ‘hybrid’ publishing) compared to the Article Processing Charges for articles in fully Open Access journals.

In The Reckoning: An Analysis of Wellcome Trust Open Access Spend 2013 – 14, the conclusion was that the average Article Processing Charge levied by hybrid journals is 64% higher than the average Article Processing Charge of a fully Open Access title. The March 2015 Review of the implementation of the RCUK Policy on Open Access concluded the Article Processing Charges for hybrid Open Access were ‘significantly more expensive’ than fully OA journals, ‘despite the fact that hybrid journals still enjoyed a revenue stream through subscriptions’.

Elsevier has stated that in 2013 they published 330,000 subscription articles and 6,000 author paid articles. There is no breakdown of how many of those 6,000 were in fully open access journals and how many were hybrid. However in 2014 Elsevier had 1600 journals offering their hybrid option, and 100 journals that were fully open access (6%). Note that the RCUK open access policy came into force in April 2013. It would be interesting to compare these figures with the 2014 ones, however I have been unable to find them.

While the higher cost for hybrid Article Processing Charges is in itself an issue, there is a further problem. Articles in hybrid journals for which an Article Processing Charge has been paid are not always made available at all, or are available but not under the correct licence as required by the fund paying the fee. Here at Cambridge, the five most problematic publishers with whom we have paid more than 10 Article Processing Charges have a non-compliance rate of 11-25%. With this group of publishers we are having to chase up between three and 31 articles per publisher. This takes considerable time and significantly adds to the cost of compliance with the RCUK and COAF policies.

According to the March 2015 Review of the implementation of the RCUK Policy on Open Access, ‘Elsevier stated that around 40% of the articles from RCUK funding that they had published gold were not under the CC-BY licence and are therefore not compliant with the policy’ (p19).

We support our Dutch colleagues

In summary, the work happening in The Netherlands to break the stranglehold Elsevier have on the research community is important. We need to stand by and support our Dutch colleagues.

NOTE: This blog was subsequently reblogged on the London School of Economics Impact Blog and later listed as one of the Top Ten Posts for 2015: Open Access. It was also listed as one of the blogs that had an average minute per page measurement of over 6 minutes and 30 seconds.

Published 3 July 2015, added to on 22 January 2016
Written by Dr Danny Kingsley
Creative Commons License