Hybrid open access – an analysis

Welcome to Open Access Week 2016. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog post we revisit the issue of paying for hybrid open access. We have also published a related post, “Who is paying for hybrid?”, listing funder policies on hybrid.

Recent years have seen a proliferation of funder open access mandates, the terms of which can differ markedly, adding to the confusion of an already complex area. The Registry of Open Access Repository Mandates and Policies (ROARMAP) lists 80 funders with open access requirements, and the list continues to grow.

Within the UK, policies fall into three broad categories: those that mandate green Open Access without paying a fee, such as the HEFCE policy; those that prefer gold but make no additional funds available, such as the NIHR policy, and those that have a preference for gold and offer block grants to institutions to help cover the associated costs, such as the Research Councils UK (RCUK) and Charities Open Access Fund (COAF) policies.

Accompanying this expansion of mandates, unsurprisingly, has been an increase in the amount being spent to support Open Access. The Open Access Directory lists 179 funds for OA journal articles worldwide, compared with 81 in early 2014.

All this brings into sharper relief the question of how open access funds support hybrid publishing. But first a quick history lesson.

Hybrid origins

Hybrid journals provide open access to specific articles where an Article Processing Charge has been paid in an otherwise subscription journal. A few learned societies offered hybrid options in the early 2000s. Hybrid open access options were first offered by large publishers in 2004 with Springer’s Open Choice product charging USD3000 per article. This price has not changed in the past 12 years. In the UK the Springer Compact now pays for hybrid under a different model.

Wiley Online Open’s trial began the same year, charging USD2500. Today the price ranges from USD1,500 – 5,200. Oxford Open launched in 2005, and in 2006 Elsevier Open Access and Sage Choice began. In 2007, Taylor & Francis Open Select, Cambridge Open and Nature Publishing Group’s open access offering began.

The uptake of hybrid began slowly. It is very difficult to obtain statistics on what percentage of journals have hybrid Open Access content, but in his 2012 analysis The hybrid model for open access publication of scholarly articles – a failed experiment? (open access version here), Bo-Christer Bjork found that the number of hybrid journals had doubled in the previous couple of years to over 4,300, and that the number of such articles was around 12,000 in 2011. This represented uptake by a small proportion of eligible authors (1-2%).

That analysis was published the same year as the Finch Report, which recommended a gold path to Open Access. The resulting RCUK Open Access Policy and RCUK Block Grants to fund Open Access APCs have dramatically increased expenditure on hybrid in the UK since 2013. According to a report published in 2015, “the UK’s profile of OA take-up is significantly different from the global averages: its use of OA in hybrid journals and of delayed OA journals is more than twice the world average in both cases, while its take-up of fully OA journals with no APC (Gold-no APC) is less than half the world average and falling.”

At Cambridge University we have spent literally millions of pounds on hybrid Open Access, which constitutes approximately 85% of our total APC spend. This is a higher percentage than estimates for the country as a whole, which put hybrid Open Access at 76% of APC spend.

Double dipping

Hybrid represents a second income stream to publishers and has raised questions about ‘double dipping’. Some publishers, such as Nature Publishing Group, manage this by reducing the cost of subscriptions in proportion to the percentage of hybrid content in a given journal. However, ‘big deals’ for subscriptions can render this relatively ineffective, and the reduction is spread across all subscribers, regardless of who has paid the article processing charge. This means research intensive institutions (such as Cambridge) are contributing heavily to the system but not receiving a relative reduction.

To address this issue at a local level, several publishers have created offsetting arrangements, where discounts or refunds are provided in proportion to the contribution the institution has made in APC payments above subscriptions. However, each of these schemes operates differently and they can be complicated to administer, or have other preconditions such as making large prepayments to publishers.

The biggest problem from an implementation perspective, however, is that they are by no means universal. By far the biggest publisher, Elsevier, for example, offers no form of offsetting at all, although they nevertheless assert that they do not double dip. The result is that in very many cases, institutions and authors continue to have to pay twice for material in hybrid journals, swelling publisher coffers at the expense of research funding.

Very expensive

One of the problems with hybrid is that even ignoring the added cost of subscriptions to the non Open Access material in those journals, hybrid Open Access charges are more expensive than those for fully Open Access journals.

In March last year both the Wellcome Trust and RCUK undertook reviews of their Open Access policies. The Reckoning: An Analysis of Wellcome Trust Open Access Spend 2013 – 14 noted: “The average APC levied by hybrid journals is 64% higher than the average APC charged by a fully OA title”. In Wellcome’s data, the average APC for a hybrid article in 2014-15 was £2,104, compared with only £1,396 for fully OA journals. Worryingly, the data showed that fully OA APC costs had risen more than their hybrid counterparts since the previous year.
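As a quick check on the size of the gap, the 2014-15 averages quoted above can be turned into a percentage premium. This is only a minimal sketch of the arithmetic; the function name is illustrative and does not come from the Wellcome report.

```python
def premium_percent(hybrid_apc: float, full_oa_apc: float) -> float:
    """Return how much higher the hybrid APC is, as a percentage of the fully OA APC."""
    return (hybrid_apc - full_oa_apc) / full_oa_apc * 100


# Average APCs in pounds from the Wellcome 2014-15 data quoted in the text.
HYBRID_2014_15 = 2104
FULL_OA_2014_15 = 1396

print(f"Hybrid premium, 2014-15: {premium_percent(HYBRID_2014_15, FULL_OA_2014_15):.1f}%")
```

On these figures the 2014-15 premium works out at roughly 51%, lower than the 64% reported for 2013-14, which fits the observation that fully OA charges rose faster than hybrid ones over the year.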

Similarly in the Research Councils UK 2014 Independent Review of Implementation the observation was that article processing charges for hybrid Open Access were “significantly more expensive” than fully OA journals, “despite the fact that hybrid journals still enjoyed a revenue stream through subscriptions”.

A Max Planck Digital Library Open Access Policy White Paper published on 28 April 2015 noted that The Wellcome Trust had a significantly higher average APC cost than German, Austrian and SCOAP3 figures. This was because the Wellcome Trust pays for hybrid APCs, “which are not only much higher than most pure open access costs but are also widely considered not to reflect a true market value. In Germany and many other countries, hybrid APCs are excluded from the central funding schemes.”

A study undertaken last year, which considered APCs in the five-year period between 2010 and 2014, found the mean for fully-OA journals published by non-subscription publishers was £1,136 compared with £1,849 for hybrid journals. The same study also found that traditional subscription publishers are capturing most of the APC market. The top-10 publishers in terms of numbers of APCs received from participant institutions (who received 76% of the total APCs paid from the sample) “only included two fully-OA publishers (PLOS and BMC). The others were established publishers (Elsevier, Wiley, Springer and so on) who are mostly gaining APC income from hybrid journals.”

The 2014 report Developing an effective market for open access article processing charges was written for a consortium of research funders comprising Jisc, Research Libraries UK, Research Councils UK, the Wellcome Trust, the Austrian Science Fund, the Luxembourg National Research Fund and the Max Planck Institute for Gravitational Physics. The authors noted of the hybrid journal market that it is “highly dysfunctional, with very low uptake for most hybrid journals and a relatively uniform price in most cases without regard to factors such as discipline or impact”.

Value for money?

A second issue which has become apparent as open access mandates have expanded is the extent to which publishers, mostly of hybrid journals, do not deliver the Open Access option that has been paid for. In many cases, articles for which an author or institution has paid an APC for ‘immediate’ Open Access may take months or even years to be made Open Access; some articles are never made Open Access at all. Even when articles are made available, there is no guarantee that they will carry the appropriate licence. It is by no means uncommon for articles to carry more restrictive licences than those requested, or for the appropriate licence to appear on a journal website while the PDF of the article itself bears only a publisher copyright notice and a prominent ‘All rights reserved’.

In March 2016 the Wellcome Trust published a report into compliance among its paid-for articles in 2014-15, concluding:

The good news is that we have seen an improvement in correct and programmatically identifiable licences (from 61% of papers in ’13-‘14, to 70% in ’14-‘15) and a similar increase in overall compliance from 61% to 70%.  The bad news, however, is that in 30% of cases we are not getting what we are paying for.

The source of this non-compliance was overwhelmingly hybrid journals, and the largest publishers were the worst offenders: in the Wellcome data, 31% of Elsevier hybrid articles (and 26% of their ‘fully OA’ articles!) were non-compliant, as were 54% of Wiley’s.

One might conclude, then, that hybrid Open Access represents a bad deal for funders and institutions, with poor service and double-dipping.

Other hybrid issues

To further complicate matters, some have argued that the open access/hybrid dichotomy is too stark. Some journals, particularly those from learned societies (e.g. Plant Physiology, from the American Society of Plant Biologists), make all articles open access after a certain period, but charge an optional APC to make them available sooner. This would generally be considered hybrid publishing, but could be seen as a rather different category from the majority of corporate hybrid journals, in which articles never become Open Access unless an APC is paid. There is a possibility that strict funder mandates against hybrid could close off such journals to researchers, exacerbating the anxieties regarding open access felt by many learned societies.

Where does this leave authors and institutions? It’s clear that the situation remains very much in flux. The problems that have existed with hybrid since the beginnings of Open Access are far from resolved, despite the expansion of journal offsetting schemes. Meanwhile, prices continue to rise, and while many funders have taken the step of allowing their funds to be used only for fully Open Access journals, only a minority of the largest and most powerful funding bodies have done so.

The result is confusion for researchers and an increased administrative burden for institutions, who have to manage and advise on a proliferation of divergent funder and publisher policies, as well as conducting regular and extremely resource-intensive compliance-checking of hybrid publications to ensure publishers have delivered what has been paid for. As numbers of Open Access publications increase, it is questionable how sustainable this will be.

Published 24 October 2016
Written by Dr Philip Boyes and Dr Danny Kingsley 
Creative Commons License

Request a copy: process and implementation

This blog post looks at a recent feature implemented in our repository called ‘Request a copy’ and discusses the process and management of the service. There is a related blog post which discusses the uptake and reaction to the facility.

As part of our recent upgrade to the University’s institutional repository (now renamed ‘Apollo’), we implemented a new feature called ‘Request a copy’. ‘Request a copy’ operates on the principle of peer-to-peer sharing – if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway: peer-to-peer sharing.

The main advantage of the ‘Request a copy’ feature is to open up the University’s most current research to a wider audience. Many of our users do not necessarily come from an academic background, or may be based within another discipline, or an institution where journal subscriptions are more limited. The repository is often their first port of call to find new research as it ranks highly in Google search results. We hope that these users will benefit from ‘Request a copy’ by being able to access new outputs early, at researchers’ discretion. Additionally, this may provide an added benefit to researchers by introducing new contacts and potential collaborations.

How it works

Items in Apollo that are not yet accessible to the wider public are indicated by a padlock symbol that appears on the thumbnail image and filename link, which users can usually click to download the file.

Reasons why the file may not yet be publicly available include:

  • Some publishers require that articles in repositories cannot be made available until they are published, or until a specified time after publication
  • We hold a number of digitised theses in the repository, and for some we have been unable to contact the author to secure permission to make their thesis available
  • Authors may choose to make their dataset available only once the related article is published

When a user clicks on a thumbnail or filename link containing a padlock, they are directed to the ‘Request a copy’ form. Here, they provide their name, email address and a message to the author. On clicking ‘Request copy’, an email is sent to the person who submitted the article, containing the user’s details. The recipient of this email then has the option to approve or deny the user’s request, to contact the user for more information, or (if they are not the author) to forward the request to the author.
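The flow described above can be sketched in a few lines of Python. This is an illustrative model only, not Apollo's actual implementation: the class names, field names and action labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyRequest:
    """Details the user enters on the hypothetical 'Request a copy' form."""
    requester_name: str
    requester_email: str
    message: str


@dataclass
class Item:
    """A repository item, reduced to the fields this sketch needs."""
    title: str
    restricted: bool       # True when the file is shown with a padlock
    submitter_email: str   # the person who submitted the item


# The choices open to whoever receives the request email.
RECIPIENT_OPTIONS = ("approve", "deny", "ask_for_more_information", "forward_to_author")


def route_click(item: Item, request: Optional[CopyRequest]) -> str:
    """Decide what happens when a user clicks a thumbnail or filename link."""
    if not item.restricted:
        return "download"                     # no padlock: the file is public
    if request is None:
        return "show_request_form"            # padlock: ask for name, email, message
    return f"email:{item.submitter_email}"    # form submitted: notify the submitter
```

Note that the email goes to the submitter, who is not necessarily the author, which is why forwarding appears among the recipient's options.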

How it really works

In practice, the process is slightly more complicated. For most of the content in the repository, the person who submitted an item will be a member of repository staff, rather than the item’s author. This means that for the most part, emails generated by the ‘Request a copy’ form were initially sent to members of the Office of Scholarly Communication team. In some cases, these requests were sent to people who have left the University, and we have had to query the system to retrieve these emails. As an interim measure, we have now directed all emails to support@repository.cam.ac.uk. These still need manual processing.

Theses

For theses where we have not received permission from the author to make them available, we forward requests to the University Library’s Digital Content Unit, who have traditionally provided digitised copies of theses at a charge of £65. We have found, however, that once information about this charge is communicated to the requester, very few (approximately 1%) actually complete the process of ordering a thesis copy.

We have been working with the Digital Content Unit on a trial where thesis copies were offered at £30, then £15. However, even at these cheaper prices, uptake remained low (it increased to 10%, but due to the small size of the sample, this only equated to two and three requests at each price point, and therefore may not be statistically significant). This suggests that the objection was to being charged at all, rather than to the particular amount. Work in this area remains ongoing to try to offer thesis copies as cheaply as possible to requesters, while allowing the Digital Content Unit to cover their costs.

Articles

If the request is for an article, we first need to check whether the article has actually been published and is already available Open Access. Although we endeavour to keep all our repository records up to date, unless we are informed that an article has been published, repository staff need to check each article for which publication is pending. This is a time-consuming manual process, and when we have a large backlog, sometimes it can take a while before an article is updated following publication.

If we find that the article has indeed been published and can be made Open Access, we amend the record, make the article available and email the requester to let them know they can now download the file directly from the repository.

On the other hand, if the article is still not published, or if it is under an embargo, we need to forward the request to the corresponding author(s). Sometimes their name(s) and email address(es) will be included within the article itself, and sometimes we have a record of who submitted the article via the Open Access upload form. However, if it is not clear from the article who the corresponding author is, or if their contact details are not included, and if the article was submitted by an administrator rather than one of the authors, we then need to search via the University’s Lookup service for the email addresses of any Cambridge authors, and search the internet for email addresses of any non-Cambridge authors, before we can forward on the request.

As a result, it can take repository staff up to 30 minutes to process an individual request. This is quicker if the article has been requested previously and the author’s contact details are already stored, but can take longer when we need to search. Sometimes, there is also repeat correspondence if the author has any queries, which adds to the total time in processing each request.
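The checks described in this section can be summarised as a small decision function. This is a simplified sketch of the manual process rather than software we run: the function name, parameters and returned labels are assumptions chosen for illustration.

```python
from typing import Sequence


def triage_article_request(published: bool,
                           under_embargo: bool,
                           known_author_emails: Sequence[str]) -> str:
    """Return the next step repository staff take for an article request."""
    if published and not under_embargo:
        # The record is out of date: open the file and tell the requester.
        return "make_open_and_notify_requester"
    if known_author_emails:
        # Contact details are in the article or were captured at upload.
        return "forward_to_corresponding_author"
    # No contact details to hand: search Lookup and the wider internet first.
    return "search_for_author_contact_details"


# Example: an unpublished article with no recorded author contact details.
print(triage_article_request(published=False, under_embargo=False, known_author_emails=[]))
```

The last branch is the slow one, since it involves searching for addresses by hand before the request can be forwarded.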

Amending our processes

Since introducing ‘Request a copy’, we have started collecting the email addresses of corresponding authors when an article is submitted, and we have commissioned a repository development company to ensure that ‘Request a copy’ emails can be sent directly to those authors for whom we have an email address – a feature that we are hoping to implement in the next few weeks.

However, if the author moves institution, their university email address will no longer be valid, and any requests for their work will again need to come via repository staff. One way to solve this would be to ask for an external (non-university) email address for the corresponding author at the point where they upload the article to the repository. However, this would introduce an extra step to an already onerous process and may act as a further barrier to authors submitting articles in the first place.

Generally, ‘Request a copy’ is a great idea and provides many benefits to the research community and beyond. But the implementation of this service has been challenging. The amount of time taken by each request has meant that some staff members have been redeployed from their usual jobs to facilitate these requests, which also has an impact on the backlog of articles in the repository that need to be checked in case they have since been published. If an article is published but still in the backlog (and therefore not publicly available in the repository), unnecessary requests for it could result in a reputational issue for the Office of Scholarly Communication and the University.

We will continue to look at our processes over the coming academic year, to see how we can improve our current workflows, and identify and resolve any issues, as well as determining where best to focus any further development work. In the related blog post on ‘Request a copy’, I’ll be talking about usage statistics for the service so far, some more unexpected use cases we have encountered, and feedback from our users that will help us to shape the service into the future.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Request a copy: uptake and user experience

This post looks at the University of Cambridge repository ‘Request a copy’ service from the user’s perspective in terms of uptake so far, feedback we have received, and reasons why people might request a copy of a document in our repository. You may be interested in the related blog post on our ‘Request a copy’ service, which discusses the concept behind ‘Request a copy’, the process by which files are requested, and how this has been implemented at Cambridge.

Usage Statistics

The Request a Copy button has been much more successful than we anticipated, particularly because there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a copy’), we had received 1120 requests (approximately 280 requests per month), the vast majority of which were for articles (68%) and theses (28%). The remaining 4% of requests were for datasets or other types of resource. We are aware that this is a particularly quiet time in the UK academic year, and expect that the number of requests will increase now term has started again.
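For readers who want to see how these figures relate, the short sketch below reproduces the monthly average and estimates the number of requests per type from the published percentages. The per-type counts are approximations derived from rounded percentages, not figures taken from our logs, and the names used are illustrative.

```python
TOTAL_REQUESTS = 1120   # requests received by the end of September 2016
MONTHS = 4              # months since 'Request a copy' was introduced

# Share of requests by type, in percent, as reported above.
SHARES = {"articles": 68, "theses": 28, "datasets and other": 4}


def monthly_average(total: int, months: int) -> float:
    """Average number of requests per month."""
    return total / months


def approximate_counts(total: int, shares: dict) -> dict:
    """Estimate the number of requests per type from percentage shares."""
    return {kind: round(total * pct / 100) for kind, pct in shares.items()}


print(monthly_average(TOTAL_REQUESTS, MONTHS))
print(approximate_counts(TOTAL_REQUESTS, SHARES))
```

Because the percentages are rounded, the estimated counts need not add up exactly to the total.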

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button. However, these figures could be misleading as a number of authors have also advised us that they have entered into correspondence with the requester to ask them for further information about who they are and why they are interested in this research. Eventually, this correspondence may result in the author emailing a copy of the paper to the requester, but as this happens outside the repository, it does not appear in our fulfilment statistics. Therefore, we suspect the figure for accepted requests is in actual fact slightly higher.

Of the articles requested during this period, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal. The large number of requests made prior to publication indicates the value of having a policy where articles are submitted to the repository on acceptance rather than publication – there is clearly interest in accessing this research among the wider public, and if they are able to make use of it rather than waiting during the sometimes lengthy period between acceptance and publication, this can make the research process more efficient.

Author Survey

To find out why authors might not be fulfilling requests through the repository links, Dr Lauren Cadwallader, one of our Open Access Research Advisors, sent a survey on 6 July 2016 to the 113 authors who had received requests but had not clicked on the repository link or been in touch with repository staff to advise of an alternative course of action. This survey had a 13% response rate, with 15 participants, as well as eight email responses from users who provided feedback but did not complete the survey.

The relatively low response rate is indicative of either a lack of engagement with or awareness of the process – it is possible that the request emails and survey email were dismissed as spam, or that researchers were unable to respond due to an already heavy workload. One way of addressing this could be to include some information about ‘Request a copy’ in our existing training sessions, in particular to emphasise how quick the process can be in cases where the author is happy to approve the request without needing any further information from the user. We have also been developing the wording of the email sent to the author, to explain the purpose of the service more clearly, and to make it sound like a legitimate message that is less likely to be dismissed as spam.

Of the 15 people who participated in the survey, the majority were aware that they had received an email, which shows that lack of response is not always due to emails being lost in spam filters. When asked for the reason why they did not fulfil the request via the repository link, 35% of authors replied that they had emailed the requester directly, either to send the file, to request more information, or to explain why it was not possible for them to share the file at this time. This finding is quite positive, as it indicates that over a third of these requests are indeed being followed up. Although it would be helpful to us to be able to keep track of approvals through the system, at least this means that the service is fulfilling its purpose in providing a way for authors to interact with other interested researchers, and to share their work if appropriate. In fact, one of the aspects that participants liked best about the ‘Request a copy’ service was the ability to communicate directly with the requestor.

Two authors did not respond to the request because the article was available elsewhere on the internet, such as their personal / departmental website, or a preprint server (where the restrictions relating to repositories do not apply), although they did not communicate this to the requestor. In these cases, it is definitely positive that the authors are happy to share their work; however, it does show that there is often an assumption among researchers that people interested in reading their articles will be restricted to those already in their specific disciplinary communities.

Requests from people who are unaware of sites where the research might also be made available demonstrate that there is indeed an appetite for this research among those outside of academia, or from different subject areas. This is generally a really positive thing, as it allows the University’s research outputs to educate and inspire a new audience beyond the more traditional communities, and could potentially lead to new collaboration opportunities. To ensure that requestors are able to access the material, and that researchers are not bombarded with requests for documents that are already freely available, authors can provide links to any external websites that are hosting a preprint version of the article, and we will add them to the repository record.

Other responses indicated that we were not necessarily emailing the right person, as participants said that they had not approved the request because they were not the corresponding author, or because they thought a co-author had already responded. At the outset of the service, we felt that emailing as many authors as possible would increase the likelihood of receiving a response; however, the survey results show that it would be better to send requests to the corresponding author(s) only, at least in cases where it is clear who they are.

An issue we have encountered on a semi-regular basis since HEFCE’s Open Access policy came into force is that of making an article’s metadata available prior to its publication. Although HEFCE and funder policies state that an article’s repository record should be discoverable, even if the article itself must be placed under embargo based on publisher restrictions, there is concern among some authors that metadata release breaches the publisher’s press embargo. You can read about this issue in some detail here.

Receiving requests for an article via the ‘Request a copy’ service can be unsettling for authors as it demonstrates how easily the repository record can be accessed, and rather than respond to the request, they contact the Open Access team to ask for the metadata record to be withdrawn until the article is published. This demonstrates a need to communicate more clearly, both on our website and within the ‘Request a copy’ pages in the repository, what is required of authors as part of HEFCE and funder Open Access policies. We will also be more explicit in the ‘Request a copy’ emails sent to authors in stating that sharing their articles via this service will not be seen as a breach of the publisher’s embargo. In cases where the author does not wish to disseminate their article before it is published, they have the option to deny any requests they receive.

Facilitating requests

There have been several instances where press interest around an article at the point of publication has generated a large number of requests, each of which must be responded to individually by the author. This has resulted in several authors asking that we automatically approve every request rather than forwarding them on. Unfortunately this is not possible for us to do, due to the legal issues surrounding ‘Request a copy’.

It is perfectly acceptable for an author to send a copy of their article to an individual, but if a repository makes that article available to everyone who requests it before the embargo has been lifted, this would be a breach of copyright because it would be ‘systematic distribution’. While responding to multiple requests is likely to be seen as an annoyance by an already overstretched researcher, we hope that a large volume of requests will also be viewed in a positive light, as it demonstrates the interest people have in their work.

Use cases

An interesting example of a request we received was actually from one of the authors of the article, as they did not have access to a copy themselves. This raises some questions about communication between the researchers in this case, if the ‘Request a copy’ service was seen to be a better way of gaining access to the author’s own research, rather than contacting one of their co-authors.

A more surprising use case is that of a plaintiff who had lost a legal case. The plaintiff was requesting an as-yet unpublished article that had been written about the case, because the article appears to argue in favour of the plaintiff and could potentially inform a future appeal. This is a good example of how the ‘Request a copy’ service could be of direct benefit in the world outside academia.

Although the vast majority of requests have been for research outputs such as articles, theses and datasets, we also occasionally receive requests for images that belong to collections held in different parts of the University, where high-quality versions are stored in the repository under restricted access conditions. With these requests, it can be more difficult to find who the copyright-holder is, which sometimes requires detective work by the repository team. In one case, permission had to be sought from a photographer who only has a postal address, and therefore required more explanation about the repository more generally, as well as the specific request.

Looking to the future

We will use this research and any further feedback we receive to improve the experience of our ‘Request a copy’ service for both authors and requestors, including implementing the ideas suggested above. Usage statistics will continue to be monitored, and we may run a user survey again to determine how far the service has improved, as well as to identify any new issues.

In the meantime, if you have any comments or questions about our ‘Request a copy’ service, either as an author or a requester (or both), please send us an email at support@repository.cam.ac.uk.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Milestone – 10,000th article processed by OA Service

The Open Access Service at Cambridge has received its 10,000th Open Access submission – highlighting its commitment to making research freely available to anybody who wants to access it, without publisher paywalls or expensive journal subscriptions.

The 10,000th submission, reporting on the impact of eating a Mediterranean diet on the risk of developing cardiovascular disease in a UK population, was deposited by Signe Wulund at the MRC Epidemiology Unit, on behalf of Dr Nita Forouhi, Programme Leader in Nutritional Epidemiology at the MRC Epidemiology Unit, and several co-authors.

The Open Access movement has been growing in strength in academia for many years, and it is increasingly being mandated by funding bodies and government.

Dr Forouhi said: “Through open access our research can reach a worldwide audience. It would be a huge pity if interested researchers, practitioners or policy makers could not read about new research, such as our latest findings on the link between the Mediterranean diet and cardiovascular health in a non-Mediterranean setting, because of something as simple as lacking a journal subscription.

“Open access enables wider dissemination of research findings, and in turn, facilitates better research and evidence-based policy and clinical practice.”

The Cambridge Open Access Service was established within the University Library in 2013 in response to Research Councils UK (RCUK) making Open Access mandatory for anyone accepting their funding. Many other major funders, including the Wellcome Trust, Cancer Research UK and the British Heart Foundation, have similar policies.

In 2014, the Higher Education Funding Council for England announced that Open Access would be compulsory for any article included in the next Research Excellence Framework (REF) exercise. This policy came into force on April 1, 2016, effectively meaning that all research in UK institutions now has to be made freely available.

Since its inception in 2013, the Open Access service has processed 10,000 manuscripts, across all University faculties and departments and worked with 3,000 different members of staff. 6,000 of the papers were covered by the HEFCE open access policy; 4,000 acknowledged RCUK funding and 1,900 COAF (many papers fall into multiple categories, and some into none). More than £5.4 million of Open Access grants from funding bodies have also been distributed.

Meeting these requirements is a major task for the University, and one it has tried to make as simple as possible for researchers. Authors are simply required to upload their manuscript to www.openaccess.cam.ac.uk when it’s accepted for publication. The Open Access team then advises them on what they need to do to comply with funder requirements and on their eligibility for any funding body grants, and handles depositing the article into Apollo, the University’s institutional repository.

Ten thousand manuscripts have now been received in this way, and the vast majority of them have been made Open Access, free for anyone who wants to read and benefit from them.

The 10,000th article was: ‘Prospective association of the Mediterranean diet with cardiovascular disease incidence and mortality and its population impact in a non-Mediterranean population: the EPIC-Norfolk Study’ in BMC Medicine. [DOI:10.1186/s12916-016-0677-4]

The Open Access team at the University of Cambridge is part of the Office of Scholarly Communication (OSC), within the University Library. As well as assisting researchers with Open Access and Open Data compliance, it advises on scholarly communication tools, techniques, policies and practices, and provides training.

This story originally appeared on the University of Cambridge Research news pages.

Published 05 October 2016
Written by Dr Philip Boyes
Creative Commons License

Taking a Principled stance – the Scholarly Commons

It only rains about 10 days a year in San Diego. And Tuesday was one of them. In a rooftop room on campus in San Diego at UCSD, a group had gathered for the FORCE11 Scholarly Commons workshop. The workshop brought together members of the Scholarly Commons working group, who hail from around the world and come from the broad scholarly commons. The Scholarly Commons is an idea to help define the future of research communication. The goal is to promote the best research and scholarship possible through rapid and wide dissemination to all who need or want it.

FORCE stands for the Future of Research Communications and eScholarship and is an organisation (or community) open to anyone interested in these issues. The group consisted of researchers from multiple disciplines, communicators, programmers, and a couple of librarians. This is the unusual and powerful thing about FORCE11 – the diversity of its members. Someone actually remarked: ‘you know, there probably should be a few more librarians here’ which is something you don’t often hear at meetings about open access issues. Usually librarians are delighted if a real live researcher turns up.

We were meeting to discuss the draft of 18 Principles of the Commons – an attempt to define what the community considers the attributes and behaviours of a person who is fully participating in research. The Principles are broadly separated into four major themes of being Open, Equitable, Sustainable and Research & Culture Driven.

FORCE11 works openly and tries to be as accessible as possible so there were full and open notes being collaboratively taken and the Twitter hashtag was #futurecommons.

The workshop was very hands-on, and expertly moderated by Jeroen Bosman and Bianca Kramer who are the power behind the excellent 101 Innovations in Scholarly Communication project. As their ‘wheel’ of tools identifying tools available across the research life stages and through time demonstrates, it is becoming increasingly difficult to navigate the new research space. Indeed that is part of the rationale behind this Scholarly Commons project. It is an attempt to take stock and make sense of what we, the community, want to see in an open and accessible future.

Despite having fewer than 40 people we managed to have multiple activities running concurrently with several ‘unworkshops’. Everything was fed back into the group, and there was a very broad range of discussions, agreements and ideas. To prevent this blog being a tome, I am only going to cover here a couple of areas that were discussed.

Standing on the shoulders of giants

Due to a flight delay I was only able to catch the end of the Sunday evening welcome reception where we were asked to reflect on the 18 draft Principles plastered on the walls and decide which we agreed with and which we did not (or had an issue with). As I scanned through I was struck by the overall similarity they had with Robert Merton’s 1942 publication, The Normative Structure of Science where he proposed that science operated to four ‘norms’, of Universalism, Communalism, Disinterestedness and Organised Skepticism.

I mentioned this in an early discussion and was somewhat chuffed that the group really did take this on board – to the extent that one unworkshop group worked on updating the norms to reflect today’s situation.

As an aside – the challenge with having people coming together from multiple research areas is everyone brings their a priori biases with them. People tend to see the problem through their own lens, so have different ways of approaching the problem. For the group to agree that this perspective was a good one was personally very validating.

Considering outreach as part of the research lifecycle

In the first unworkshop I joined we discussed how research-centred the Principles are – they did not consider the importance of outreach. Given the impenetrable nature of the language in many academic papers, we agreed that making something Open Access facilitates outreach but is not outreach itself.

The discussion moved to ideas about what researchers could do to help with outreach – even if they themselves did not want (or were unable) to do it. These are fairly simple, and include providing supplementary material that is accessible in terms of the descriptive language used (no jargon), potentially providing the information in a language other than English, and ensuring the license under which it is made available is open.

We proposed that the Commons should facilitate outreach and have the outreach in mind even if the researcher themselves does not generate the outreach. There had been a comment earlier in the Equity discussion that noted “Each part of the research cycle is equally valid and none should not be preferenced over the other.” Our discussion concluded that outreach (for the lay public) should be considered to be part of the research process and equally valued.

It should be noted that we are not discussing paper-related activities here. Making the paper open access or tweeting a link to the paper doesn’t count. This is about sharing the information in an understandable manner outside the Academy.

Tool mapping

The workshop, as mentioned, was very hands-on. By that I mean we did several ‘craft activities’ involving dots, glue, sticky tape and scissors. One of these activities involved ranking various tools for research against the four themes of the Principles, deciding whether they were in alignment with them (green), in opposition with them (red) or in-between (yellow).

We then placed these assessments on the windows under the part of the research lifecycle they related to, and ordered them. The most Principle-friendly tools were up high, and the least down low.

We then did an activity where we tried to trace the path of our own discipline in terms of the tools our disciplines tended to use. This exercise was an attempt to see if there were any discernible patterns about where some disciplines tend to align or otherwise with the Principles. While the sample size for each discipline was too small to really come to any conclusions, this exercise did open up ideas for a way of disseminating the Principles.

The Principles as an Innovation

This is where another of my disciplinary perspectives comes into play. If we accept that the Principles are themselves an ‘innovation’ – in that they are “an idea, practice, or object that is perceived as new by an individual or other unit of adoption” – then we can look to Everett Rogers’ Diffusion of Innovations, first published in 1962 and now in its 5th edition. You might not have heard of him but you know about his work – Rogers was the person who coined the idea of ‘early adopters’, ‘late adopters’ and ‘laggards’.

Amongst lots of interesting insights about why people adopt new ideas, Rogers came up with five ways to evaluate an innovation which will determine the success or otherwise of its adoption. These are judged as a whole and are interrelated:

  • Relative advantage – the perceived efficiencies gained by the innovation relative to current tools or procedures
  • Compatibility with the pre-existing system
  • Complexity or difficulty to learn (it needs to be easy)
  • Trialability or testability without risking the current system
  • Observed effects.

It is the second point which is the interesting one here – ‘Compatibility with the pre-existing system’. The reason why this is relevant is we are not talking about one system when we discuss scholarship – there are a myriad of systems. There is no ‘one solution’. If we are to try and implement something like the Principles across the academy, we will need to do it along disciplinary lines. (Disclosure – this happens to be the conclusion of my 2008 PhD thesis on the adoption of open access across disciplines).

Disciplinary dissemination

This leads us to the question of audiences for the Principles. Ideally we would have institutions signing up to them, pledging that they will work with their research community to work in this manner. But this is unrealistic currently due to the diverse nature of research institutions. But there might be a way to have funders sign up, because often funding is given within disciplinary restraints. This is doubly the case because funders (in the UK, Australia and the US at least) are increasingly using an ‘Impact narrative’ and the Principles offer a way to practically identify and reward impact behaviour.

And we are not coming from a standing start. We can build on the work done by Jeroen Bosman and Bianca Kramer in their 101 Innovations in Scholarly Communication project. There were over 20,000 responses to their survey of innovation use and this allows a detailed mapping of disciplinary behaviours. If we then further map those findings against an assessment of the research tools being used at a disciplinary level and whether they are aligned with the Principles, we should be able to see which disciplinary areas are already working in the Principled way. It is the funders of these disciplines that we should approach first to try and gain early adoption of the Principles. This work would become a checklist that can reward people for the behaviours that they are already doing in this space.

A project like this would in turn open up some questions about what we need to do at a disciplinary level to help that community become more aligned with the Principles. These may require a number of approaches – Do they have the tools that work for them or do these need to be developed? Is there a cultural reason why this discipline is not engaging? In answering these questions we come up with the answer to the question: What does a Scholarly Commons researcher look like in this discipline?  Until we have some evidence of where these areas are we are effectively stabbing in the dark.

Making this happen now

In a different unworkshop we talked about how the nature of the Principles themselves went against the idea of being inclusive because we are potentially creating a binary situation – either you are following the Principles or not. What we really need to do, we agreed, is not reject people for acting in ways that are not totally in line with the Principles, but rather reward behaviour that supports the Principles.

In order to facilitate this, we designed a series of ‘Decision Trees’ to help researchers be as open as they can. This is a recognition that researchers are working within a complex ecosystem. With all the will in the world, if there is not an Open Access journal available to you in your field, you cannot publish in one.

The easiest part of the research lifecycle to tackle was publishing, in terms of choosing a publication outlet. The decision tree allows for people who cannot publish in an Open Access journal, nor afford to pay for hybrid (not something I personally recommend anyway) to still be ‘Principled’ by putting a copy of their work in a repository.

Our discussion about data was more complex. For a start, there is a question about whether the data is digital or not. As we discussed it, our draft tree became incredibly complex so we created two separate flows. The Data 1 decision tree says to someone who has analogue data and no funds to digitise, that as long as they put in some information in their paper about how to contact them for the supporting data, then they have met the spirit of the Principles to the best of their ability.

While we know the gold standard for data sharing is to have the data (with well defined metadata) available openly in a non proprietary repository with a DOI, for various reasons this is not always possible. We should not sanction a researcher because they are unable to meet that (very high) standard. The Data 2 tree shows that data that is in a repository under embargo without a DOI is discoverable in a way it would not be if it were in a desk drawer – so that is, again, within the spirit of the Principles. We need to consider the ‘close enough’ option as being a valid one, at least in the implementation stage of the Principles.
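For readers who prefer code to flowcharts, the logic of the trees as described in the text above can be sketched roughly in Python. This is only an illustration: the trees themselves were drawn on paper at the workshop, so the function names, the parameters and the order in which the questions are asked are assumptions, not a transcription of the originals.

```python
def publishing_route(oa_journal_available: bool, can_afford_hybrid: bool) -> str:
    """Hypothetical sketch of the publishing tree: every outcome counts as 'Principled'."""
    if oa_journal_available:
        return "publish in an Open Access journal"
    if can_afford_hybrid:
        return "pay for hybrid open access"
    # No OA journal in the field and no funds for hybrid.
    return "deposit a copy in a repository"


def data_route(is_digital: bool, can_digitise: bool, can_share_openly: bool) -> str:
    """Hypothetical sketch combining the Data 1 and Data 2 trees."""
    if not is_digital and not can_digitise:
        # Data 1: analogue data and no funds to digitise.
        return "state in the paper how to request the supporting data"
    if can_share_openly:
        # The gold standard described in the text.
        return "open, non-proprietary repository with metadata and a DOI"
    # Data 2: the 'close enough' option.
    return "repository deposit under embargo"


print(publishing_route(oa_journal_available=False, can_afford_hybrid=False))
print(data_route(is_digital=True, can_digitise=False, can_share_openly=False))
```

The point of writing it this way is that no branch returns a failure: each path ends in an action the researcher can actually take.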

We agreed that in some areas of the research lifecycle a list of tools that could help would be of more use than a decision tree. Time restraints meant there are a couple of areas of the lifecycle which still need consideration (and we need to do some decision tree design work!), but generally the group agreed that this was probably quite useful.

Conclusions

When it comes to the Principles themselves, we are still working on it. We did however agree that we thought the Principles were something worth doing, and that they were more or less something we can start working with (and on – they are likely to be dynamic). One suggestion was that we call them Scholarly Commons Principles 1.0 – a reference to this being the first version of possibly many. There are plans for several subgroups to pitch for funding to do some deeper work in some areas. So it is an ongoing project, but a substantial one.

There are some troopers in the Scholarly Communication community. Several people at our workshop had ‘done the double’ – attending  the SciDataCon 2016 conference and associated meetings over eight days in Denver last week and then coming to this event. The gruelling pace was starting to show by the end of the last day of our workshop.

You know you have been on a very short visit when you fly back with the same in-flight crew as your outward bound journey. One of them even recognised me and commented on how quickly I was returning. So while the trip was an exhausting few days, it was productive and worthwhile. And it was really nice to smell eucalypt trees (rather bizarrely) and do laps in an outdoor pool – things I have not done since moving to the UK.

Published 22 September 2016
Written by Dr Danny Kingsley
Creative Commons License

Cambridge University spend on Open Access 2009-2016

Today is the deadline for those universities in receipt of an RCUK grant to submit their reports on the spend. We have just submitted the Cambridge University 2015-2016 report to the RCUK and have also made it available as a dataset in our repository.

Compliance

Cambridge had an estimated overall compliance rate of 76%, with 46% of all RCUK funded papers available through the gold route and 30% of all RCUK funded papers available through the green route.

The RCUK Open Access Policy indicates that at the end of the fifth transition year of the policy (March 2018) they expect 75% of Open Access papers from the research they fund will be delivered through immediate, unrestricted, on‐line access with maximum opportunities for re‐use (‘gold’). Because Cambridge takes the position that if there is a green option that is compliant we do not pay for gold, our gold compliance number is below this, although our overall compliance level is higher, at 76%.
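As a rough check on how these figures fit together, the short Python sketch below adds the gold and green rates and compares our gold rate with the 75% figure in the RCUK policy. The variable names are ours for illustration, and treating the 75% expectation as directly comparable to our gold rate is a simplifying assumption.

```python
# Percentages of RCUK funded papers quoted in this report.
gold_pct = 46
green_pct = 30
gold_target_pct = 75  # RCUK expectation for gold by March 2018 (assumed comparable)

overall_pct = gold_pct + green_pct
gold_shortfall = gold_target_pct - gold_pct

print(f"Overall compliance: {overall_pct}%")
print(f"Gold rate below the 75% expectation by {gold_shortfall} percentage points")
```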

Compliance caveats

The total number of publications arising from research council funding was estimated by searching Web of Science for papers published by the University of Cambridge in 2015, and then filtered by funding acknowledgements made to the research councils. The number of papers (articles, reviews and proceedings papers) returned in 2015 was 2080. This is almost certainly an underestimate of the total number of publications produced by the University of Cambridge with research council funding. The analysis was performed on 15/09/2016.

Expenditure

The APC spend we have reported is only counting papers submitted to the University of Cambridge Open Access Team between 1 August 2015 and 31 July 2016. The ‘OA grant spent’ numbers provided are the actual spend out of the finance system. The delay between submission of an article, the commitment of the funds and the subsequent publication and payment of the invoice means that we have paid for invoices during the reporting period that were submitted outside the reporting period. This meant reconciliation of the amounts was impossible. This funding discrepancy was given in ‘Non-staff costs’, and represents unallocated APC payments not described in the report (i.e. they were received before or after the reporting period but incurred on the current 2015-16 OA grant).

The breakdown of costs indicates we have spent 4.6% of the year’s allocation on staff costs and 5.1% on systems support. We noted in the report that the staff time paid for out of this allocation also supports the processing of Wellcome Trust APCs for which no support is provided by Wellcome Trust.

Headline numbers

  • In total Cambridge spent £1,288,090 of RCUK funds on APCs
  • 1786 articles identified as being RCUK funded were submitted to the Open Access Service, of which 890 required payment for RCUK*
  • 785 articles have been invoiced and paid
  • The average article cost was ~£2008

Caveats

The average article cost can be established by adding the RCUK fund expenditure to the COAF fund expenditure on co-funded articles (£288,162.28)  which gives a complete expenditure for these 785 articles of £1,576,252.42. The actual average cost is £2007.96.
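The arithmetic behind the average can be reproduced with a few lines of Python. The sketch below uses only the figures stated above; the variable names are illustrative. It also shows that the stated total implies RCUK expenditure to the penny, which is consistent with the rounded headline figure of £1,288,090.

```python
from decimal import Decimal, ROUND_HALF_UP

total_spend = Decimal("1576252.42")   # complete expenditure for the paid articles
coaf_cofunded = Decimal("288162.28")  # COAF expenditure on co-funded articles
articles_paid = 785

# RCUK expenditure implied by the two stated figures.
implied_rcuk = total_spend - coaf_cofunded

# Average cost per article, rounded to the nearest penny.
average_cost = (total_spend / articles_paid).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

print(f"Implied RCUK spend: £{implied_rcuk}")
print(f"Average article cost: £{average_cost}")
```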

* The Open Access Service also received many COAF only funded and unfunded papers during this period. The number of articles paid for does not include those made gold OA due to the Springer Compact as this would throw out the average APC value.

Observations

In our report on expenditure for 2014 the average article APC was £1891. This means there has been a 6% increase in Cambridge University’s average spend on an APC since then. It should be noted that of the journals for which we most frequently process APCs, Nature Communications is the second most popular. This journal has an APC of £3,780 including VAT.
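For anyone wanting to verify the increase, a minimal Python sketch using the two averages quoted in this post (£1891 for 2014 and £2007.96 for this report) is below. The variable names are illustrative.

```python
average_2014 = 1891.00     # average APC in the 2014 report
average_2015_16 = 2007.96  # average article cost in this report

# Relative change, expressed as a percentage of the earlier figure.
increase_pct = (average_2015_16 - average_2014) / average_2014 * 100

print(f"Increase in average APC: {increase_pct:.1f}%")
```

To one decimal place the increase is 6.2%, which rounds to the 6% quoted above.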

Datasets on Cambridge APC spend 2009-2016

Cambridge released the information about its 2014 APC spend for RCUK and COAF in March last year and intended to do a similar report for the spend in 2015. However, a recent FOI request has prompted us to simply upload all of our data on APC spend into our repository for complete transparency. The list of datasets now available is below.

1. Report presented to Research Councils UK for article processing charges managed by the University of Cambridge, 2014-2015

2. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2015-2016

3. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2014-2015

4. Report presented to Jisc for article processing charges managed by the University of Cambridge, 2014

5. Open access publication data for the management of the Higher Education Funding Council for England, Research Councils UK, Charities Open Access Fund and Wellcome Trust open access policies at the University of Cambridge, 2014-2016

Note: In October 2014 we started using a new system for recording submissions. This has allowed us to obtain more detailed information and allow multiple users to interact with the system. Until December 2015 our financial information was recorded in the spreadsheet below. There is overlap between reports 5. and 6. for the period 24 October and 31 December 2015.  As of January 2016, all data is being collected in the one place.

6. Open access publication data for the management of Research Councils UK, Charities Open Access Fund and Wellcome Trust article processing charges at the Office of Scholarly Communication, 2013-2015

Note: In 2013 the Open Access Service began and took responsibility for the new RCUK fund, and was transferred responsibility for the new Charities Open Access Fund (COAF). At this time the team were recording when an article was fully Wellcome Trust funded, even though the Wellcome Trust funding is a component of COAF.

7. Open access publication data for the management of Wellcome Trust article processing charges from the School of Biological Sciences, 2009-2014

Note: Management of the funds to support open access publishing has changed over the past seven years. Before the RCUK open access policy came into force in 2013, the Wellcome Trust funds were managed by the School of Biological Sciences.

Published 14 September 2016
Written by Dr Danny Kingsley & Dr Arthur Smith
Creative Commons License

Making the connection: research data network workshop

During International Data Week 2016, the Office of Scholarly Communication is celebrating with a series of blog posts about data. The first post was a summary of an event we held in July. This post looks at the challenges associated with financially supporting RDM training.

Following the success of hosting the Data Dialogue: Barriers to Sharing event in July, we were delighted to welcome the Research Data Management (RDM) community to Cambridge for the second Jisc research data network workshop. The event was held in Corpus Christi College with meals held in the historical dining room. (Image: Corpus Christi)

RDM services in the UK are maturing and efforts are increasingly focused on connecting disparate systems, standardising practices and making platforms more usable for researchers. This is also reflected in the recent Concordat on Research Data which links the existing statements from funders and government, providing a more unified message for researchers.

The practical work of connecting the different systems involved in RDM is being led by the Jisc Research Data Shared Services project which aims to share the cost of developing services across the UK Higher Education sector. As one of the pilot institutions we were keen to see what progress has been made and find out how the first test systems will work. On a personal note it was great to see that the pilot will attempt to address much of the functionality researchers request but that we are currently unable to fully provide, including detailed reporting on research data, links between the repository and other systems, and a more dynamic data display.

Context for these attempts to link, standardise and improve RDM systems was provided in the excellent keynote by Dr Danny Kingsley, head of the Office of Scholarly Communication at Cambridge, reminding us about the broader need to overhaul the reward systems in scholarly communications. Danny drew on the Open Research blogposts published over the summer to highlight some of the key problems in scholarly communications: hyperauthorship, peer review, flawed reward systems, and, most relevantly for data, replication and retraction. Sharing data will alleviate some of these issues but, as Danny pointed out, this will frequently not be possible unless data has been appropriately managed across the research lifecycle. So whilst trying to standardise metadata profiles may seem irrelevant to many researchers it is all part of this wider movement to reform scholarly communication.

Making metadata work

Metadata models will underpin any attempts to connect repositories, preservation systems, Current Research Information Systems (CRIS), and any other systems dealing with research data. Metadata presents a major challenge both in terms of capturing the wide variety of disciplinary models and needs, and in persuading researchers to provide enough metadata to make preservation possible without putting them off sharing their research data. Dom Fripp and Nicky Ferguson are working on developing a core metadata profile for the UK Research Data Discovery Service. They spoke about their work on developing a community-driven metadata standard to address these problems. For those interested (and GitHub-literate) the project is available here.

They are drawing on national and international standards, such as the Portland Common Data Model, trying to build on existing work to create a standard which will work for the Shared Services model. The proposed standard will have gold, silver and bronze levels of metadata and will attempt to reward researchers for providing more metadata. This is particularly important as the evidence from Dom and Nicky’s discussion with researchers is that many researchers want others to provide lots of metadata but are reluctant to do the same themselves.

We have had some success with researchers filling in voluntary metadata fields for our repository, Apollo, but this seems to depend to a large extent on how aware researchers are of the role of metadata, something which chimes with Dom and Nicky’s findings. Those creating metadata are often unaware of the implications of how they fill in fields, so creating consistency across teams, let alone disciplines and institutions can be a struggle. Any Cambridge researchers who wish to contribute to this metadata standard can sign up to a workshop with Jisc in Cambridge on 3rd October.

Planning for the long-term

A shared metadata standard will assist with connecting systems and reducing researchers’ workload, but if replicability, a key problem in scholarly communications, is going to be possible, digital preservation of research data needs to be addressed. Jenny Mitcham from the University of York presented the work she has been undertaking alongside colleagues from the University of Hull on using Archivematica for preserving research data and linking it to pre-existing systems (more information can be found on their blog).

Jenny highlighted the difficulties they encountered getting timely engagement from both internal stakeholders and external contractors, as well as linking multiple systems with different data models, again underlining the need for high quality and interoperable metadata. Despite these difficulties they have made progress on linking these systems and in the process have been able to look into the wide variety of file formats currently in use at York. This has led to conversations with The National Archives about improving the coverage of research file formats in PRONOM (a registry of file formats for preservation purposes), work which will be extremely useful for the Shared Services pilot.

In many ways the project at York and Hull felt like a precursor to the Shared Services pilot; highlighting both the potential problems in working with a wide range of stakeholders and systems, as well as the massive benefits possible from pooling our collective knowledge and resources to tackle the technical challenges which remain in RDM.

Published 14 September 2016
Written by Rosie Higman
Creative Commons License

Beyond compliance – dialogue on barriers to data sharing

Welcome to International Data Week. The Office of Scholarly Communication is celebrating with a series of blog posts about data, starting with a summary of an event we held in July.

On 29 July 2016 the Cambridge Research Data Team joined forces with the Science and Engineering South Consortium to organise a one-day conference at Murray Edwards College to gather researchers and practitioners for a discussion about the existing barriers to data sharing. The whole aim of the event was to move beyond compliance with funders’ policies. We hoped that the community was ready to change the focus of data sharing discussions from whether it is worth sharing or not towards more mature discussions about the benefits and limitations of data sharing.

What are the barriers?

So what are the barriers to effective sharing of research data? There were three main barriers identified, all somewhat related to each other: poorly described data, insufficient data discoverability and difficulties with sharing personal/sensitive data. All of these problems arise from the fact that research data is not always shared in accordance with the FAIR principles: that data is Findable, Accessible, Interoperable and Re-usable.

Poorly described data

The event started with an inspiring keynote talk from Dr Nicole Janz from the Department of Sociology at the University of Cambridge: “Transparency in Social Science Research & Teaching”. Nicole regularly runs replication workshops at Cambridge, where students select published research papers and work hard for several weeks to reproduce the published findings. The purpose of these workshops is to allow students to learn by experience what is important in making their own work transparent and reproducible to others.

Very often students fail to reproduce the results. Frequently, the reasons for failure are that the methodology provided is insufficient, or simply that key datasets were not made available. Students learn that in order to make research reproducible, one not only needs to make the raw data files available, but also needs to share the source code used to transform the data and a written description of the methodology, ideally in a README file. While doing replication studies, students also learn about the five selfish benefits of good data management and sharing: data disasters are avoided, it is easier to write up papers from well-managed data, a transparent approach to sharing makes the work more convincing to reviewers, the continuity of research is possible and researchers can build their reputation for being transparent. As a tip for researchers, Nicole suggested always asking a colleague to try to reproduce the findings before submitting a paper for peer review.

The problem of insufficient data description/availability was also discussed during the first case study talk by Dr Kai Ruggeri from the Department of Psychology, University of Cambridge. Kai reflected on his work on the assessment of happiness and wellbeing across many European countries, which was part of the ESRC Secondary Data Analysis Initiative. Kai re-iterated that missing data make the analysis complicated and sometimes prevent one from being able to make effective policy recommendations. Kai also stressed that frequently the choice of baseline for data analysis can affect the final results. Therefore, proper description of methodology and approaches taken is key for making research reproducible.

Insufficient data discoverability

We also heard several speakers describing problems with data discoverability. Fiona Nielsen founded Repositive – a platform for finding human genomic data. Fiona founded the platform out of frustration that genomic data was so difficult to find and access. The proliferation of data repositories has made it very hard for researchers to actually find what they need.

Fiona started with a quick poll among the audience: how do researchers look for data? It turned out that most researchers find data by doing a literature search or by googling for it. This is not surprising – there is no search engine that can look for information simultaneously across the multiple repositories where the data is available. To make it even more complicated, Fiona reported that in 2015 80 PB of human genomic data was generated. Unfortunately, only 0.5 PB of human genomic data was made available in a data repository.

So how can researchers find the other datasets, which are not made available in public repositories? Repositive is a platform that harvests metadata from several repositories hosting human genomic data and provides a search engine allowing researchers to look for datasets shared in all of them simultaneously. Additionally, researchers who cannot share their research data via a public repository (for example, due to lack of participants’ consent for sharing) can at least create a metadata record about the data – to let others know that the data exist and to provide them with information on the data access procedure.

The problem of data discoverability is, however, not only related to people’s awareness that datasets exist. Sometimes, especially in the case of complex biological data with a vast number of variables, it can be difficult to find the right information inside the dataset. In an excellent lightning talk, Jullie Sullivan from the University of Cambridge described InterMine – a platform to make biological data easily searchable (‘mineable’). Anyone can simply upload their data onto the platform to make it searchable and discoverable. One example of the platform’s use is FlyMine – a database where researchers looking for results of experiments conducted on fruit flies can easily find and share information.

Difficulties with sharing personal/sensitive data

The last barrier to sharing that we discussed was related to sharing personal/sensitive research data. This barrier is perhaps the most difficult one to overcome, but here again the conference participants came up with some excellent solutions. The first came from the keynote speech by Louise Corti, which had a very uplifting title: “Personal not painful: Practical and Motivating Experiences in Data Sharing”.

Louise based her talk on the long experience of the UK Data Service (UKDS) with providing managed access to data containing some forms of confidential/restricted information. Apart from being able to host datasets which can be made openly available, the UKDS can also provide two other types of access: safeguarded access, where data requestors need to register before downloading the data, and controlled access, where requests for data are considered on a case-by-case basis.

At the outset of the research project, researchers discuss their research proposals with the UKDS, including any potential limitations to data sharing. It is at this early stage that the decision is made on the type of access that will be required for the data to be successfully shared. All processes of project management and data handling, such as data anonymisation and collection of informed consent forms from study participants, are then carried out in adherence to that decision. The UKDS also offers protocols clarifying what is going to happen with research data once they are deposited with the repository. The use of standard licences for sharing makes the governance of data access much more transparent and easy to understand, both from the perspective of data depositors and data re-users.

Louise stressed that transparency and willingness to discuss problems are key for mutual respect and understanding between data producers, data re-users and data curators. Sometimes unnecessary misunderstandings make data sharing difficult when it does not need to be. Louise mentioned that researchers often confuse ‘sensitive topic’ with ‘sensitive data’ and referred to a successful case study where, by working directly with researchers, the UKDS managed to share a dataset about sedation at the end of life. The subject of study was sensitive, but because the data was collected and managed with a view to sharing at the end of the project, the dataset itself was not sensitive and was suitable for sharing.

As Louise said, “data sharing relies on trust that data curators will treat it ethically and with respect”, and open communication is key to building and maintaining this trust.

So did it work?

The purpose of this event was to engage the community in discussions about the existing limitations to data sharing. Did we succeed? Did we manage to engage the community? We received twenty high quality abstracts from researchers across various disciplines for only five available case study speaking slots (it was so difficult to shortlist the top five!), and the venue was full, with around eighty attendees from Cambridge and other institutions. Judging by this, I think that the objective was pretty well met.

Additionally, the panel discussion was led by researchers and involved fifty-eight active users on the Sli.do platform for questions to panellists. There were also questions asked outside the Sli.do platform. So overall I feel that the event was a great success and it was truly fantastic to be part of it and to see the degree of participant involvement in data sharing.

Another observation is the great progress of the research community in Cambridge in the area of sharing: we have successfully moved away from discussions about whether research data is worth sharing to how to make data sharing more FAIR.

It seems that our intense advocacy, and the effort of speaking with over 1,800 academics from across the campus since January 2015, has paid off and we have indeed managed to build an engaged research data management community.

Read (and see!) more:

Published 12 September 2016
Written by Dr Marta Teperek
Creative Commons License

Could Open Research benefit Cambridge University researchers?

This blog is part of the recent series about Open Research and reports on a discussion with Cambridge researchers held on 8 June 2016 in the Department of Engineering. Extended notes from the meeting and slides are available at the Cambridge University Research Repository. This report is written by Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek (listed alphabetically by surname).

At the Office of Scholarly Communication we have been thinking for a while about Open Research ideas and about moving beyond mere compliance with funders’ policies on Open Access and research data sharing. We thought that the time had come to ask our researchers what they thought about opening up the research process and sharing more: not only publications and research data, but also protocols, methods, source code, theses and all the other elements of research. Would they consider this beneficial?

Working together with researchers – democratic approach to problem-solving

To get an initial idea of the expectations of the research community in Cambridge, we organised an open discussion hosted at the Department of Engineering. Anyone registering was asked three questions:

  • What frustrates you about the research process as it is?
  • Could you propose a solution that could solve that problem?
  • Would you be willing to speak about your ideas publicly?

Interestingly, around fifty people registered to take part in the discussion and almost all of them contributed very thought-provoking problems and appealing solutions. To our surprise, half of them were willing to speak publicly about their ideas. This shaped our discussion on the day.

So what do researchers think about Open Research? Not surprisingly, we started with an animated discussion about unfair reward systems in academia.

Flawed metrics

A well-worn complaint: the only thing that counts in academia is publication in a high impact journal. As a result, early career researchers have no motivation to share their data and to publish their work in open access journals, which can sometimes have lower impact factors. Additionally, metrics based on the whole journal do not reflect the importance of the research described: what is needed is article-level impact measurements. But it is difficult to solve this systemic problem because any new journal which wishes to introduce a new metrics system has no journal-level impact factor to start with, and therefore researchers do not want to publish in it.

Reproducibility crisis: where quantity, not quality, matters

Researchers also complained that the volume of research produced keeps growing and that science seems to have entered an ‘era of quantity’. They raised the concern that quantity matters more than the quality of research. Only the fast and loud research gets rewarded (because it is published in high impact factor journals), and the slow and careful seems to be valued less. Additionally, researchers are under pressure to publish and they often report what they want to see, and not what the data really shows. This approach has led to the reproducibility crisis and lack of trust among researchers.

Funders should promote and reward reproducible research

The participants had some good ideas for how to solve these problems. One of the most compelling suggestions was that perhaps funding should go not only to novel research (as it seems to be at the moment), but also to people who want to reproduce existing research. Additionally, reproducible research itself should be rewarded. Funders could offer grant renewal schemes for researchers whose research is reproducible.

Institutions should hire academics committed to open research

Another suggestion was to adopt reward systems other than journal impact factor metrics. Someone proposed that institutions should not only teach the next generation of researchers how to do reproducible research, but also embed reproducibility of research as an employment criterion. Commitment to Open Research could be an essential requirement in job descriptions. Applicants could be asked at the recruitment stage how they achieve the goals of Open Research. LMU University in Munich recently included such a statement in a job description for a professor of social psychology (see the original job description here and a commentary here).

Academia feeding money to exploitative publishers

Researchers were also frustrated by exploitative publishers. The big four publishers (Elsevier, Wiley, Springer and Informa) have a typical annual profit margin of 37%. Articles are donated to the publishers for free by the academics, and reviewed by other academics, also free of charge. Additionally, noted one of the participants, academics also act as journal editors, which they also do for free.

[*A comment about this statement was made on 15 August 2017 noting that some editors do get paid. While the participant’s comment stands as a record of what was said, we acknowledge that this is not an entirely accurate statement.]

In addition to this, publishers take away the copyright from the authors. As a possible solution to the latter, someone suggested that universities should adopt institutional licences on scholarly publishing (similar to the Harvard licence) which could protect the rights of their authors.

Pre-print services – the future of publishing?

Could Open Research aid the publishing crisis? Novel and more open ways of publishing can certainly add value to the process. The researchers discussed the benefits of sharing pre-print papers on platforms like arXiv and bioRxiv. These services allow people to share manuscripts before publication (or acceptance by a journal). In physics, maths and computational sciences it is common to upload manuscripts even before submitting the manuscript to a journal in order to get feedback from the community and have the chance to improve the manuscript.

bioRxiv, the life sciences equivalent of arXiv, started relatively recently. One of our researchers mentioned that he was initially worried that uploading manuscripts into bioRxiv might jeopardise his career as a young researcher. However, he then saw a pre-print manuscript describing research similar to his published on bioRxiv. He was shocked when he saw how the community helped to change that manuscript and to improve it. He has since shared a lot of his manuscripts on bioRxiv and as his colleague pointed out, this has ‘never hurt him’. On the contrary, he suggested that using pre-print services promotes one’s research: it allows the author to get the work into the community very early and to get feedback. Peers will always value good quality research, and that recognition among colleagues will eventually come back to the author.

Additionally, someone from the audience suggested that publishing work in pre-print services provides a time-stamp for researchers and helps to ensure that ideas will not be scooped by anyone – researchers are free to share their research whenever they wish and as fast as they wish.

Publishers should invest money in improving science – wishful thinking?

It was also proposed that instead of exploiting academics, publishers could play an important role in improving the research process. One participant proposed a couple of simple mechanisms that could be implemented by publishers to improve the quality of research data shared:

  • Employ in-house data experts (bioinformaticians or data scientists) who could judge whether supporting data is of a good enough quality.
  • Ensure that there is at least one bioinformatician/data scientist on the reviewing panel for a paper.
  • Ask for the data to be deposited in a public, discipline-specific repository, which would ensure quality control of the data and adherence to data standards.
  • Ask for the source code and detailed methods to be made available as well.

Quick win: minimum requirements for making shared data useful

A requirement that, as a minimum, three key elements should be made available with publications – the raw data, the source code and the methods – seems to be a quick win solution to make research data more re-usable. Raw data is necessary as it allows users to check if the data is of a good quality overall. Publishing the code is important so that the analysis can be re-run. And the methods need to be detailed enough to allow other researchers to understand all the steps involved in data processing. An excellent case study example comes from Daniel MacArthur who has described how to reproduce all the figures in his paper and has shared the supporting code as well.
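
To make the three-element minimum concrete, here is a minimal sketch of how a deposit folder could be checked before sharing. The folder and file names (data, code, src, README.md and so on) are our own assumptions for illustration and not a standard that was proposed at the meeting.

```python
from pathlib import Path
from typing import List

# Hypothetical layout: these folder and file names are assumptions made
# for illustration only, not an agreed convention.
REQUIRED_ELEMENTS = {
    "raw data": ["data"],
    "source code": ["code", "src"],
    "methods": ["README.md", "README.txt", "METHODS.md"],
}


def missing_elements(deposit: Path) -> List[str]:
    """Return the names of the key elements that the deposit folder lacks."""
    missing = []
    for element, candidates in REQUIRED_ELEMENTS.items():
        # An element counts as present if any one of its candidate paths exists.
        if not any((deposit / name).exists() for name in candidates):
            missing.append(element)
    return missing
```

Calling missing_elements(Path("my_deposit")) on a folder that holds only a data directory and a README.md would report that the source code is missing. A check like this only confirms that something is there; it says nothing about whether the data, code or methods are of good quality.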

It was also suggested that the Office of Scholarly Communication could implement some simple quality control measures to ensure that research data supporting publications is shared. As a minimum the Office could check the following:

  • Is there a data statement in the publication?
  • If there is a statement – is there a link to the data?
  • Does the link work?

This is definitely a very useful suggestion from our research community and in fact we have already taken this feedback on board and started checking for data citations in Cambridge publications.
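
The three checks listed above could be partly automated. The sketch below is only an illustration under our own assumptions and does not describe the process used by the Office: the phrases treated as a data statement are guesses, and the link test is passed in as a function so that it can be swapped or tested without network access.

```python
import re
from typing import Callable, Dict, Optional
from urllib.request import Request, urlopen

# Assumed phrases that often introduce a data statement; not an exhaustive list.
STATEMENT_PATTERN = re.compile(
    r"data availability|data access statement|supporting data|data (?:are|is) available",
    re.IGNORECASE,
)
LINK_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def link_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request without an error status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers network failures, HTTP error responses and malformed URLs.
        return False


def check_publication(
    text: str, resolver: Callable[[str], bool] = link_resolves
) -> Dict[str, object]:
    """Answer the three questions: statement present, link present, link works."""
    statement = STATEMENT_PATTERN.search(text)
    link: Optional[str] = None
    if statement:
        # Only look for a link from the start of the statement onwards.
        found = LINK_PATTERN.search(text, statement.start())
        if found:
            link = found.group(0).rstrip(".,;")
    return {
        "has_statement": statement is not None,
        "link": link,
        "link_works": resolver(link) if link else None,
    }
```

For example, check_publication("Data availability: see https://example.org/dataset.", resolver=lambda url: True) reports a statement, the link https://example.org/dataset and a working link (the URL here is a placeholder). Some servers reject HEAD requests, so a failed check is a prompt for a manual look and not proof that the link is broken.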

Shortage of skills: effective data sharing is not easy

The discussion about the importance of data sharing led to reflections that effective data sharing is not always easy. A bioinformatician complained that datasets that she had tried to re-use satisfied neither the criteria of reproducibility nor those of re-usability. Most of the time there was not enough metadata available to successfully use the data. There is some data shared, there is the publication, but the description is insufficient to understand the whole research process: the miracle, or the big discovery, happens somewhere in the middle.

Open Research in practice: training required

Attendees agreed that it requires effort and skills to make research open, re-usable and discoverable by others. More training is needed to ensure that researchers are equipped with skills to allow them to properly use the internet to disseminate their research, as well as with skills allowing them to effectively manage their research data. It is clear that discipline-specific training and guidance around how to manage research data effectively and how to practise open research is desired by Cambridge researchers.

Nudging researchers towards better data management practice

Many researchers have heard or experienced first-hand horror stories of having to follow up on somebody else’s project, where it was not possible to make any sense of the research data due to lack of documentation and processes. This leads to a lot of time wasted in every research group. Research data need to be properly documented and maintained to ensure research integrity and research continuity. One easy way to nudge researchers towards better research data management practice could be formalised data management requirements. Perhaps as a minimum, every researcher should have a lab book to document research procedures.

The time is now: stop hypocrisy

Finally, there was a suggestion that everyone should take the lead in encouraging Open Research. The simplest way to start is to stop being what has been described as a hypocrite and submit articles to journals which are fully Open Access. This should be accompanied by making one’s reviews openly available whenever possible. All publications should be accompanied by supporting research data and researchers should ensure that they evaluate individual research papers and that their judgement is not biased by the impact factor of the journal.

Need for greater awareness and interest in publishing

One of the Open Access advocates present at the meeting stated that most researchers are completely unaware of which publishers are exploitative and which are ethical, and of the differences between them. Researchers typically do not directly pay the exploitative publishers and are therefore not interested in looking at the bigger picture of sustainability of scholarly publishing. This is clearly an area where more training and advocacy can help and the Office of Scholarly Communication is actively involved in raising awareness of Open Access. However, while it is nice to preach in a room of converts, how do we get other researchers involved in Open Access? How should we reach out to those who can’t be bothered to come to a discussion like the one we had? This is the area where anyone who understands the benefits of Open Access has a job to do.

Next steps

We are extremely grateful to everyone who came to the event and shared their frustrations and ideas on how to solve some problems. We noted all the ideas on Post-it notes – the number of notes at the end of the discussion was impressive, an indication of how creative the participants were in just 90 minutes. It was a very productive meeting and we wish to thank all the participants for their time and effort.

We think that by acting collaboratively and supporting good ideas we can achieve a lot. As an inspiration, McGill University’s Montreal Neurological Institute and Hospital (the Neuro) in Canada has recently adopted a policy on Open Research: over the next five years all results, publications and data will be free to access by everyone.

Follow up

If you would like to host similar discussions directly in your departments/institutes, please get in touch with us at info@osc.cam.ac.uk – we would be delighted to come over and hear from researchers in your discipline.

In the meantime, if you have any additional ideas that you wish to contribute, please send them to us. Everyone who is interested in being informed about the progress here is encouraged to sign up for a mailing list here.

Extended notes from the meeting and slides are available at the Cambridge University Research Repository. We are particularly grateful to Avazeh Ghanbarian, Corina Logan, Ralitsa Madsen, Jenny Molloy, Ross Mounce and Alasdair Russell (listed alphabetically by surname) for agreeing to speak publicly at the event.

Published 3 August 2016
Written by Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek
Creative Commons License