Who is paying for hybrid?

In our related blog post ‘Hybrid open access – an analysis’ we explored the origins of, and issues with, hybrid open access. Here we describe what funders do and do not allow in relation to payments for hybrid Open Access APCs.

Funding agencies and hybrid

Of the 179 Open Access funds listed in the Open Access Directory, 99 (55%) do not allow hybrid publishing; 78 (44%) do, or do not specify. The two remaining funds (1%) allow hybrid but either discourage it or require that the publisher have an offsetting scheme in place. This shows a strong move away from hybrid since 2014, when only 39% of funds rejected hybrid – a rejection of hybrid is now the majority position.
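As a quick check on the percentages quoted above, the short Python sketch below recomputes each share from the fund counts. The counts come from the paragraph above; the category labels and variable names are only illustrative.

```python
# Fund counts as quoted in the paragraph above (Open Access Directory).
# The category labels are illustrative shorthand, not official terms.
fund_counts = {
    "hybrid not allowed": 99,
    "hybrid allowed or not specified": 78,
    "hybrid allowed with conditions": 2,
}

total_funds = sum(fund_counts.values())


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {label: share(count, total_funds) for label, count in fund_counts.items()}

print(f"Total funds: {total_funds}")
for label, pct in shares.items():
    print(f"{label}: {fund_counts[label]} ({pct}%)")
```

The three categories account for all 179 funds, and the rounded shares match the 55%, 44% and 1% given in the text.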

What’s more, these anti-hybrid funders now include some major organisations, particularly in Europe. The EU FP7 post-grant pilot, for example, is only open to authors publishing in fully Open Access journals, and the Netherlands Organization for Scientific Research (NWO) has considered hybrid ineligible for funds since December 2015.

According to a news story in Nature in January this year, the Norwegian Research Council and the German Research Foundation both pay Open Access fees for researchers but do not permit the payment of hybrid costs. The Austrian Science Fund has capped Open Access payments at a certain level; if researchers want to publish in more expensive journals (often the hybrids), they must find the extra cash themselves.

In 2013 Science Europe declared in a position statement that:

The Science Europe member organisations […] stress that the hybrid model, as currently defined and implemented by publishers, is not a working and viable pathway to Open Access. Any model for transition to Open Access supported by Science Europe member organisations must prevent ‘double dipping’ and increase cost transparency.

UK funders’ position on hybrid

The Wellcome Trust, while not yet abandoning hybrid entirely, voiced considerable wariness in its 2014-15 report, and has warned that stricter action will follow if there is not an improvement in publisher behaviour:

We believe declaring that Wellcome funds cannot be used to pay for hybrid OA is too blunt an instrument, unfairly penalising those publishers which provide a good service at a reasonable price, and that it would slow down the transition to a fully OA world – the position we ultimately want to get to.

However, doing nothing is no longer a valid option.  If hybrid publishers are unable to commit to the Wellcome Trust’s set of requirements and do not significantly improve the quality of the service, then classifying those hybrid journals as “non-compliant” will be an inevitable next step.

In 2015 RCUK published an independent review into the implementation of their Open Access policy which, while notably less combative on the issue of hybrid, nevertheless noted the expensiveness of the option and suggested potential future action:

The panel noted that average APCs for articles published in hybrid journals were consistently more expensive than in fully open access journals (despite the fact that hybrid journals still enjoyed a revenue stream through subscriptions). The panel recommends that RCUK continues to monitor this and if these costs show no sign of being responsive to market forces, then a future review should explore what steps RCUK could take to make this market more effective.

In the Universities UK Open Access Coordination Group’s report “Open access to research publications – Independent advice” the author, Professor Adam Tickell noted:

An alternative approach would be to consider whether funding Gold Open Access in Hybrid Journals where there are no equivalent offsets in subscription costs is a good use of public funds. During the course of working on this report, I met with the Publishers Association and Elsevier and I do not believe that the major publishers would find this slight change of course challenging.

Library funds and hybrid

In January this year the Canadian Association of Research Libraries (CARL) published Library Open Access Funds in Canada: review and recommendations. Amongst the summary of fund management recommendations was ‘do not fund hybrid journals’.

SPARC maintains an Open Access Campus Funds page, which provides advice. The document “Campus-based open-access publishing funds: a practical guide to design and implementation” contains a whole section on deciding whether to support hybrid, noting “Many institutions that have functioning Open-access Funds have indicated that the toughest decision they made concerned hybrid journal eligibility”.

US library-run funds

Zuniga, H. & Hoffecker, L. (2016). Managing an Open Access Fund: Tips from the Trenches and Questions for the Future. Journal of Copyright in Education and Librarianship, 1(1), 1-13 discusses the thinking behind a library-run Open Access fund at University of Colorado Health Sciences Library and specifies that funding will only be available for fully Open Access journals and not hybrid ones.

A recent discussion about library funds for open access on one of the email discussion lists (which is dominated by American institutions) revealed a very strong preference for supporting only fully Open Access journals. It is worth noting that the US is not subject to gold Open Access policies of the kind in force in the UK. Of the responses from US libraries, nine funds did not support hybrid and two did under particular circumstances. The nine that did not were:

  • University of Rhode Island only supports “articles published in fully open access, peer-reviewed scholarly journals” that are listed in the DOAJ with its Open Access Fund
  • Texas A&M University Libraries’ Open Access to Knowledge Fund (OAKFund) notes “”Hybrid” Open Access publication venues and publication venues with delayed Open Access models are ineligible.”
  • The University of Pittsburgh’s Open Access Fee Author Policy states “Journals with a hybrid open-access model or delayed open-access model are not eligible.”
  • The One University Open Access (OA) AuthorFund at the University of Kansas supports only publication in “an entirely open access journal. Journals with a hybrid open-access model or delayed open-access model are not eligible”. A definition of hybrid journal is provided. The fund is described in a 2015 article in JLSC, “Campus Open Access Funds: experiences of the KU “One University” Open access author fund”.
  • Cornell University’s Open Access Publication Fund does not mention hybrid specifically but the wording implies the fund supports only fully Open Access journals, noting “Since open access publishers do not charge subscription or other access fees, they must cover their operating expenses through other sources.”
  • Concordia University’s Open Access Author Fund states “the article must be published in a fully open access journal. Traditional subscription-based or ‘hybrid’ journals that offer an open access option for a fee are not eligible.”
  • University of Oklahoma’s Open Access (OA) Subvention Fund Policy refers to “true open access journals”, noting “Articles with a hybrid or delayed OA model are not eligible through this fund”
  • The information about University of California San Francisco’s Open Access Publishing Fund includes a section about why it does not support hybrid
  • Northwestern University’s Open Access Fund describes an acceptable open access journal as a “journal published in a fully open access format based on a published schedule of article processing fees”

That said, two funds will support hybrid in particular circumstances:

  • Wayne State University’s Scholars Cooperative Open Access Fund states “Hybrid open access arrangements (“paid open access” or “open choice”) may be considered on a case-by-case basis”.
  • Wake Forest University Open Access Fund does support hybrid, but the cost for all open access is split three ways between the Library, the Research Office and the author.

UK library-run funds

In November last year UCL, Newcastle and Nottingham universities published, with Jisc, the results of a survey: “Institutional policies on the use of Open Access Funds”. The report noted that 18 of the responding UK institutions had a central institutional fund (not provided by RCUK/COAF), and that approaches to using these funds differed. At the time, four institutions paid for papers in fully Open Access journals only; four paid for papers in both fully Open Access and hybrid journals, without steering authors towards either Green or Gold; and five encouraged authors to choose Green where possible.

In response to a list query in October 2016 (which is not a comprehensive survey by any means), there was a mixture of arrangements in the UK library-run funds. Four funds did not support hybrid, four did, and there were three that supported them in particular circumstances.

Some UK funds are primarily non-hybrid with a small number of exceptions.

  • University College London has a fund which provides limited funds “for other UCL corresponding authors who are full (not honorary or visiting) members of staff or students where the funder does not cover open access charges”. This fund generally only pays for papers in fully OA journals. When it comes to hybrids the policy is very much to recommend Green, but the fund does occasionally pay for papers in hybrid journals “where the author makes a case for it”.
  • The University of Bath has an open access fund for journals that operate “a ‘Gold’ or paid Open Access model only AND the journal is a Q1 title as measured by Journal Citation Reports or SciMago Journal Indicators”. Note that this fund will support hybrid by exception, with Associate Dean agreement.
  • Lancaster University has a small fund available with strict criteria for when it can be used. The research paper must be likely to be rated as 4* in the next REF, and the journal must be the most appropriate place to publish it and must either not offer a compliant green route or be an open access only journal. Applications need approval from the Heads of Department and Associate Dean for Research.

Other funds do not distinguish between hybrid and fully OA journals:

  • King’s College London are in the second pilot year of an Open Scholarship Fund which currently does not distinguish between hybrid and full open access journals – but this may be considered if the funds are exhausted.
  • Northumbria University Newcastle has an institutional Open Access fund to cover APCs in both fully gold and hybrid journals.
  • Liverpool University has an institutional open access fund with very minimal criteria (CC BY, no retrospective OA, no page or colour charges) that pays both hybrid and fully OA APCs. The fund is reviewed every six months.
  • Queen Mary University will be starting to offer a small institutional fund this year to cover non-funded research, which will support hybrid.

There are some UK institutions where no central fund exists but Departments or Faculties have established their own funds with their own rules.

Conclusions

The increase in funds that do not allow payment for hybrid since 2014 indicates that increasingly the gloss has come off hybrid. Originally considered to be a transition method towards fully Open Access journals, the lack of movement towards this outcome has meant a tightening by funders on what can be spent on hybrid. It will be interesting to revisit this in another two years’ time.

Published 24 October 2016
Written by Dr Danny Kingsley and Dr Philip Boyes 
Creative Commons License

Hybrid open access – an analysis

Welcome to Open Access Week 2016. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog posts we revisit the issue of paying for hybrid open access. We have also published a related post “Who is paying for hybrid?” listing funder policies on hybrid.

Recent years have seen a proliferation of funder open access mandates, the terms of which can differ markedly, adding to the confusion of an already complex area. The Registry of Open Access Repository Mandates and Policies (ROARMAP) lists 80 funders with open access requirements, and the list continues to grow.

Within the UK, policies fall into three broad categories: those that mandate green Open Access without paying a fee, such as the HEFCE policy; those that prefer gold but make no additional funds available, such as the NIHR policy, and those that have a preference for gold and offer block grants to institutions to help cover the associated costs, such as the Research Councils UK (RCUK) and Charities Open Access Fund (COAF) policies.

Accompanying this expansion of mandates, unsurprisingly, has been an increase in the amount being spent to support Open Access. The Open Access Directory lists 179 funds for OA journal articles worldwide, compared with 81 in early 2014.

All this brings into sharper relief the question of how open access funds support hybrid publishing. But first a quick history lesson.

Hybrid origins

Hybrid journals provide open access to specific articles where an Article Processing Charge has been paid in an otherwise subscription journal. A few learned societies offered hybrid options in the early 2000s. Hybrid open access options were first offered by large publishers in 2004 with Springer’s Open Choice product charging USD3000 per article. This price has not changed in the past 12 years. In the UK the Springer Compact now pays for hybrid under a different model.

Wiley Online Open’s trial began the same year, charging USD2500. Today the price ranges from USD1,500 – 5,200. Oxford Open launched in 2005, and in 2006 Elsevier Open Access and Sage Choice began. In 2007, Taylor & Francis Open Select, Cambridge Open and Nature Publishing Group’s open access offering began.

The uptake of hybrid began slowly. It is very difficult to obtain statistics on what percentage of journals have hybrid Open Access content, but in his 2012 analysis The hybrid model for open access publication of scholarly articles – a failed experiment? (an open access version is also available), Bo-Christer Björk found that the number of hybrid journals had doubled in the previous couple of years to over 4,300, and that the number of such articles was around 12,000 in 2011. This represented a small proportion of eligible authors (1-2%).

That analysis was published the same year as the Finch Report, which recommended a gold path to Open Access. The resulting RCUK Open Access Policy and RCUK Block Grants to fund Open Access APCs have dramatically increased the expenditure on hybrid in the UK since 2013. According to a report published in 2015, “the UK’s profile of OA take-up is significantly different from the global averages: its use of OA in hybrid journals and of delayed OA journals is more than twice the world average in both cases, while its take-up of fully OA journals with no APC (Gold-no APC) is less than half the world average and falling.”

At Cambridge University we have spent literally millions of pounds on hybrid Open Access, which constitutes approximately 85% of our total APC spend. This is higher than the estimated figure across the country, where hybrid Open Access accounts for 76% of spend.

Double dipping

Hybrid represents a second income stream to publishers and has raised questions about ‘double dipping’. Some publishers, such as Nature Publishing Group, manage this by reducing the cost of subscriptions in proportion to the percentage of hybrid content in a given journal. However, ‘big deals’ for subscriptions can render this relatively ineffective, and the reduction is spread across all subscribers, regardless of who has paid the article processing charge. This means research intensive institutions (such as Cambridge) are contributing heavily to the system but not receiving a proportionate reduction.

To address this issue at a local level, several publishers have created offsetting arrangements, where discounts or refunds are provided in proportion to the contribution the institution has made in APC payments above subscriptions. However, each of these schemes operates differently and they can be complicated to administer, or have other preconditions such as making large prepayments to publishers.

The biggest problem from an implementation perspective, however, is that they are by no means universal. By far the biggest publisher, Elsevier, for example, offers no form of offsetting at all, although they nevertheless assert that they do not double dip. The result is that in very many cases, institutions and authors continue to have to pay twice for material in hybrid journals, swelling publisher coffers at the expense of research funding.

Very expensive

One of the problems with hybrid is that even ignoring the added cost of subscriptions to the non Open Access material in those journals, hybrid Open Access charges are more expensive than those for fully Open Access journals.

In March last year both the Wellcome Trust and RCUK undertook a review of their Open Access policies. The Reckoning: An Analysis of Wellcome Trust Open Access Spend 2013 – 14 noted: “The average APC levied by hybrid journals is 64% higher than the average APC charged by a fully OA title”. In Wellcome’s data, the average APC for a hybrid article in 2014-15 was £2,104, compared with only £1,396 for fully OA journals. Worryingly, the data showed that fully OA APC costs had risen more than their hybrid counterparts since the previous year.

Similarly in the Research Councils UK 2014 Independent Review of Implementation the observation was that article processing charges for hybrid Open Access were “significantly more expensive” than fully OA journals, “despite the fact that hybrid journals still enjoyed a revenue stream through subscriptions”.

A Max Planck Digital Library Open Access Policy White Paper published on 28 April 2015 noted that the Wellcome Trust had a significantly higher average APC cost than the German, Austrian and SCOAP3 figures. This was because the Wellcome Trust pays for hybrid APCs, “which are not only much higher than most pure open access costs but are also widely considered not to reflect a true market value. In Germany and many other countries, hybrid APCs are excluded from the central funding schemes.”

A study undertaken last year, which considered APCs in the five-year period between 2010 and 2014, found the mean for fully-OA journals published by non-subscription publishers was £1,136 compared with £1,849 for hybrid journals. The same study also found that traditional subscription publishers are capturing most of the APC market. The top-10 publishers in terms of numbers of APCs received from participant institutions (who received 76% of the total APCs paid from the sample) “only included two fully-OA publishers (PLOS and BMC). The others were established publishers (Elsevier, Wiley, Springer and so on) who are mostly gaining APC income from hybrid journals.”
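To put the price comparisons in this section on a common footing, the sketch below expresses each pair of average APCs as a percentage premium for hybrid over fully OA. The pound figures are the ones quoted above (Wellcome's 2014-15 averages and the 2010 to 2014 study means); the function and variable names are illustrative only.

```python
def hybrid_premium(hybrid_apc, fully_oa_apc):
    """Percentage by which the hybrid APC exceeds the fully OA APC."""
    return 100 * (hybrid_apc - fully_oa_apc) / fully_oa_apc


# Average APCs in pounds, as quoted in this section.
wellcome_2014_15 = hybrid_premium(2104, 1396)
study_2010_2014 = hybrid_premium(1849, 1136)

print(f"Wellcome 2014-15: hybrid is about {wellcome_2014_15:.0f}% more expensive")
print(f"2010-2014 study: hybrid is about {study_2010_2014:.0f}% more expensive")
```

On these figures the premium is roughly 51% in the Wellcome 2014-15 data and roughly 63% in the 2010 to 2014 study. The 64% figure quoted from The Reckoning relates to the earlier 2013-14 Wellcome data, so it is not expected to match the 2014-15 calculation.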

The 2014 report Developing an effective market for open access article processing charges was written for a consortium of research funders comprising Jisc, Research Libraries UK, Research Councils UK, the Wellcome Trust, the Austrian Science Fund, the Luxembourg National Research Fund and the Max Planck Institute for Gravitational Physics. The authors noted of the hybrid journal market that it is “highly dysfunctional, with very low uptake for most hybrid journals and a relatively uniform price in most cases without regard to factors such as discipline or impact“.

Value for money?

A second issue which has become apparent as open access mandates have expanded is the extent to which publishers – mostly of hybrid journals – do not deliver the Open Access option that has been paid for. In many cases, an article for which an author or institution has paid an APC for ‘immediate’ Open Access may take months or even years to be made Open Access; some articles are never made Open Access at all. Even when articles are made available, there is no guarantee that they will have the appropriate licence. It is by no means uncommon for articles to carry more restrictive licences than those requested, or for the appropriate licence to appear on a journal website while the PDF of the article itself bears only a publisher copyright notice and a prominent ‘All rights reserved’.

In March 2016 the Wellcome Trust published a report into compliance among its paid-for articles in 2014-15, concluding:

The good news is that we have seen an improvement in correct and programmatically identifiable licences (from 61% of papers in ’13-‘14, to 70% in ’14-‘15) and a similar increase in overall compliance from 61% to 70%.  The bad news, however, is that in 30% of cases we are not getting what we are paying for.

The source of this non-compliance was overwhelmingly hybrid journals, and the largest publishers were the worst offenders: in the Wellcome data, 31% of Elsevier hybrid articles (and 26% of their ‘fully OA’ articles!) were non-compliant, as were 54% of Wiley’s.

One might conclude, then, that hybrid Open Access represents a bad deal for funders and institutions, with poor service and double-dipping.

Other hybrid issues

To further complicate matters, some have argued that the open access/hybrid dichotomy is too stark. Some journals, particularly coming from learned societies, (e.g. Plant Physiology, from the American Society of Plant Biologists) make all articles open access after a certain period, but charge an optional APC to make them available sooner. This would generally be considered hybrid publishing, but could be seen as a rather different category from the majority of corporate hybrid journals, in which articles never become Open Access unless an APC is paid. There is a possibility that strict funder mandates against hybrid could close off such journals to researchers, exacerbating the anxieties regarding open access felt by many learned societies.

Where does this leave authors and institutions? It’s clear that the situation remains very much in flux. The problems that have existed with hybrid since the beginnings of Open Access are far from resolved, despite the expansion of journal offsetting schemes. Meanwhile, prices continue to rise, and while many funders have taken the step of allowing their funds to be used only for fully Open Access journals, only a minority of the largest and most powerful funding bodies have done so.

The result is confusion for researchers and an increased administrative burden for institutions, who have to manage and advise on a proliferation of divergent funder and publisher policies, as well as conducting regular and extremely resource-intensive compliance-checking of hybrid publications to ensure publishers have delivered what has been paid for. As numbers of Open Access publications increase, it is questionable how sustainable this will be.

Published 24 October 2016
Written by Dr Philip Boyes and Dr Danny Kingsley 

Request a copy: process and implementation

This blog post looks at a recent feature implemented in our repository called ‘Request a copy’ and discusses the process and management of the service. There is a related blog post which discusses the uptake and reaction to the facility.

As part of our recent upgrade to the University’s institutional repository (now renamed ‘Apollo‘), we implemented a new feature called ‘Request a copy’. ‘Request a copy’ operates on the principle of peer-to-peer sharing – if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway – peer to peer sharing.

The main advantage of the ‘Request a copy’ feature is to open up the University’s most current research to a wider audience. Many of our users do not necessarily come from an academic background, or may be based within another discipline, or an institution where journal subscriptions are more limited. The repository is often their first port of call to find new research as it ranks highly in Google search results. We hope that these users will benefit from ‘Request a copy’ by being able to access new outputs early, at researchers’ discretion. Additionally, this may provide an added benefit to researchers by introducing new contacts and potential collaborations.

How it works

Items in Apollo that are not yet accessible to the wider public are indicated by a padlock symbol that appears on the thumbnail image and filename link, which users can usually click to download the file.

Reasons why the file may not yet be publicly available include:

  • Some publishers require that articles in repositories cannot be made available until they are published, or until a specified time after publication
  • We hold a number of digitised theses in the repository, and for some we have been unable to contact the author to secure permission to make their thesis available
  • Authors may choose to make their dataset available only once the related article is published

When a user clicks on a thumbnail or filename link containing a padlock, they are directed to the ‘Request a copy’ form. Here, they provide their name, email address and a message to the author. On clicking ‘Request copy’, an email is sent to the person who submitted the article, containing the user’s details. The recipient of this email then has the option to approve or deny the user’s request, to contact the user for more information, or (if they are not the author) to forward the request to the author.

How it really works

In practice, the process is slightly more complicated. For most of the content in the repository, the person who submitted an item will be a member of repository staff, rather than the item’s author. This means that for the most part, emails generated by the ‘Request a copy’ form were initially sent to members of the Office of Scholarly Communication team. In some cases, these requests were sent to people who have left the University, and we have had to query the system to retrieve these emails. As an interim measure, we have now directed all emails to support@repository.cam.ac.uk. These still need manual processing.

Theses

For theses where we have not received permission from the author to make them available, we forward requests to the University Library’s Digital Content Unit, who have traditionally provided digitised copies of theses at a charge of £65. We have found, however, that once information about this charge is communicated to the requester, very few (approximately 1%) actually complete the process of ordering a thesis copy.

We have been working with the Digital Content Unit on a trial where thesis copies were offered at £30, then £15. However, even at these cheaper prices, uptake remained low (it increased to 10%, but because the sample was small this equated to only two and three requests at the respective price points, and therefore may not be statistically significant). This suggests that the objection was to being charged at all, rather than to the particular amount. Work in this area remains ongoing to try to offer thesis copies as cheaply as possible to requesters, while allowing the Digital Content Unit to cover their costs.

Articles

If the request is for an article, we first need to check whether the article has actually been published and is already available Open Access. Although we endeavour to keep all our repository records up to date, unless we are informed that an article has been published, repository staff need to check each article for which publication is pending. This is a time-consuming manual process, and when we have a large backlog, sometimes it can take a while before an article is updated following publication.

If we find that the article has indeed been published and can be made Open Access, we amend the record, make the article available and email the requester to let them know they can now download the file directly from the repository.

On the other hand, if the article is still not published, or if it is under an embargo, we need to forward the request to the corresponding author(s). Sometimes their name(s) and email address(es) will be included within the article itself, and sometimes we have a record of who submitted the article via the Open Access upload form. However, if it is not clear from the article who the corresponding author is, or if their contact details are not included, and if the article was submitted by an administrator rather than one of the authors, we then need to search via the University’s Lookup service for the email addresses of any Cambridge authors, and search the internet for email addresses of any non-Cambridge authors, before we can forward on the request.
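The manual triage described above can be summarised as a small decision function. The Python sketch below is purely illustrative: the class, field names and action labels are hypothetical and do not reflect how Apollo or the repository software is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ArticleRequest:
    """Hypothetical summary of what staff check for a requested article."""
    published: bool
    can_be_made_open_access: bool  # False if unpublished or under embargo
    author_email_in_article: Optional[str] = None
    submitter_is_author: bool = False
    submitter_email: Optional[str] = None


def triage(request: ArticleRequest) -> Tuple[str, Optional[str]]:
    """Return (action, recipient email or None) for one article request."""
    if request.published and request.can_be_made_open_access:
        # Amend the record, release the file, then email the requester.
        return ("release_file_and_notify_requester", None)
    if request.author_email_in_article:
        return ("forward_to_author", request.author_email_in_article)
    if request.submitter_is_author and request.submitter_email:
        return ("forward_to_author", request.submitter_email)
    # Submitted by an administrator and no contact details in the article:
    # staff must search Lookup or the internet for an address first.
    return ("search_for_author_contact_details", None)


embargoed = ArticleRequest(
    published=True,
    can_be_made_open_access=False,
    author_email_in_article="author@example.org",
)
print(triage(embargoed))
```

The final branch is the expensive one in practice, since it is where staff have to search for contact details by hand before the request can be forwarded.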

As a result, it can take repository staff up to 30 minutes to process an individual request. This is quicker if the article has been requested previously and the author’s contact details are already stored, but can take longer when we need to search. Sometimes, there is also repeat correspondence if the author has any queries, which adds to the total time in processing each request.

Amending our processes

Since introducing ‘Request a copy’, we have started collecting the email addresses of corresponding authors when an article is submitted, and we have commissioned a repository development company to ensure that ‘Request a copy’ emails can be sent directly to those authors for whom we have an email address – a feature that we are hoping to implement in the next few weeks.

However, if the author moves institution, their university email address will no longer be valid, and any requests for their work will again need to come via repository staff. One way to solve this would be to ask for an external (non-university) email address for the corresponding author at the point where they upload the article to the repository. However, this would introduce an extra step to an already onerous process and may act as a further barrier to authors submitting articles in the first place.

Generally, ‘Request a copy’ is a great idea and provides many benefits to the research community and beyond. But the implementation of this service has been challenging. The amount of time taken by each request has meant that some staff members have been redeployed from their usual jobs to facilitate these requests, which also has an impact on the backlog of articles in the repository that need to be checked in case they have since been published. If an article is published but still in the backlog (and therefore not publicly available in the repository), unnecessary requests for it could result in a reputational issue for the Office of Scholarly Communication and the University.

We will continue to look at our processes over the coming academic year, to see how we can improve our current workflows, and identify and resolve any issues, as well as determining where best to focus any further development work. In the related blog post on ‘Request a copy’, I’ll be talking about usage statistics for the service so far, some more unexpected use cases we have encountered, and feedback from our users that will help us to shape the service into the future.

Published 7 October 2016
Written by Sarah Middle

Request a copy: uptake and user experience

This post looks at the University of Cambridge repository ‘Request a copy’ service from the user’s perspective in terms of uptake so far, feedback we have received, and reasons why people might request a copy of a document in our repository. You may be interested in the related blog post on our ‘Request a copy’ service, which discusses the concept behind ‘Request a copy’, the process by which files are requested, and how this has been implemented at Cambridge.

Usage Statistics

The Request a Copy button has been much more successful than we anticipated, particularly because there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a copy’), we had received 1120 requests (approximately 280 requests per month), the vast majority of which were for articles (68%) and theses (28%). The remaining 4% of requests were for datasets or other types of resource. We are aware that this is a particularly quiet time in the UK academic year, and expect that the number of requests will increase now term has started again.
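For readers who want to check the arithmetic behind these figures, the following is a minimal sketch in Python. The request total, the four-month window and the percentage shares come from the paragraph above; the function and variable names are illustrative only, and because the shares are quoted as rounded percentages the per-type counts are approximate.

```python
# Figures quoted in the post: 1120 requests in the four months
# to the end of September 2016.
TOTAL_REQUESTS = 1120
MONTHS = 4

# Rounded shares quoted in the post; counts derived from them are approximate.
SHARES = {"articles": 0.68, "theses": 0.28, "datasets_or_other": 0.04}


def monthly_rate(total, months):
    """Average number of requests per month."""
    return total / months


def approximate_counts(total, shares):
    """Approximate request counts per type, from rounded percentage shares."""
    return {kind: round(total * share) for kind, share in shares.items()}


if __name__ == "__main__":
    print(monthly_rate(TOTAL_REQUESTS, MONTHS))
    print(approximate_counts(TOTAL_REQUESTS, SHARES))
```

Because each share is rounded, the derived counts need not add up exactly to the total.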

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button. However, these figures could be misleading as a number of authors have also advised us that they have entered into correspondence with the requester to ask them for further information about who they are and why they are interested in this research. Eventually, this correspondence may result in the author emailing a copy of the paper to the requester, but as this happens outside the repository, it does not appear in our fulfilment statistics. Therefore, we suspect the figure for accepted requests is in actual fact slightly higher.

Of the articles requested during this period, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal. The large number of requests made prior to publication indicates the value of having a policy where articles are submitted to the repository on acceptance rather than publication – there is clearly interest in accessing this research among the wider public, and if they are able to make use of it rather than waiting during the sometimes lengthy period between acceptance and publication, this can make the research process more efficient.

Author Survey

To find out why authors might not be fulfilling requests through the repository links, Dr Lauren Cadwallader, one of our Open Access Research Advisors, sent a survey on 6 July 2016 to the 113 authors who had received requests but had not clicked on the repository link or been in touch with repository staff to advise of an alternative course of action. This survey had a 13% response rate, with 15 participants, as well as eight email responses from users who provided feedback but did not complete the survey.
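The response rate can be reproduced with a short calculation. This is only a sketch: the counts (15 survey participants, eight email responses, 113 authors contacted) are taken from the paragraph above, while the function name is made up for illustration. Counting the email responses as well is a different measure from the one quoted, and is shown purely for comparison.

```python
def response_rate(responses, invited):
    """Percentage of invited authors who responded."""
    if invited <= 0:
        raise ValueError("invited must be a positive number")
    return 100 * responses / invited


SURVEY_PARTICIPANTS = 15   # completed the survey
EMAIL_RESPONSES = 8        # gave feedback by email instead
AUTHORS_CONTACTED = 113

# The rate quoted in the post counts survey participants only.
survey_rate = response_rate(SURVEY_PARTICIPANTS, AUTHORS_CONTACTED)

# A broader measure (an assumption, not a figure from the post):
# anyone who replied in any form.
any_reply_rate = response_rate(
    SURVEY_PARTICIPANTS + EMAIL_RESPONSES, AUTHORS_CONTACTED
)

if __name__ == "__main__":
    print(f"Survey only: {survey_rate:.1f}%")
    print(f"Survey or email: {any_reply_rate:.1f}%")
```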

The relatively low response rate is indicative of either a lack of engagement with or awareness of the process – it is possible that the request emails and survey email were dismissed as spam, or that researchers were unable to respond due to an already heavy workload. One way of addressing this could be to include some information about ‘Request a copy’ in our existing training sessions, in particular to emphasise how quick the process can be in cases where the author is happy to approve the request without needing any further information from the user. We have also been developing the wording of the email sent to the author, to explain the purpose of the service more clearly, and to make it sound like a legitimate message that is less likely to be dismissed as spam.

Of the 15 people who participated in the survey, the majority were aware that they had received an email, which shows that lack of response is not always due to emails being lost in spam filters. When asked for the reason why they did not fulfil the request via the repository link, 35% of authors replied that they had emailed the requester directly, either to send the file, to request more information, or to explain why it was not possible for them to share the file at this time. This finding is quite positive, as it indicates that over a third of these requests are indeed being followed up. Although it would be helpful to us to be able to keep track of approvals through the system, at least this means that the service is fulfilling its purpose in providing a way for authors to interact with other interested researchers, and to share their work if appropriate. In fact, one of the aspects that participants liked best about the ‘Request a copy’ service was the ability to communicate directly with the requestor.

Two authors did not respond to the request because the article was available elsewhere on the internet, such as their personal / departmental website, or a preprint server (where the restrictions relating to repositories do not apply), although they did not communicate this to the requestor. In these cases, it is definitely positive that the authors are happy to share their work; however, it does show that there is often an assumption among researchers that people interested in reading their articles will be restricted to those already in their specific disciplinary communities.

Requests from people who are unaware of sites where the research might also be made available demonstrate that there is indeed an appetite for this research among those outside of academia, or from different subject areas. This is generally a really positive thing, as it allows the University’s research outputs to educate and inspire a new audience beyond the more traditional communities, and could potentially lead to new collaboration opportunities. To ensure that requestors are able to access the material, and that researchers are not bombarded with requests for documents that are already freely available, authors can provide links to any external websites that are hosting a preprint version of the article, and we will add them to the repository record.

Other responses indicated that we were not necessarily emailing the right person, as participants said that they had not approved the request because they were not the corresponding author, or because they thought a co-author had already responded. At the outset of the service, we felt that emailing as many authors as possible would increase the likelihood of receiving a response; however, the survey results show that it would be better to send requests to the corresponding author(s) only, at least in cases where it is clear who they are.

An issue we have encountered on a semi-regular basis since HEFCE’s Open Access policy came into force is that of making an article’s metadata available prior to its publication. Although HEFCE and funder policies state that an article’s repository record should be discoverable, even if the article itself must be placed under embargo based on publisher restrictions, there is concern among some authors that metadata release breaches the publisher’s press embargo. You can read about this issue in some detail here.

Receiving requests for an article via the ‘Request a copy’ service can be unsettling for authors as it demonstrates how easily the repository record can be accessed, and rather than respond to the request, they contact the Open Access team to ask for the metadata record to be withdrawn until the article is published. This demonstrates a need to communicate more clearly, both on our website and within the ‘Request a copy’ pages in the repository, what is required of authors as part of HEFCE and funder Open Access policies. We will also be more explicit in the ‘Request a copy’ emails sent to authors in stating that sharing their articles via this service will not be seen as a breach of the publisher’s embargo. In cases where the author does not wish to disseminate their article before it is published, they have the option to deny any requests they receive.

Facilitating requests

There have been several instances where press interest around an article at the point of publication has generated a large number of requests, each of which must be responded to individually by the author. This has resulted in several authors asking that we automatically approve every request rather than forwarding them on. Unfortunately this is not possible for us to do, due to the legal issues surrounding ‘Request a copy’.

It is perfectly acceptable for an author to send a copy of their article to an individual, but if a repository makes that article available to everyone who requests it before the embargo has been lifted, this would be a breach of copyright because it would be ‘systematic distribution’. While responding to multiple requests is likely to be seen as an annoyance by an already overstretched researcher, we hope that a large volume of requests will also be viewed in a positive light, as it demonstrates the interest people have in their work.

Use cases

An interesting example of a request we received was actually from one of the authors of the article, as they did not have access to a copy themselves. This raises some questions about communication between the researchers in this case, if the ‘Request a copy’ service was seen to be a better way of gaining access to the author’s own research, rather than contacting one of their co-authors.

A more surprising use case is that of a plaintiff who had lost a legal case. The plaintiff was requesting an as-yet unpublished article that had been written about the case, because the article appears to argue in favour of the plaintiff and could potentially inform a future appeal. This is a good example of how the ‘Request a copy’ service could be of direct benefit in the world outside academia.

Although the vast majority of requests have been for research outputs such as articles, theses and datasets, we also occasionally receive requests for images that belong to collections held in different parts of the University, where high-quality versions are stored in the repository under restricted access conditions. With these requests, it can be more difficult to find who the copyright-holder is, which sometimes requires detective work by the repository team. In one case, permission had to be sought from a photographer who only has a postal address, and therefore required more explanation about the repository more generally, as well as the specific request.

Looking to the future

We will use this research and any further feedback we receive to improve the experience of our ‘Request a copy’ service for both authors and requestors, including implementing the ideas suggested above. Usage statistics will continue to be monitored, and we may run a user survey again to determine how far the service has improved, as well as to identify any new issues.

In the meantime, if you have any comments or questions about our ‘Request a copy’ service, either as an author or a requester (or both), please send us an email to support@repository.cam.ac.uk.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Milestone – 10,000th article processed by OA Service

The Open Access Service at Cambridge has received its 10,000th Open Access submission – highlighting its commitment to making research freely available to anybody who wants to access it, without publisher paywalls or expensive journal subscriptions.

The 10,000th submission, reporting on the impact of eating a Mediterranean diet on the risk of developing cardiovascular disease in a UK population, was deposited by Signe Wulund at the MRC Epidemiology Unit, on behalf of Dr Nita Forouhi, Programme Leader in Nutritional Epidemiology at the MRC Epidemiology Unit, and several co-authors.

The Open Access movement has been growing in strength in academia for many years, and it is increasingly being mandated by funding bodies and government.

Dr Forouhi said: “Through open access our research can reach a worldwide audience. It would be a huge pity if interested researchers, practitioners or policy makers could not read about new research, such as our latest findings on the link between the Mediterranean diet and cardiovascular health in a non-Mediterranean setting, because of something as simple as lacking a journal subscription.

“Open access enables wider dissemination of research findings, and in turn, facilitates better research and evidence-based policy and clinical practice.”

The Cambridge Open Access Service was established within the University Library in 2013 in response to Research Councils UK (RCUK) making Open Access mandatory for anyone accepting their funding. Many other major funders, including the Wellcome Trust, Cancer Research UK and the British Heart Foundation, have similar policies.

In 2014, the Higher Education Funding Council for England announced that Open Access would be compulsory for any article included in the next Research Excellence Framework (REF) exercise. This policy came into force on April 1, 2016, effectively meaning that all research in UK institutions now has to be made freely available.

Since its inception in 2013, the Open Access service has processed 10,000 manuscripts across all University faculties and departments, and has worked with 3,000 different members of staff. 6,000 of the papers were covered by the HEFCE open access policy; 4,000 acknowledged RCUK funding and 1,900 COAF (many papers fall into multiple categories, and some into none). More than £5.4 million of Open Access grants from funding bodies have also been distributed.

Meeting these requirements is a major task for the University, and one it has tried to make as simple as possible for researchers. Authors are simply required to upload their manuscript to www.openaccess.cam.ac.uk when it’s accepted for publication. The Open Access team then advises them on what they need to do to comply with funder requirements and on their eligibility for any funding body grants, and handles depositing the article into Apollo, the University’s institutional repository.

Ten thousand manuscripts have now been received in this way, and it has been possible to make the vast majority of them Open Access, free for anyone who wants to read and benefit from them.

The 10,000th article was: ‘Prospective association of the Mediterranean diet with cardiovascular disease incidence and mortality and its population impact in a non-Mediterranean population: the EPIC-Norfolk Study’ in BMC Medicine. [DOI:10.1186/s12916-016-0677-4]

The Open Access team at the University of Cambridge is part of the Office of Scholarly Communication (OSC), within the University Library. As well as assisting researchers with Open Access and Open Data compliance, it advises on scholarly communication tools, techniques, policies and practices, and provides training.

This story originally appeared on the University of Cambridge Research news pages.

Published 05 October 2016
Written by Dr Philip Boyes
Creative Commons License

Taking a Principled stance – the Scholarly Commons

It only rains about 10 days a year in San Diego. And Tuesday was one of them. In a rooftop room on the UCSD campus in San Diego, a group had gathered for the FORCE11 Scholarly Commons workshop. The workshop brought together members of the Scholarly Commons working group, who hail from around the world and come from the broad scholarly commons. The Scholarly Commons is an idea to help define the future of research communication. The goal is to promote the best research and scholarship possible through rapid and wide dissemination to all who need or want it.

FORCE stands for the Future of Research Communications and eScholarship and is an organisation (or community) open to anyone interested in these issues. The group consisted of researchers from multiple disciplines, communicators, programmers, and a couple of librarians. This is the unusual and powerful thing about FORCE11 – the diversity of its members. Someone actually remarked: ‘you know, there probably should be a few more librarians here’ which is something you don’t often hear at meetings about open access issues. Usually librarians are delighted if a real live researcher turns up.

We were meeting to discuss the draft of 18 Principles of the Commons – an attempt to define what the community considers the attributes and behaviours of a person who is fully participating in research. The Principles are broadly separated into four major themes of being Open, Equitable, Sustainable and Research & Culture Driven.

FORCE11 works openly and tries to be as accessible as possible so there were full and open notes being collaboratively taken and the Twitter hashtag was #futurecommons.

The workshop was very hands-on, and expertly moderated by Jeroen Bosman and Bianca Kramer, who are the power behind the excellent 101 Innovations in Scholarly Communication project. As their ‘wheel’ of tools, which maps the tools available across the stages of the research lifecycle and through time, demonstrates, it is becoming increasingly difficult to navigate the new research space. Indeed that is part of the rationale behind this Scholarly Commons project. It is an attempt to take stock and make sense of what we, the community, want to see in an open and accessible future.

Despite having fewer than 40 people we managed to have multiple activities running concurrently with several ‘unworkshops’. Everything was fed back into the group, and there was a very broad range of discussions, agreements and ideas. To prevent this blog being a tome, I am only going to cover here a couple of areas that were discussed.

Standing on the shoulders of giants

Due to a flight delay I was only able to catch the end of the Sunday evening welcome reception where we were asked to reflect on the 18 draft Principles plastered on the walls and decide which we agreed with and which we did not (or had an issue with). As I scanned through I was struck by the overall similarity they had with Robert Merton’s 1942 publication, The Normative Structure of Science where he proposed that science operated to four ‘norms’, of Universalism, Communalism, Disinterestedness and Organised Skepticism.

I mentioned this in an early discussion and was somewhat chuffed that the group really did take this on board – to the extent that one unworkshop group worked on updating the norms to reflect today’s situation.

As an aside – the challenge with having people coming together from multiple research areas is everyone brings their a priori biases with them. People tend to see the problem through their own lens, so have different ways of approaching the problem. For the group to agree that this perspective was a good one was personally very validating.

Considering outreach as part of the research lifecycle

In the first unworkshop I joined we discussed how research-centred the Principles are – they did not consider the importance of outreach. Given the impenetrable nature of the language in many academic papers, we agreed that making something Open Access facilitates outreach but is not outreach itself.

The discussion moved to ideas about what researchers could do to help with outreach – even if they themselves did not want (or were unable) to do it. These are fairly simple, and include providing supplementary material that is accessible in terms of the descriptive language used (no jargon), potentially providing the information in a language other than English, and ensuring the license under which it is made available is open.

We proposed that the Commons should facilitate outreach and have the outreach in mind even if the researcher themselves does not generate the outreach. There had been a comment earlier in the Equity discussion that noted “Each part of the research cycle is equally valid and none should not be preferenced over the other.” Our discussion concluded that outreach (for the lay public) should be considered to be part of the research process and equally valued.

It should be noted that we are not discussing paper-related activities here. Making the paper open access or tweeting a link to the paper doesn’t count. This is about sharing the information in an understandable manner outside the Academy.

Tool mapping

The workshop, as mentioned, was very hands-on. By that I mean we did several ‘craft activities’ involving dots, glue, sticky tape and scissors. One of these activities involved ranking various tools for research against the four themes of the Principles, deciding whether they were in alignment with them (green), in opposition with them (red) or in-between (yellow).

We then placed these assessments on the windows under the part of the research lifecycle they related to, and ordered them. The most Principle-friendly tools were up high, and the least down low.

We then did an activity where we tried to trace the path of our own discipline in terms of the tools our disciplines tended to use. This exercise was an attempt to see if there were any discernible patterns about where some disciplines tend to align or otherwise with the Principles. While the sample size for each discipline was too small to really come to any conclusions, this exercise did open up ideas for a way of disseminating the Principles.

The Principles as an Innovation

This is where another of my disciplinary perspectives comes into play. If we accept that the Principles are themselves an ‘innovation’ – in that they are “an idea, practice, or object that is perceived as new by an individual or other unit of adoption” – then we can look to Everett Rogers’ Diffusion of Innovations, first published in 1962 and now in its 5th edition. You might not have heard of him but you know about his work – Rogers was the person who coined the idea of ‘early adopters’, ‘late adopters’ and ‘laggards’.

Amongst lots of interesting insights about why people adopt new ideas, Rogers came up with five ways to evaluate an innovation which will determine the success or otherwise of its adoption. These are judged as a whole and are interrelated:

  • Relative advantage – the perceived efficiencies gained by the innovation relative to current tools or procedures
  • Compatibility with the pre-existing system
  • Complexity or difficulty to learn (it needs to be easy)
  • Trialability or testability without risking the current system
  • Observed effects.

It is the second point which is the interesting one here – ‘Compatibility with the pre-existing system’. The reason why this is relevant is we are not talking about one system when we discuss scholarship – there are a myriad of systems. There is no ‘one solution’. If we are to try and implement something like the Principles across the academy, we will need to do it along disciplinary lines. (Disclosure – this happens to be the conclusion of my 2008 PhD thesis on the adoption of open access across disciplines).

Disciplinary dissemination

This leads us to the question of audiences for the Principles. Ideally we would have institutions signing up to them, pledging that they will work with their research community to work in this manner. But this is unrealistic currently due to the diverse nature of research institutions. There might, however, be a way to have funders sign up, because funding is often given within disciplinary constraints. This is doubly the case because funders (in the UK, Australia and the US at least) are increasingly using an ‘Impact narrative’ and the Principles offer a way to practically identify and reward impact behaviour.

And we are not coming from a standing start. We can build on the work done by Jeroen Bosman and Bianca Kramer in their 101 Innovations in Scholarly Communication project. There were over 20,000 responses to their survey of innovation use and this allows a detailed mapping of disciplinary behaviours. If we then further map those findings against an assessment of the research tools being used at a disciplinary level and whether they are aligned with the Principles, we should be able to see which disciplinary areas are already working in the Principled way. It is the funders of these disciplines that we should approach first to try and gain early adoption of the Principles. This work would become a checklist that can reward people for the behaviours that they are already doing in this space.

A project like this would in turn open up some questions about what we need to do at a disciplinary level to help that community become more aligned with the Principles. These may require a number of approaches – Do they have the tools that work for them or do these need to be developed? Is there a cultural reason why this discipline is not engaging? In answering these questions we come up with the answer to the question: What does a Scholarly Commons researcher look like in this discipline? Until we have some evidence of where these areas are, we are effectively stabbing in the dark.

Making this happen now

In a different unworkshop we talked about how the nature of the Principles themselves went against the idea of being inclusive because we are potentially creating a binary situation – either you are following the Principles or not. What we really need to do, we agreed, is not to reject people for acting in ways that are not totally in line with the Principles, but rather to reward behaviour that supports the Principles.

In order to facilitate this, we designed a series of ‘Decision Trees’ to help researchers be as open as they can. This is a recognition that researchers are working within a complex ecosystem. With all the will in the world, if there is not an Open Access journal available to you in your field, you cannot publish in one.

The easiest part of the research lifecycle to tackle was publishing, in terms of choosing a publication outlet. The decision tree allows for people who cannot publish in an Open Access journal, nor afford to pay for hybrid (not something I personally recommend anyway) to still be ‘Principled’ by putting a copy of their work in a repository.
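To make the idea concrete, here is a rough sketch in Python of how a publishing decision tree of this kind could be expressed. It is not a transcription of the tree drawn at the workshop: the order of the questions, the class and function names and the wording of the outcomes are all assumptions made for illustration, based only on the description above.

```python
from dataclasses import dataclass

# Hypothetical outcome labels, not the workshop's wording.
ROUTE_OA_JOURNAL = "publish in an Open Access journal"
ROUTE_HYBRID = "pay for hybrid Open Access"
ROUTE_REPOSITORY = "deposit a copy in a repository"


@dataclass
class PublishingOptions:
    """Hypothetical description of an author's situation."""
    oa_journal_available: bool  # is there a suitable OA journal in the field?
    can_afford_hybrid: bool     # are funds available for a hybrid APC?


def publishing_route(options: PublishingOptions) -> str:
    """Suggest a route that still counts as 'Principled'.

    Assumed order of questions: an Open Access journal first, then
    hybrid, then the repository as the option open to everyone.
    """
    if options.oa_journal_available:
        return ROUTE_OA_JOURNAL
    if options.can_afford_hybrid:
        return ROUTE_HYBRID
    return ROUTE_REPOSITORY


if __name__ == "__main__":
    author = PublishingOptions(oa_journal_available=False, can_afford_hybrid=False)
    print(publishing_route(author))
```

The point of the final branch is that there is always a ‘Principled’ option, whatever the author’s circumstances.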

Our discussion about data was more complex. For a start, there is a question about whether the data is digital or not. As we discussed it, our draft tree became incredibly complex so we created two separate flows. The Data 1 decision tree says to someone who has analogue data and no funds to digitise, that as long as they put some information in their paper about how to contact them for the supporting data, then they have met the spirit of the Principles to the best of their ability.

While we know the gold standard for data sharing is to have the data (with well-defined metadata) available openly in a non-proprietary repository with a DOI, for various reasons this is not always possible. We should not sanction a researcher because they are unable to meet that (very high) standard. The Data 2 tree shows that data that is in a repository under embargo without a DOI is discoverable in a way it would not be if it were in a desk drawer – so that is, again, within the spirit of the Principles. We need to consider the ‘close enough’ option as being a valid one, at least in the implementation stage of the Principles.
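The two data flows can be sketched in the same spirit. Again, this is a hypothetical illustration in Python rather than the trees we actually drew: the parameter names, the order of the checks and the outcome labels are assumptions, and only the three situations described above (analogue data with contact details, the gold standard, and embargoed data in a repository) come from the discussion.

```python
# Hypothetical outcome labels, not the workshop's wording.
GOLD_STANDARD = "gold standard"
WITHIN_SPIRIT = "within the spirit of the Principles"
ADD_CONTACT_DETAILS = "add contact details for the data to the paper"
DEPOSIT_IN_REPOSITORY = "deposit the data in a repository"


def data_sharing_outcome(
    is_digital: bool,
    can_fund_digitisation: bool,
    contact_details_in_paper: bool,
    in_repository: bool,
    openly_available: bool,
    has_doi: bool,
) -> str:
    """Classify a data-sharing situation (illustrative only)."""
    # 'Data 1' flow: analogue data that cannot be digitised.
    if not is_digital and not can_fund_digitisation:
        if contact_details_in_paper:
            return WITHIN_SPIRIT
        return ADD_CONTACT_DETAILS

    # 'Data 2' flow: digital (or digitisable) data.
    if in_repository and openly_available and has_doi:
        return GOLD_STANDARD
    if in_repository:
        # Embargoed or without a DOI, but still discoverable.
        return WITHIN_SPIRIT
    return DEPOSIT_IN_REPOSITORY


if __name__ == "__main__":
    # Data in a repository, under embargo, without a DOI.
    print(data_sharing_outcome(True, False, False, True, False, False))
```

Treating ‘close enough’ as a valid outcome is the design choice here: only data that is neither described in the paper nor held in a repository is asked to do more.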

We agreed that in some areas of the research lifecycle a list of tools that could help would be of more use than a decision tree. Time constraints meant there are a couple of areas of the lifecycle which still need consideration (and we need to do some decision tree design work!), but generally the group agreed that this was probably quite useful.

Conclusions

When it comes to the Principles themselves, we are still working on it. We did however agree that we thought the Principles were something worth doing, and that they were more or less something we can start working with (and on – they are likely to be dynamic). One suggestion was that we call them Scholarly Commons Principles 1.0 – a reference to this being the first version of possibly many. There are plans for several subgroups to pitch for funding to do some deeper work in some areas. So it is an ongoing project, but a substantial one.

There are some troopers in the Scholarly Communication community. Several people at our workshop had ‘done the double’ – attending the SciDataCon 2016 conference and associated meetings over eight days in Denver last week and then coming to this event. The gruelling pace was starting to show by the end of the last day of our workshop.

You know you have been on a very short visit when you fly back with the same in-flight crew as your outward bound journey. One of them even recognised me and commented on how quickly I was returning. So while the trip was an exhausting few days, it was productive and worthwhile. And it was really nice to smell eucalypt trees (rather bizarrely) and do laps in an outdoor pool – things I have not done since moving to the UK.

Published 22 September 2016
Written by Dr Danny Kingsley
Creative Commons License

Cambridge University spend on Open Access 2009-2016

Today is the deadline for those universities in receipt of an RCUK grant to submit their reports on the spend. We have just submitted the Cambridge University 2015-2016 report to the RCUK and have also made it available as a dataset in our repository.

Compliance

Cambridge had an estimated overall compliance rate of 76%, with 46% of all RCUK funded papers available through the gold route and 30% of all RCUK funded papers available through the green route.

The RCUK Open Access Policy indicates that at the end of the fifth transition year of the policy (March 2018) they expect 75% of Open Access papers from the research they fund will be delivered through immediate, unrestricted, on‐line access with maximum opportunities for re‐use (‘gold’). Because Cambridge takes the position that if there is a green option that is compliant we do not pay for gold, our gold compliance number is below this, although our overall compliance level is higher, at 76%.
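The relationship between these percentages can be laid out in a few lines of Python. This is a sketch only: the three input figures are the ones quoted above, the variable names are illustrative, and the ‘gold share of compliant papers’ is an extra derived figure that does not appear in our report.

```python
GOLD_PCT = 46               # RCUK funded papers available via the gold route
GREEN_PCT = 30              # RCUK funded papers available via the green route
RCUK_GOLD_TARGET_PCT = 75   # RCUK expectation for gold by March 2018

# Overall compliance is the sum of the two routes.
overall_compliance_pct = GOLD_PCT + GREEN_PCT

# Gap between our gold figure and the RCUK expectation, in percentage points.
gold_gap_points = RCUK_GOLD_TARGET_PCT - GOLD_PCT

# Derived for illustration: the share of compliant papers that are gold.
gold_share_of_compliant_pct = 100 * GOLD_PCT / overall_compliance_pct

if __name__ == "__main__":
    print(f"Overall compliance: {overall_compliance_pct}%")
    print(f"Gold gap: {gold_gap_points} percentage points")
    print(f"Gold share of compliant papers: {gold_share_of_compliant_pct:.1f}%")
```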

Compliance caveats

The total number of publications arising from research council funding was estimated by searching Web of Science for papers published by the University of Cambridge in 2015, and then filtered by funding acknowledgements made to the research councils. The number of papers (articles, reviews and proceedings papers) returned in 2015 was 2080. This is almost certainly an underestimate of the total number of publications produced by the University of Cambridge with research council funding. The analysis was performed on 15/09/2016.

Expenditure

The APC spend we have reported is only counting papers submitted to the University of Cambridge Open Access Team between 1 August 2015 and 31 July 2016. The ‘OA grant spent’ numbers provided are the actual spend out of the finance system. The delay between submission of an article, the commitment of the funds and the subsequent publication and payment of the invoice means that we have paid for invoices during the reporting period that were submitted outside the reporting period. This meant reconciliation of the amounts was impossible. This funding discrepancy was given in ‘Non-staff costs’, and represents unallocated APC payments not described in the report (i.e. they were received before or after the reporting period but incurred on the current 2015-16 OA grant).

The breakdown of costs indicates we have spent 4.6% of the year’s allocation on staff costs and 5.1% on systems support. We noted in the report that the staff time paid for out of this allocation also supports the processing of Wellcome Trust APCs for which no support is provided by Wellcome Trust.

Headline numbers

  • In total Cambridge spent £1,288,090 of RCUK funds on APCs
  • 1786 articles identified as being RCUK funded were submitted to the Open Access Service, of which 890 required payment for RCUK*
  • 785 articles have been invoiced and paid
  • The average article cost was ~£2008

Caveats

The average article cost can be established by adding the RCUK fund expenditure to the COAF fund expenditure on co-funded articles (£288,162.28), which gives a complete expenditure for these 785 articles of £1,576,252.42. The actual average cost is £2007.96.
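The calculation can be reproduced as follows. This is a minimal sketch using Python’s decimal module so that pence are handled exactly. The combined total, the COAF amount and the article count are the figures quoted above; the RCUK component is derived here by subtraction, on the assumption that the headline figure of £1,288,090 is that amount rounded to the nearest pound.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_SPEND = Decimal("1576252.42")    # combined expenditure quoted above
COAF_COFUNDED = Decimal("288162.28")   # COAF spend on co-funded articles
ARTICLES_PAID = 785                    # articles invoiced and paid

PENCE = Decimal("0.01")
POUNDS = Decimal("1")

# RCUK component implied by the two quoted figures (derived, not quoted).
rcuk_component = TOTAL_SPEND - COAF_COFUNDED

# Average cost per article, rounded to the nearest penny.
average_cost = (TOTAL_SPEND / ARTICLES_PAID).quantize(
    PENCE, rounding=ROUND_HALF_UP
)

if __name__ == "__main__":
    print(f"Implied RCUK spend: £{rcuk_component}")
    print(f"Average article cost: £{average_cost}")
```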

* The Open Access Service also received many COAF only funded and unfunded papers during this period. The number of articles paid for does not include those made gold OA due to the Springer Compact as this would throw out the average APC value.

Observations

In our report on expenditure for 2014 the average article APC was £1891. This means there has been a 6% increase in Cambridge University’s average spend on an APC since then. It should be noted that of the journals for which we most frequently process APCs, Nature Communications is the second most popular. This journal has an APC of £3,780 including VAT.
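For readers who want to reproduce the percentage, a minimal sketch follows. It assumes the 2014 average of £1891 and the average of £2007.96 reported for this period; the variable names are illustrative.

```python
# Average APC values quoted in this post (names are illustrative only)
average_2014 = 1891.00
average_2015_16 = 2007.96

# Relative change between the two averages, as a percentage
increase_pct = (average_2015_16 - average_2014) / average_2014 * 100

print(f"Increase in average APC: {increase_pct:.1f}%")
```

To one decimal place the increase is slightly above the rounded 6% quoted above.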

Datasets on Cambridge APC spend 2009-2016

Cambridge released the information about its 2014 APC spend for RCUK and COAF in March last year and intended to do a similar report for the spend in 2015. However, a recent FOI request has prompted us to simply upload all of our data on APC spend into our repository for complete transparency. The list of datasets now available is below.

1. Report presented to Research Councils UK for article processing charges managed by the University of Cambridge, 2014-2015

2. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2015-2016

3. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2014-2015

4. Report presented to Jisc for article processing charges managed by the University of Cambridge, 2014

5. Open access publication data for the management of the Higher Education Funding Council for England, Research Councils UK, Charities Open Access Fund and Wellcome Trust open access policies at the University of Cambridge, 2014-2016

Note: In October 2014 we started using a new system for recording submissions. This has allowed us to obtain more detailed information and allows multiple users to interact with the system. Until December 2015 our financial information was recorded in the spreadsheet below. There is overlap between reports 5 and 6 for the period 24 October and 31 December 2015. As of January 2016, all data is being collected in the one place.

6. Open access publication data for the management of Research Councils UK, Charities Open Access Fund and Wellcome Trust article processing charges at the Office of Scholarly Communication, 2013-2015

Note: In 2013 the Open Access Service began and took responsibility for the new RCUK fund, and was transferred responsibility for the new Charities Open Access Fund (COAF). At this time the team were recording when an article was fully Wellcome Trust funded, even though the Wellcome Trust funding is a component of COAF.

7. Open access publication data for the management of Wellcome Trust article processing charges from the School of Biological Sciences, 2009-2014

Note: Management of the funds to support open access publishing has changed over the past seven years. Before the RCUK open access policy came into force in 2013, the Wellcome Trust funds were managed by the School of Biological Sciences.

Published 14 September 2016
Written by Dr Danny Kingsley & Dr Arthur Smith
Creative Commons License

Making the connection: research data network workshop

During International Data Week 2016, the Office of Scholarly Communication is celebrating with a series of blog posts about data. The first post was a summary of an event we held in July. This post looks at the challenges associated with financially supporting RDM training.

Following the success of hosting the Data Dialogue: Barriers to Sharing event in July, we were delighted to welcome the Research Data Management (RDM) community to Cambridge for the second Jisc research data network workshop. The event was held in Corpus Christi College with meals held in the historical dining room. (Image: Corpus Christi)

RDM services in the UK are maturing and efforts are increasingly focused on connecting disparate systems, standardising practices and making platforms more usable for researchers. This is also reflected in the recent Concordat on Research Data which links the existing statements from funders and government, providing a more unified message for researchers.

The practical work of connecting the different systems involved in RDM is being led by the Jisc Research Data Shared Services project which aims to share the cost of developing services across the UK Higher Education sector. As one of the pilot institutions we were keen to see what progress has been made and find out how the first test systems will work. On a personal note it was great to see that the pilot will attempt to address much of the functionality researchers request but that we are currently unable to fully provide, including detailed reporting on research data, links between the repository and other systems, and a more dynamic data display.

Context for these attempts to link, standardise and improve RDM systems was provided in the excellent keynote by Dr Danny Kingsley, head of the Office of Scholarly Communication at Cambridge, reminding us about the broader need to overhaul the reward systems in scholarly communications. Danny drew on the Open Research blogposts published over the summer to highlight some of the key problems in scholarly communications: hyperauthorship, peer review, flawed reward systems, and, most relevantly for data, replication and retraction. Sharing data will alleviate some of these issues but, as Danny pointed out, this will frequently not be possible unless data has been appropriately managed across the research lifecycle. So whilst trying to standardise metadata profiles may seem irrelevant to many researchers it is all part of this wider movement to reform scholarly communication.

Making metadata work

Metadata models will underpin any attempts to connect repositories, preservation systems, Current Research Information Systems (CRIS), and any other systems dealing with research data. Metadata presents a major challenge both in terms of capturing the wide variety of disciplinary models and needs, and in persuading researchers to provide enough metadata to make preservation possible without putting them off sharing their research data. Dom Fripp and Nicky Ferguson are working on developing a core metadata profile for the UK Research Data Discovery Service. They spoke about their work on developing a community-driven metadata standard to address these problems. For those interested (and GitHub-literate) the project is available here.

They are drawing on national and international standards, such as the Portland Common Data Model, trying to build on existing work to create a standard which will work for the Shared Services model. The proposed standard will have gold, silver and bronze levels of metadata and will attempt to reward researchers for providing more metadata. This is particularly important as the evidence from Dom and Nicky’s discussion with researchers is that many researchers want others to provide lots of metadata but are reluctant to do the same themselves.

We have had some success with researchers filling in voluntary metadata fields for our repository, Apollo, but this seems to depend to a large extent on how aware researchers are of the role of metadata, something which chimes with Dom and Nicky’s findings. Those creating metadata are often unaware of the implications of how they fill in fields, so creating consistency across teams, let alone disciplines and institutions, can be a struggle. Any Cambridge researchers who wish to contribute to this metadata standard can sign up to a workshop with Jisc in Cambridge on 3rd October.

Planning for the long-term

A shared metadata standard will assist with connecting systems and reducing researchers’ workload, but if replicability, a key problem in scholarly communications, is going to be possible, digital preservation of research data needs to be addressed. Jenny Mitcham from the University of York presented the work she has been undertaking alongside colleagues from the University of Hull on using Archivematica for preserving research data and linking it to pre-existing systems (more information can be found on their blog).

Jenny highlighted the difficulties they encountered getting timely engagement from both internal stakeholders and external contractors, as well as linking multiple systems with different data models, again underlining the need for high quality and interoperable metadata. Despite these difficulties they have made progress on linking these systems and in the process have been able to look into the wide variety of file formats currently in use at York. This has led to conversations with The National Archives about improving the coverage of research file formats in PRONOM (a registry of file formats for preservation purposes), work which will be extremely useful for the Shared Services pilot.

In many ways the project at York and Hull felt like a precursor to the Shared Services pilot, highlighting both the potential problems in working with a wide range of stakeholders and systems and the massive benefits possible from pooling our collective knowledge and resources to tackle the technical challenges which remain in RDM.

Published 14 September 2016
Written by Rosie Higman
Creative Commons License

Beyond compliance – dialogue on barriers to data sharing

Welcome to International Data Week. The Office of Scholarly Communication is celebrating with a series of blog posts about data, starting with a summary of an event we held in July.

On 29 July 2016 the Cambridge Research Data Team joined forces with the Science and Engineering South Consortium to organise a one day conference at Murray Edwards College to gather researchers and practitioners for a discussion about the existing barriers to data sharing. The whole aim of the event was to move beyond compliance with funders’ policies. We hoped that the community was ready to change the focus of data sharing discussions from whether it is worth sharing or not towards more mature discussions about the benefits and limitations of data sharing.

What are the barriers?

So what are the barriers to effective sharing of research data? There were three main barriers identified, all somewhat related to each other: poorly described data, insufficient data discoverability and difficulties with sharing personal/sensitive data. All of these problems arise from the fact that research data is not always shared in accordance with the FAIR principles: that data is Findable, Accessible, Interoperable and Re-usable.

Poorly described data

The event started with an inspiring keynote talk from Dr Nicole Janz from the Department of Sociology at the University of Cambridge: “Transparency in Social Science Research & Teaching”. Nicole regularly runs replication workshops at Cambridge, where students select published research papers and work hard for several weeks to reproduce the published findings. The purpose of these workshops is to allow students to learn by experience what is important in making their own work transparent and reproducible to others.

Very often students fail to reproduce the results. Frequently, the reasons for failure are an insufficiently described methodology, or simply the fact that key datasets were not made available. Students learn that in order to make research reproducible, one not only needs to make the raw data files available, but that the data needs to be shared with the source code used to transform it and with a written-down methodology of the process, ideally in a README file. While doing replication studies, students also learn about the five selfish benefits of good data management and sharing: data disasters are avoided, it is easier to write up papers from well-managed data, a transparent approach to sharing makes the work more convincing to reviewers, the continuity of research is possible and researchers can build their reputation for being transparent. As a tip for researchers, Nicole suggested always asking a colleague to try to reproduce the findings before submitting a paper for peer-review.

The problem of insufficient data description/availability was also discussed during the first case study talk by Dr Kai Ruggeri from the Department of Psychology, University of Cambridge. Kai reflected on his work on the assessment of happiness and wellbeing across many European countries, which was part of the ESRC Secondary Data Analysis Initiative. Kai re-iterated that missing data make the analysis complicated and sometimes prevent one from being able to make effective policy recommendations. Kai also stressed that frequently the choice of baseline for data analysis can affect the final results. Therefore, proper description of methodology and approaches taken is key for making research reproducible.

Insufficient data discoverability

We also heard several speakers describing problems with data discoverability. Fiona Nielsen founded Repositive – a platform for finding human genomic data. Fiona founded the platform out of frustration that genomic data was so difficult to find and access. The proliferation of data repositories has made it very hard for researchers to actually find what they need.

Fiona started by taking a quick poll of the audience: how do researchers look for data? It turned out that most researchers find data by doing a literature search or by googling for it. This is not surprising – there is no search engine that looks for information simultaneously across the multiple repositories where the data is available. To make it even more complicated, Fiona reported that in 2015 80 PB of human genomic data was generated. Unfortunately, only 0.5 PB of human genomic data was made available in a data repository.

So how can researchers find the other datasets, which are not made available in public repositories? Repositive is a platform harvesting metadata from several repositories hosting human genomic data and providing a search engine allowing researchers to simultaneously look for datasets shared in all of them. Additionally, researchers who cannot share their research data via a public repository (for example, due to lack of participants’ consent for sharing), can at least create a metadata record about the data – to let others know that the data exist and to provide them with information on data access procedure.

The problem of data discoverability is however not only related to people’s awareness that datasets exist. Sometimes, especially in the case of complex biological data with a vast number of variables, it can be difficult to find the right information inside the dataset. In an excellent lightning talk, Julie Sullivan from the University of Cambridge described InterMine, a platform to make biological data easily searchable (‘mineable’). Anyone can simply upload their data onto the platform to make it searchable and discoverable. One example of the platform’s use is FlyMine, a database where researchers looking for results of experiments conducted on fruit flies can easily find and share information.

Difficulties with sharing personal/sensitive data

The last barrier to sharing that we discussed was related to sharing personal/sensitive research data. This barrier is perhaps the most difficult one to overcome, but here again the conference participants came up with some excellent solutions. The first came from the keynote speech by Louise Corti, a talk with a very uplifting title: “Personal not painful: Practical and Motivating Experiences in Data Sharing”.

Louise based her talk on the long experience of the UK Data Service with providing managed access to data containing some forms of confidential/restricted information. Apart from being able to host datasets which can be made openly available, the UKDS can also provide two other types of access: safeguarded access, where data requestors need to register before downloading the data, and controlled access, where requests for data are considered on a case by case basis.

At the outset of the research project, researchers discuss their research proposals with the UKDS, including any potential limitations to data sharing. It is at this stage that the decision is made on the type of access that will be required for the data to be successfully shared. All processes of project management and data handling, such as data anonymisation and collection of informed consent forms from study participants, are then carried out in adherence to that decision. The UKDS also offers protocols clarifying what is going to happen with research data once they are deposited with the repository. The use of standard licences for sharing makes the governance of data access much more transparent and easy to understand, both from the perspective of data depositors and data re-users.

Louise stressed that transparency and willingness to discuss problems are key for mutual respect and understanding between data producers, data re-users and data curators. Sometimes unnecessary misunderstandings make data sharing difficult when it does not need to be. Louise mentioned that researchers often confuse ‘sensitive topic’ with ‘sensitive data’ and referred to a success case study where, by working directly with researchers, UKDS managed to share a dataset about sedation at the end of life. The subject of study was sensitive, but because the data was collected and managed with a view to sharing at the end of the project, the dataset itself was not sensitive and was suitable for sharing.

As Louise said, “data sharing relies on trust that data curators will treat it ethically and with respect”, and open communication is key to building and maintaining this trust.

So did it work?

The purpose of this event was to engage the community in discussions about the existing limitations to data sharing. Did we succeed? Did we manage to engage the community? Judging by the fact that we received twenty high quality abstract applications from researchers across various disciplines for only five available case study speaking slots (it was so difficult to shortlist the top five!) and also because the venue was full – with around eighty attendees from Cambridge and other institutions – I think that the objective was pretty well met.

Additionally, the panel discussion was led by researchers and involved fifty-eight active users on the Sli.do platform for questions to panellists. There were also questions asked outside the Sli.do platform. So overall I feel that the event was a great success and it was truly fantastic to be part of it and to see the degree of participant involvement in data sharing.

Another observation is the great progress of the research community in Cambridge in the area of sharing: we have successfully moved away from discussions of whether research data is worth sharing to how to make data sharing more FAIR.

It seems that our intense advocacy, and the effort of speaking with over 1,800 academics from across the campus since January 2015, has paid off and we have indeed managed to build an engaged research data management community.

Published 12 September 2016
Written by Dr Marta Teperek
Creative Commons License