Something has been rumbling under the surface in the repository world recently, at least in the UK. Over the past six months or so, the Office of Scholarly Communication has had some fraught conversations with researchers who are terrified that their papers will be ‘pulled’ from publication by the journal. The reason is that some information about the upcoming paper is publicly available.
The HEFCE policy asks us to deposit the Author’s Accepted Manuscript into a repository “as soon after the point of acceptance as possible” and to present the manuscript “in a way that allows it to be discovered by readers and by automated tools such as search engines”. So when a researcher deposits a paper we check the publisher copyright restrictions and deposit the work into the repository – shutting down the article, but making the record public. What this means is that the metadata about an article is publicly available before it is published.
Similarly, researchers also share their research data in the repository and make the metadata publicly available. Funders and universities require researchers to include a statement in their publications about the availability of research data supporting their findings. What this means in practice is that researchers deposit research data supporting their manuscripts into data repositories before the corresponding manuscripts are published and, more and more frequently, even before they are accepted (to ensure that peer reviewers have access to all supporting materials).
Terminology
Before I go into detail about our challenge, let’s get a few terms straight here. When we talk about ‘data’ in this space we mean the information generated during a research project from which observations and conclusions are made in academic papers. The ‘metadata’ is the information about that data. In the case of an article, the metadata includes the authors, the title, the journal and the abstract. The metadata for datasets is less well defined, although there are data citation principles, so metadata in this case is information about what the data is, how it was generated, how to access and interpret it.
The second term that matters here is ‘embargo’ – the word of the moment for me, given my recent participation in the Open Scholarship Initiative Embargo Workgroup. Our resolution was that we need to fund a research piece to resolve the reasoning behind embargoes (bearing in mind there is no link between half-life of article usage and subscription rates) and what is a reasonable period of time for them.
Some publishers impose publication embargoes on the release of the Author’s Accepted Manuscript for a period of time after publication. The business of managing embargoes falls into the laps of repository managers. Indeed I have written before about how the complexity of different publisher agreements means it is almost impossible for an individual researcher to navigate the rules. We manage embargoes by putting the article under indefinite embargo when we deposit it and then checking back on a rotational basis to see if the work has been published. Once it has, we can set the embargo date. And this is time-consuming – to give an idea of scale we currently have over 1700 papers in the ‘checking’ pile.
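To make the ‘checking’ routine concrete, here is a minimal sketch of the logic in Python. It is an illustration only, not our actual repository software: the `Deposit` record, the `check_pile` function and the `lookup_publication_date` callback are all hypothetical names, and the sketch assumes the publisher’s embargo is a whole number of months counted from the publication date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class Deposit:
    """Hypothetical record for one accepted manuscript in the repository."""
    title: str
    embargo_months: int                    # assumed: embargo counted from publication
    publication_date: Optional[date] = None
    embargo_end: Optional[date] = None     # None stands for 'indefinite embargo'


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # First day of the following month, minus one day, gives the last valid day.
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def check_pile(
    pile: List[Deposit],
    lookup_publication_date: Callable[[Deposit], Optional[date]],
) -> List[Deposit]:
    """One pass over the 'checking' pile.

    Deposits that have been published get a real embargo end date;
    the rest stay under indefinite embargo and are returned for the next pass.
    """
    still_waiting = []
    for deposit in pile:
        published = lookup_publication_date(deposit)
        if published is None:
            still_waiting.append(deposit)
            continue
        deposit.publication_date = published
        deposit.embargo_end = add_months(published, deposit.embargo_months)
    return still_waiting
```

In practice the lookup step is a person checking the journal’s website, which is why a pile of this size takes so much staff time.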
There is a second type of embargo – a press embargo. This is an embargo which prohibits anyone actively discussing the content of accepted papers with the media prior to publication. The exception is that a few days before publication the journal allows journalists access to the published papers so they can prepare news stories to coincide with the publication of the work. It is this second type of embargo that is causing confusion in the research community.
The perceived problem
Our researchers are concerned that having the metadata about an article available means that publishers will consider this a breach of embargo and will pull the publication. Note that the Author’s Accepted Manuscript of the article itself (or the data files, in the case of datasets) is locked down and the information about the volume, issue and pages is missing as the work is not yet published.
The researchers are worried because they need to publish in high profile journals such as Nature for their careers, and if a work were to be pulled from publication this would have huge implications for them. This has caused a challenge for us – clearly we do not wish to threaten our researchers’ publication prospects, but we are also bound by the requirements of the HEFCE policy.
In November last year I put a query out on a couple of mailing lists about this, which generated a great deal of discussion. As it happens some UK repository managers are locking down the metadata about ALL articles until publication.
The actual problem
So we have decided to go to the source and are now in discussion with various publishers. A few things have come to light. First, generally publishers understand the distinction between these embargoes (details about responses are below). However there appears to be confusion at the editor level about this – and our researchers are in contact with editors.
The second issue is the level of bullying that researchers appear to be subjected to by the publishing industry. They are petrified of doing the ‘wrong thing’ and of being punished by having their article pulled. There is of course a question about whether any articles have ever actually been pulled because of metadata being available prior to publication or if this is yet another ghost we are boxing. No researchers to date have given us a concrete example of this happening.
Below is some of the correspondence we have had with publishers to date on this issue. The responses have been varied – from helpful and encouraging to restrictive and uninformative.
No issue – BMJ
The University of York had some information back from BMJ, which noted that the self archiving policy says that the post-print – “Final draft of manuscript: post peer-review, before the article is copyedited, typeset and published” – can be made available without embargo. They did note that the press (not the author) is subject to the rule “All material accepted for publication in any BMJ journal is under embargo until it is published online.”
Permission and concern – Nature Publishing Group
Late last year I contacted Nature Publishing Group (NPG) to ask their position on this issue and they were adamant that they do not pull papers from publication if the metadata is available prior to publication. They did “ask that our two embargoes – self-archiving embargo and press embargo – are respected”. They clarified that “NPG deposition to all PMC repositories allows the deposits to be fully discoverable as soon as processed by the repository, and the manuscripts’ full text become accessible 6 months after publication. In practice, this also means metadata can be available upon acceptance”.
NPG also looked at the question of articles being “pulled” ahead of publication and expressed concern that this idea was being propagated. They said that “to our knowledge there are no cases in which this has happened by putting metadata in a university repository. If you have information about any cases in which this is claimed to be the case, we would be very grateful if they could be sent through to us so that we can investigate them further.”
Poor practice – Science
We recently had a Science paper where we had already deposited the data associated with that paper into our repository (we had shut the data down for release on publication date) and had generated a link that the researchers were able to include in their paper. However the issue of having information in the public domain prior to publication was raised by our researchers so we wrote to Science for clarification. Their response was:
“I discussed your question with my editorial colleagues, and our provisional response is that we would prefer you to take the cautious option of keeping the metadata in the ‘dark archive’ until the date of publication. We appreciate, though, that this is non-ideal from your point of view, and will now be discussing the question with our Office of Public Programs in Washington, to see whether we can accommodate your preferred procedure.”
Remember that this is the publisher telling us to suppress the metadata about data that is published by us in our repository.
The big problem we have (apart from the principle of this issue) is that while we can automate the turning off of an embargo in the repository we cannot automate the movement of an item from a dark archive to an open one – this must be done manually. The paper was published on Good Friday. There was no-one physically at work (indeed the Library was closed) until the following Tuesday. So for the first four days of this article being in the public domain there was a dead link to data in it. This is not just ‘non-ideal’ – it is contrary to the idea of effective and complete publication.
Communication breakdown – The Lancet
An example of the problem we are encountering happened only last week. We were contacted by a researcher who demanded we take down the details of their accepted paper that was to be published in The Lancet because an assistant editor had taken issue with us posting the metadata prior to publication. Again we contacted the publisher and asked them their position.
When we spoke to The Lancet they were helpful and positive, reassuring us they were “happy to permit the release of article metadata at this stage”. They also said they had been in contact with the assistant editor of this article who has directly contacted the researcher. The explanation was “it looks like there was a misunderstanding regarding the wording of one of our policies”.
Right back at ya – Elsevier
Our correspondence with Elsevier was particularly unhelpful. While acknowledging that press embargoes are “editorial policy rather than open access policy” and that they are designed to create press interest, they concluded that this “isn’t a matter for a corporate policy at company level”.
This means it is apparently up to us to find out individual journal positions rather than the organisation taking responsibility for what is becoming a major problem for us. We contacted Cell Press (a subsidiary of Elsevier). They replied:
“You’ll notice that our policy is that we discourage release of the metadata and abstract prior to official publication. The reason is that this would be considered breaking any press embargoes on the article. So while, like Nature, we would not prohibit release of metadata, it means that it would be unlikely that the article would be considered for a press release. If you think that your article is likely to be appropriate for a press release to top-tier media outlets, then we recommend delaying deposit to the institutional repository until the article is officially published.”
So not only do Elsevier not give us an answer, leaving us to contact journals directly, we now have to make editorial decisions on individual Cell Press articles to determine if these might be potentially worthy of a press release. Now, as it happens I have a science journalism background* and could possibly do this – but are we seriously required in the UK to employ repository managers with news reporting skills so they can concurrently meet HEFCE requirements and also ensure that the public profile of their institution’s research is protected? (*This means, ironically, that I have many years of direct experience with press embargoes.)
There are two problems here. There is the practical embargo issue repository managers face, and there is the bullying of researchers problem.
Protecting researchers
In the case with The Lancet, despite us explaining that this was a considerably larger problem for us than this particular example, the researcher was unwilling to give us any further details as to the communication they had received from the journal other than that it had come from an ‘assistant editor’. This ‘protection’ of journals by researchers is not uncommon in our experience. Indeed, a recent article noted the “oppression” by editors, stating for example that researchers are “afraid to say anything about the New England Journal because they’re afraid they won’t get something published there”.
Bear in mind that Nature specifically stated that they were unaware of any examples of a paper being pulled because the metadata was available prior to publication. We have not been able to uncover a single concrete example of this actually occurring. Yet we continue to have distressed and frightened researchers contacting us because of this threat.
I don’t have an answer to this bullying problem other than our need to move away from having publication in a high impact journal as the be-all and end-all for research careers. In case you are in any doubt of how destructive this situation is, consider ‘Your Right Arm for a Publication in AER?’ where researchers indicated they would be prepared to lose half a thumb for a high profile publication. But reconfiguring the reward and recognition processes in research is a long way off. Meanwhile the least we can do is acknowledge this is happening and bring it into the open.
Clarity on press embargoes
In relation to the embargo issue, while we can continue to contact each publisher in turn and try and negotiate these issues with them, this is hardly time efficient given it is a sector-wide problem. The ‘solution’ from some (well resourced and staffed) publishers is that it is up to individual repository managers to contact each journal in turn for clarification. This is clearly not possible given that our staff time is already spent checking and complying with the myriad of different publisher requirements in place.
A different solution could be that publishers shorten the time between acceptance and publication. When journals have a quick turnover time, the period when an article’s metadata is publicly available before publication is limited to a matter of days. It is different when there is a delay of months or even years between acceptance and publication.
I should note that our team debated about whether writing this piece would actually do more harm than good – triggering publishers to suddenly introduce further restrictions on metadata. Managing repositories and juggling embargoes and funder policies is already complicated enough. Material available in institutional repositories is only a small percentage of material that is publicly available – according to the Monitoring the Transition to Open Access report, institutional repositories in the UK hold 7.9% of Author Accepted Manuscripts, compared to 56.6% in subject repositories and 19.7% in social sharing sites. Yet we are generally the ones actually adhering to the complex sets of embargo rules.
We consider this to be a sector problem – and one that should be addressed by sector-wide resources, such as the funding bodies. I should note Jisc have expressed interest in supporting us with this issue. The increased crack-down on our activities is frustrating, unhelpful and feeds the beast that is the publishing industry stranglehold on researcher behaviour.