Tag Archives: publishers

Multiplicity, the unofficial theme of Researcher to Reader 2019

4 March 2019 | Uncategorized | Libraries, open access, Plan S, publishers, Training | Office of Scholarly Communication

For the past four years at the end of February, publishers, librarians, agents, researchers, technologists and consultants have gathered in London for two days of discussions around the concept of ‘Researcher to Reader’. This blog is my take on what I found the most inspiring, challenging and interesting at the 2019 event. There wasn’t a theme this year per se, but something that did repeatedly arise from where I was standing was the diversity of our perspectives. This is a word that has taken a specific meaning recently, so I am using ‘multiplicity’ instead:

The principles of Plan S are calling for multiple business models for open access publishing, according to Dr Marc Schiltz
There is now great range in the approaches researchers take to the writing process, as described by Dr Christine Tulley
Professor Siva Umapathy described the disparity in standards of living in India, which has a profound effect on whether students can engage with research regardless of talent
In order to ensure reproducibility of research, we need multiplicity in the research landscape, with a larger number of smaller research groups working on a wide array of questions, argued Professor James Evans
Cambridge University Press is trying to break away from the Book/Journal dichotomy, diversifying with a long-form publication called Cambridge Elements
SpringerNature and Elsevier are expanding their business models to encroach into data management and training (although the analogy starts to fall apart here – what this actually represents is a concentration of the market overall).

Anyway, that gives you an idea of the kinds of issues covered. The conference programme is available online and you can read the Twitter conversation from the event (#R2Rconf). Read on for more detail.

The 2019 meeting was, once again, a great programme. (I say that as a member of the Advisory Board, I admit, but it really was).

The Plan S-shaped elephant in the room

Both days began with a bang. The meeting opened with a keynote from Dr Marc Schiltz – President at Science Europe and Secretary General & Executive Head at the Luxembourg National Research Fund – talking about “Plan S and European Research”.

Schiltz explained he felt the current publishing system is a barrier to ensuring the outcomes of research are freely available, noting that hiding results is the antithesis of the essence of science. There was a ‘duty of care’ for funders to invest public funds well to support research. He suggested that there has been little progress in increasing open access to publications since 2009. In terms of the mechanisms of Plan S, he emphasised there are many compliant routes to publication and Plan S “is not about gold OA as the only publication model, it is about principles”. He also noted that there are plans to align Plan S principles with those of OA2020.

As is mentioned in the Plan S principles, Schiltz ended by arguing for the need to revise the incentivisation system in scholarly communications through mechanisms such as DORA. This is the “next big project” for funders, he said.

Catriona MacCallum from Hindawi noted DORA is the most vital component for Plan S to work and therefore we need a proper roadmap. She asked if there was a timeline for how funders will make changes to their own systems for evaluating research and grant applications, as this is an area where societies and funders should work together. Schiltz responded that this process is about making concrete changes to practice, not just policy. There is no timeline but there has been more attention on this than ever before. He noted that Dutch universities are meeting next year to redefine tenure/promotion standards, which will be interesting to follow. MacCallum observed it could take decades if there is no timeline upfront.

One of the early questions from the audience was from a publisher asking why mirror journals were not permitted under Plan S, given that they are not hybrid journals. Schiltz disagreed, saying if the journals have the same editorial board then it is effectively hybrid because readers will still need to subscribe to the other half, as they would for hybrid. Needless to say, the publisher disagreed.

The question about why Plan S architects didn’t consult with learned societies before going public was not particularly well answered. Schiltz talked about the numbers of hybrid journals being greater than pure subscription journals now, and said there was concern that hybrid becomes the dominant business model. He said we need an actual transition to gold OA, which is all very well but doesn’t actually answer the question. He did note that: “We do not want learned societies to become collateral damage of Plan S”. He acknowledged that many learned societies use surpluses from their publishing businesses to fund good work. But he did ask: “Is the use of thinly spread library budget to subsidise learned societies’ philanthropic activities appropriate, and to what degree? This is not sustainable”.

So, how do researchers approach the writing process?

Professor Christine Tulley, Professor of English at the University of Findlay, Ohio, spoke about “How Faculty Write for Publication, Examining the academic lifecycle of faculty research using interview and survey data”. Tulley is involved with training researchers in writing and publishing among other roles. She has published a book called How Writing Faculty Write, Strategies for Process, Product and Productivity based on her research with top researchers who research about writing. She is also collaborating on a De Gruyter survey of researchers on writing (with whom she co-facilitated a workshop on this topic, discussed later in this blog).

Tulley’s first observation is that academics think ‘rhetorically’. Regardless of discipline, her findings in the US show that thinking about where you want to publish and the community you want to reach is more important to academics than coming up with an idea. Tulley noted that in the past, the process was that academics wrote first then decided where to publish. But this is not the case now: instead, authors consider readership in the first instance, asking themselves what is the best medium to reach that audience. This is a focus on what can be a narrow audience that an author wants to hit – it is not a matter of ‘reach the world’ but can be as few as five important people. This can limit end publication options.

She also observed that after the top two or three journals, rank matters less. Because of this, newer journals and open access publications can attract readers and submissions, particularly through early release, which is more important than ‘official publication’, she observed. This speaks to the recent increase in general interest in preprints.

In a statement that set the hearts of the librarians in the audience aflutter, Tulley spoke about librarians as “tip-off providers”, being especially useful for early online release of research before the indexing kicks in. She noted that academics view librarians as scholarly research ‘Partners’ rather than ‘Support’. We have also had this discussion within the UK library community.

Equity of access to education

It is always really interesting to hear perspectives from elsewhere – be that across the library/researcher/publisher divides, or across global ones. Two talks at the event were very interesting as they described the situation in India and Bangladesh, highlighting how some issues are shared worldwide and others are truly unique.

Prof Siva Umapathy, Director of the Indian Institute of Science Education, Bhopal, spoke first, emphasising that he was giving his personal opinion, not that of the Indian government. He noted that taxpayers pay for higher education in India and this is the case for most of the global south – fees to students are much less common. This means education is seen as a social responsibility of government.

Umapathy noted that 40% of the population in India is currently under 35 years old. Infrastructure and opportunities vary significantly within India, let alone across the whole ‘global south’. In some areas of India, the standard of living is equivalent to London. In other areas there is no internet connection. This affects who can engage with research: some very bright students from small villages are at a disadvantage. Even the kind of information that might be available to students in India about where to study and how to apply can be uneven, affecting ambitions regardless of how talented the student might be. He described the incredibly competitive process to gain a place in a university, consisting of applications, exams and interviews.

In India, when someone pays to publish a paper it gives the impression that the work is not of as high a quality; after all, if you have good science you shouldn’t have to pay for publication. I should note this is not unique to India – witness an article that was published in The Times Literary Supplement the day after this talk that entirely confuses what open access monograph publishing is about (“Vain publishing – The restrictions of ‘open access’”).

Beyond impressions there are practical issues – bureaucrats don’t understand why an academic would pay for open access publication, or why they wouldn’t publish in the ‘best’ mainstream journals, so funding in India does not allow for any payment for publishing. This is despite India being a big consumer of open access research. This has practical implications. If India were to join Plan S and mandate OA, it would likely reduce the number of papers he is able to publish by half, because there is no government funding available to cover APCs.

He called for the need to train editors and peer reviewers, stressed the importance of educating governments, funders and evaluators, and suggested that peer reviewers be given APC discounts to encourage them to review more for journals. This, of course, is an issue in the Global North too. Indeed, when we ran some workshops on peer review late last year, they were doubly subscribed immediately.

Global reading, local publishing – Bangladesh

Dr Haseeb Irfanullah, a self-described ‘research communications enthusiast’, spoke about what Bangladesh can tell us about research communications. He began by noting how access to scientific publications has been improved by the Research 4 Life Partnership and INASP. These innovations for increasing access to research literature in the global south over the past few years have been a ‘revolution’. He also discussed how the Bangladesh Journals Online project has helped get Bangladeshi journals online, including his journal, Bangladesh Journal of Plant Taxonomy. This helps journals get journal impact factors (JIF).

However, Bangladesh journal publishing is relatively isolated, and is ‘self sustaining’. Locally sourced content fulfils the need. Because promotion, increments and recognition needs are met under the current situation (universities don’t require indexed journals for promotion), there is little incentive to change or improve the process. This seems to be an example of how a local journal culture can thrive when researchers are subject to different incentives, although perversely the downside is that they & their research are isolated from international research. A Twitter observation about the JIF was “damned if you do or damned if you don’t”.

He also noted that it is ‘very cheap to publish a journal as everyone is a volunteer’, prompting one person on Twitter to ask: “Is it just me or is this the #elephantintheroom we need to address globally?” Irfanullah has been involved in providing training for editors, workshops and dialogues on standards, mentorship to help researchers get their work published, as well as improving access to research in Bangladesh. He concluded that these challenges can be addressed; for example, through dialogue with policymakers and a national system for standards.

Big is not best when it comes to reproducibility

Professor James Evans, from the Department of Sociology at the University of Chicago (who was a guest of Researcher to Reader in 2016), spoke on why centralised “big science” communities are more likely to generate non-replicable results by describing the differences between small and large teams. His talk was a whirlwind of slides (often containing a dizzying array of graphics) at breath-taking speed.

The research Evans and his team undertake looks at large numbers of papers to determine patterns that identify replicability and whether the increase in the size of research teams and the rise of meta research has any impact. For those interested, published papers include “Centralized “big science” communities more likely generate non-replicable results” and “Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It”.

Evans described some of the consequences when a single mistake is reused and appears in multiple subsequent papers, ‘contaminating’ them. He used an example of the HeLa cell* in relation to drug gene interactions. Misidentified cells resulted in ‘indirect contamination’ of the 32,755 articles based on them, plus the estimated half a million other papers which cited these cells. This can represent a huge cost where millions of dollars’ worth of research has been contaminated by a mistake.

The problem is that scientific communities use the same techniques and methods, which reduces the robustness of research. Research networks increasingly overlap, with exposure to similar methodologies and prior knowledge, so research claims are not being independently replicated. Claims that are highly centralised on star scientists, repeat collaborations & overlapping methods are far less robust and lead to huge distortion in the literature. The larger the team, the more likely their output will support and amplify rather than disrupt prior work. If there is an overlap, e.g. between authors or methodologies, there is more likely to be agreement.

Drawing an analogy with the difference between Slumdog Millionaire and Marvel movies, Evans noted that independent, decentralised, non-overlapping claims are far more likely to be robust, replicable & of more benefit to society. It is effectively a form of triangulation. Smaller, decentralised communities are more likely to conduct independent experiments to reproduce results, producing more robust results. Small teams reach further into the past and look to more obscure and independent work. Bigger is not better – smaller teams are more productive, innovative & disruptive because they have more to gain & less to lose than larger teams.

Large overlapping teams increase agglomeration around the same topics. The research landscape is seeing a decrease in small teams, and therefore a decrease in independence. These types of group receive less funding & are ‘more risky’ because they are not part of the centralised network.

Evans noted that a disruption to the scientific narrative, as opposed to building incrementally on what has happened before, is effectively what Thomas Kuhn described in The Structure of Scientific Revolutions in the 1960s. But “disruption delays impact” – there is a tendency of research teams to keep building on previous successes (which come with an existing audience) rather than risking disruption and the consequent need for new audiences etc. In addition, the size of the team matters: one of their findings has been that each additional person on a team reduces the likelihood of research being disruptive. But disruption requires different funding models, with a taste for risk.

Evans noted that you need small teams simultaneously climbing different hills to find the best solution, rather than everyone trying to climb the same hill. This analogy was picked up by Catriona MacCallum who noted that publishers are actually all on the big hill which means they are in the same boat and trying to achieve the same end goal (hence the mess we are now in). So how do publishers move across to the disruptive landscape with lots of higher hills?

*The HeLa cell is an immortal cell line used in scientific research. It is the oldest and most commonly used human cell line. It is called HeLa because it came from a woman called Henrietta Lacks.

Sci Hub – harm or good?

The second day opened with a debate about Sci Hub on the question of “Is Sci-Hub doing more good than harm to scholarly communication?”.

The audience was asked to vote whether they ‘agreed’ or ‘disagreed’ with the statement. In this first vote 60% of the audience disagreed and 40% agreed. Note this could possibly reflect attendance at the conference of publishers as the largest cohort, at 51% of the attendees, or alternatively be a reflection of the slightly problematic wording of the question. More than one person observed on Twitter that they would have appreciated a ‘don’t know’ or ‘neither good nor bad’ option.

The debate itself was held between Dr Daniel Himmelstein, Postdoctoral Fellow at the University of Pennsylvania (in the affirmative – that SciHub is doing good) and Justin Spence, Partner and Co-Founder at Publisher Solutions International (in the negative – that SciHub is doing harm). I have it on good authority the debate will be written up separately, so I won’t do so here. One observation I noted was that the question did not define to whom or what the ‘harm’ was being done. The argument against appeared focused on harm to the market, but the argument for was discussing benefit to society.

The discussion was opened up to the room but the comment that elicited a clap from the audience was from Jennifer Smith at St George’s University in London who asked if Elsevier’s profits are defensible when there are people on fun runs raising money for charities who are not anticipating their fundraising cash is going to publisher shareholders rather than supporting research. The question she asked is: “who is stealing from whom?”.

At the end of the debate the audience was asked to vote again, at which point 55% disagreed and 45% agreed, meaning Himmelstein won over 5% of the audience. This seems surprising given that it seems very rare to actually change anyone’s mind.

But is it a book or a journal?

Nisha Doshi spoke about Cambridge Elements – a publication format that straddles the Book and Journal formats. It was interesting to hear about some of the challenges Cambridge University Press has faced. Some were practical, such as which systems to use for production, since these seem to be very clearly delineated as either journal systems or book systems. CUP is using several book systems, plus ISBNs, but also using ScholarOne for peer review for this project. Other issues have been philosophical. Authors and many others continue to ask “is it a journal or a book?”. CUP have encouraged authors to embed audio and video in their Cambridge Elements, but are not seeing much take-up so far, which is interesting given the success of Open Book Publishers.

Doshi listed the lessons CUP has learned through the process of trying to get this new publication form off the ground. It was interesting to see how far Cambridge Elements has come. In October 2017 as part of our Open Access Week events, the OSC hosted CUP to talk about what was described at this point as their “hybrid books and journals initiative”.

What’s the time Mr Wolf?

In 2016, Sally Rumsey and I spoke to the library communities at our institutions (Oxford and Cambridge, respectively) with a presentation: “Watch out, it’s behind you: publishers’ tactics and the challenge they pose for librarians”. Our warnings have increasingly been supported with publisher activity in the sector over the past three years. Two presentations at Researcher to Reader were along these lines.

In the first instance, Springer Nature presented on their Data Support Services which are a commercial offering in direct competition to the services offered by Scholarly Communication departments in libraries. I should note here that Elsevier also charge for a similar service through their Mendeley Data platform for institutions.

Representing an even further encroachment, the second presentation by Jean Shipman from Elsevier was about a new initiative which is training librarians to train researchers about data management. The new Elsevier Research Data Management Librarian Academy (RDMLA) has an emphasis on peer to peer teaching. Elsevier developed a needs assessment for RDM training, and assessed library competencies and library education curricula before developing the RDMLA curriculum for RDM training. Example units include research data culture, marketing the program to administrators, and an overview of tools such as for coding. Elsevier moving into the training/teaching space is not new; they have had the ‘Elsevier Publishing Campus’ and ‘Researcher Academy’ for some time. But those are aimed at the research community. This new initiative is formally stepping directly into the library space.

Empathy mapping as a workshop structure

One of the features of Researcher to Reader is the workshops which are run in several sessions over the two day period. In all there is not much more time available than a traditional 2.5 – 3 hour workshop prior to the main event, but this format means there is more reflection time between sessions and does focus the thinking when you are all together.

I attended a workshop on “Supporting Early-Career Scholarship” asking: How can librarians, technologists and publishers better support early career scholars as they write and publish their work?

Ably facilitated by Bec Evans, Founder at Prolifiko with Dee Watchorn, Product Engagement Manager at De Gruyter and Christine Tulley, the workshop used a process called Empathy Mapping. Participants were given handouts with comments made by early career researchers during interviews about the writing process as part of a research programme by Prolifiko. This helped us map out the experience of ECRs from their perspective rather than guessing and imposing our own biases.

We were asked to come up with a problem – for my group it was “How can we help an ECR disseminate their first paper beyond the publication process?” And we were then asked to find a solution. Our group identified that these people need to understand the narrative of their work that they can then take through blogs, presentations, Twitter and other outlets. Our proposal was to create an online programme that only allowed 5 minutes for recording (in the way Screencastify only allows 10 minutes) an understandable explanation of their research that they can then upload for commentary by peers in a safe space before going public.

And so, to end

It is helpful to have different players together in a room. This is really the only way we can start to understand one another. As an indicator of where we are at, we cannot even agree on a common language for what we do – in a Twitter discussion about how SciHub is meeting an ‘ease of access’ need that has not been met by publishers or libraries, it became clear that while in the library space we talk about the scholarly publishing *ecosystem*, publishers consider libraries to be part of the scholarly publishing *industry*.

One tweet from a publisher was: “Good to hear Christine Tulley talk about why academics write and what it is important to them at #R2RConf . We don’t want to, but publishers too often think generically about authors as they do about content”. While slightly confronting (authors are not only their clients, but also provide the content for *free*, so should perhaps be treated with some respect), it does underline why it is so essential that we get researchers, librarians and publishers into the same room to understand one another better.

All the more reason to attend Researcher to Reader 2020!

Published 4 March 2019
Written by Dr Danny Kingsley

‘No free labor’ – we agree.

26 June 2018 | Uncategorized | APC, downloads, editing, peer review, publishers | Office of Scholarly Communication

[NOTE: The introductory sentence to this blog was changed on 27 June to provide clarification]

Last week members of the University of California* released a Call to Action to ‘Champion change in journal negotiations’ which references the April 2018 Declaration of Rights and Principles to Transform Scholarly Communication. This states as one of the 18 principles:

“No free labor. Publishers shall provide our Institution with data on peer review and editorial contributions by our authors in support of journals, and such contributions shall be taken into account when determining the cost of our subscriptions or OA fees for our authors.”

Well, this is interesting. At Cambridge we have been trying to look at this specific issue since late last year.

The project

Our goal was to have a better understanding of the interaction between publisher and researcher. The (not very imaginatively named) Data Gathering Project is a project to support the decision making of the Journal Coordination Scheme in relation to subscription to, and use of, academic journal literature across Cambridge.

What we have initially found is that the data is remarkably difficult to put together. Cambridge University does not use bibliometrics as a means of measuring our researchers, so we do not subscribe to SciVal, but we have access to Scopus. But Scopus does not pick up Arts and Humanities publications particularly well, so it will always be a subset of the whole.

Some information that we thought would be helpful simply isn’t. We do have an institutional Altmetric account, so we were able to pull a report from Altmetric of every paper with a Cambridge author held in that database. But Altmetric does not give a publisher view – we would have to extract this using DOI prefixes or some other system.
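To illustrate what extracting a publisher view from DOI prefixes could look like, here is a minimal Python sketch. It is not the method used in the project: the function names are hypothetical, the sample DOIs are made up, and the lookup table is a tiny illustrative subset (large publishers hold many prefixes, and a prefix identifies the organisation that registered the DOI, which may not be the journal's current publisher).

```python
from collections import Counter

# Illustrative subset only; a real lookup table would need many more prefixes.
PREFIX_TO_PUBLISHER = {
    "10.1016": "Elsevier",
    "10.1002": "Wiley",
    "10.1177": "SAGE",
    "10.1371": "PLOS",
    "10.7554": "eLife",
}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. everything before the first slash."""
    cleaned = doi.strip().lower()
    # Drop common resolver or scheme lead-ins if present.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.startswith(lead):
            cleaned = cleaned[len(lead):]
    return cleaned.split("/", 1)[0]


def count_by_publisher(dois):
    """Count papers per publisher; unmapped prefixes are grouped as 'unknown'."""
    counts = Counter()
    for doi in dois:
        counts[PREFIX_TO_PUBLISHER.get(doi_prefix(doi), "unknown")] += 1
    return counts


# Made-up DOIs standing in for an exported report.
sample_dois = [
    "10.1016/j.example.2018.01.001",
    "https://doi.org/10.1371/journal.example.0000001",
    "doi:10.7554/eLife.00000",
    "10.9999/not-in-table.1",
]
print(count_by_publisher(sample_dois))
```

The size of the 'unknown' group gives a quick indication of how incomplete the lookup table is for a given report.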

Cambridge uses Symplectic Elements to record publications from which, for very complicated reasons, we are unable to obtain a list of publishers with whom we publish. As part of the subscription we have access to the new analysing product, Dimensions. However, as far as we have managed to see, Dimensions does not break down by publisher (it works at the more granular level of journal), and seems to consider anything that is in the open domain (regardless of licence) to be ‘open access’. So figures generated here come with a heavy caveat.

We are also able to access the COUNTER usage statistics for our journals with the help of the Library eresources team. However these include downloads for backfiles and for open access articles, so the numbers are slightly inflated, making a ‘cost per download’ analysis of value against subscription cost inaccurate.
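The effect of that inflation on a 'cost per download' figure can be shown with a small Python sketch. The figures are invented for illustration, and the function assumes that backfile and open access downloads can be counted separately, which may not be possible with the reports described above.

```python
def cost_per_download(subscription_cost, total_downloads,
                      backfile_downloads=0, open_access_downloads=0):
    """Cost per download, counting only downloads the subscription pays for.

    Backfile and open access downloads are subtracted because they are not
    bought by the current subscription.
    """
    paid_downloads = total_downloads - backfile_downloads - open_access_downloads
    if paid_downloads <= 0:
        raise ValueError("no downloads attributable to the subscription")
    return subscription_cost / paid_downloads


# Invented figures: 10,000 paid for 5,000 reported downloads,
# of which 1,000 were backfile and 1,500 were open access.
naive = cost_per_download(10_000, 5_000)
adjusted = cost_per_download(10_000, 5_000,
                             backfile_downloads=1_000,
                             open_access_downloads=1_500)
print(naive, adjusted)  # 2.0 4.0
```

In this invented case the unadjusted figure makes the subscription look twice as good value as it is.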

We know how much we spend on subscriptions (spoiler alert: a lot). We need to take into consideration our offsetting arrangements with some publishers – something we are taking an active look at currently anyway.

Reaching out to the publishing community

So to supplement the aggregated information we have to hand, we have reached out to those publishers our researchers publish with in significant quantities to ask them for the following data on Cambridge authors: Peer Reviewing, Publishing, Citing, Editing, and Downloading.

This is exactly what the University of California is demanding. One of the reasons we need to ask publishers for peer review information is because it is basically hidden work. Aggregating systems like Publons do help a bit, although the Cambridge count of reviewers in the system is only 492, which is only a small percentage of the whole. Publons was bought out by Clarivate Analytics (which was Thomson Reuters before this and ISI before that) a year ago. We did approach Clarivate Analytics for some data about our peer reviewing, but declined to pay the eye watering quoted fee.

What have we received?

Contrary to our assumptions, many of the publishers responded saying that this information is difficult to compile because it is held on different systems and that multiple people would need to be contacted. Sometimes this is because publishers are responsible for the publication of learned society journals so information is not stored centrally. They also fed back that much of the data is not readily available in a digestible format.

Some publishers have responded with data on Cambridge peer reviewers and editors, usage statistics, and citation information. A big thank you to Emerald, SAGE, Wiley, the Royal Society and eLife. We are in active correspondence with Hindawi and PLOS. [STOP PRESS: SpringerNature provided their data 30 minutes after this blog went live, so thanks to them as well].

However, a number of publishers have not responded to our requests and one in particular would like to have a meeting with us before releasing any information.

Findings so far

The brief for the project was to ‘understand how our researchers interact with the literature’. While we wrote the brief ourselves, we have come to realise it is actually very vague. We have tried to gather any data we can to start answering this question.

What the data we have so far is helping us understand is how much is being spent on APCs outside the central management of the Office of Scholarly Communication (OSC). The OSC manages the block grants from the RCUK (now UKRI) and the Charities Open Access Fund, but does not look after payments for open access for research funded by, say, the Bill and Melinda Gates Foundation or the NIH. This means that there is a not insignificant amount of extra expenditure on top of that coordinated by the OSC. These amounts are extremely difficult to ascertain, as observed in 2014.

We already collect and report on how much the Office of Scholarly Communication has spent on APCs since 2013. However some prepayment deals make the data difficult to analyse because of the way the information is presented to us. For example, Cambridge began using the Wiley Dashboard in the middle of the year with the first claim against it on 6 July 2016, so information after that date is fuzzy.

The other issue with comparing how much a publisher has received in APCs and how much the OSC has paid (to determine the difference) is dates. We have already talked at length about date problems in this space. But here the issue is publisher provided numbers are based on calendar years. Our reporting years differ – RCUK reports from April to March and COAF from October to September, so pulling this information together is difficult.
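The mismatch between reporting periods can be made concrete with a short Python sketch that labels a single payment date under each scheme. It assumes each reporting year starts on the first day of its opening month (April for RCUK, October for COAF, January for publishers' calendar years); the function names are hypothetical.

```python
from datetime import date


def reporting_year_start(d: date, start_month: int) -> int:
    """Calendar year in which the reporting year containing `d` began."""
    return d.year if d.month >= start_month else d.year - 1


def reporting_year_label(d: date, start_month: int) -> str:
    """Label such as '2016' (calendar year) or '2016/17' (split year)."""
    start = reporting_year_start(d, start_month)
    if start_month == 1:
        return str(start)
    return f"{start}/{(start + 1) % 100:02d}"


payment = date(2016, 7, 6)
print(reporting_year_label(payment, 1))   # publisher calendar year: 2016
print(reporting_year_label(payment, 4))   # RCUK, April to March: 2016/17
print(reporting_year_label(payment, 10))  # COAF, October to September: 2015/16
```

A single payment therefore falls into three differently labelled periods, which is why publisher totals cannot be compared with either funder report without first re-bucketing the individual payments by date.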

Our current approach to understanding the complete expenditure on APCs, apart from analysing the data being provided by (some) publishers, is to establish all of the suppliers to whom the OSC has paid an APC and obtain the supplier number. This list of supplier numbers can then be run against the whole University to identify payments outside the OSC.
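As a rough sketch of that matching step in Python, assuming payment records are available as dictionaries with hypothetical 'payment_id', 'supplier_no' and 'amount' fields (the real finance system's structure is not described here, and the records below are invented):

```python
def payments_outside_osc(osc_payments, university_payments):
    """Return university payments to known APC suppliers that the OSC did not make."""
    apc_suppliers = {p["supplier_no"] for p in osc_payments}
    osc_payment_ids = {p["payment_id"] for p in osc_payments}
    return [
        p for p in university_payments
        if p["supplier_no"] in apc_suppliers
        and p["payment_id"] not in osc_payment_ids
    ]


# Invented records for illustration.
osc = [
    {"payment_id": "P1", "supplier_no": "S100", "amount": 1500},
    {"payment_id": "P2", "supplier_no": "S200", "amount": 2000},
]
whole_university = osc + [
    {"payment_id": "P3", "supplier_no": "S100", "amount": 1800},  # same supplier, paid elsewhere
    {"payment_id": "P4", "supplier_no": "S999", "amount": 300},   # supplier never paid an APC by the OSC
]

candidates = payments_outside_osc(osc, whole_university)
print([p["payment_id"] for p in candidates])  # ['P3']
```

The matches are only candidates: a supplier that has received an APC may also be paid for other things, so each result would still need checking before being counted as APC expenditure.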

This project is far from straightforward. Every dataset we have will require some enhancement. We have published a short sister post on what we have learned so far about organising data for analysis. But we are hoping over the next couple of months to start getting a much clearer idea of what Cambridge is contributing into the system – in terms of papers, peer review and editorial work in addition to our subscriptions and APCs. We need more evidence based decision making for negotiation.

Footnote

* There has been some discussion in listservs about who is behind the Call to Action and the Declaration. Thanks to Jeff MacKie-Mason, University Librarian and Professor, School of Information and Professor of Economics at UC Berkeley, we are happy to clarify:

The Declaration is by the faculty senate’s library committee – University Committee on Library and Scholarly Communication (UCOLASC)
The Call to Action is by the University of California’s Systemwide Library and Scholarly Information Advisory Committee, UCOLASC, and the UC Council of University Librarians, who: “seek to engage the entire UC academic community, and indeed all stakeholders in the scholarly communication enterprise, in this journey of transformation”.

Published 26 June 2018 (amended 27 June 2018)
Written by Dr Danny Kingsley & Katie Hughes

Manuscript detectives – submitted, accepted or published?

27 March 2018 | Uncategorized | accepted manuscript, licences, open access, publishers, scholarly communication | Arthur Smith

In the blog post “It’s hard getting a date (of publication)”, Maria Angelaki discussed how a seemingly straightforward task may turn into a complicated and time-consuming affair for our Open Access Team. As it turns out, it isn’t the only one. The process of identifying the version of a manuscript (whether it is the submitted, accepted or published version) can also require observation and deduction skills on par with Sherlock Holmes’.

Unfortunately, it is something we need to do all the time. We need to make sure that the manuscript we’re processing isn’t the submitted version, as only published or accepted versions are deposited in Apollo. And we need to differentiate between published and accepted manuscripts, as many publishers – including the biggest players Elsevier, Taylor & Francis, Springer Nature and Wiley – only allow self-archiving of accepted manuscripts in institutional repositories, unless the published version has been made Open Access with a Creative Commons licence.

So it’s kind of important to get that right…

Explaining manuscript versions

Manuscripts (of journal articles, conference papers, book chapters, etc.) come in various shapes and sizes throughout the publication lifecycle. At the onset a manuscript is prepared and submitted for publication in a journal. It then normally goes through one or more rounds of peer-review leading to more or less substantial revisions of the original text, until the editor is satisfied with the revised manuscript and formally accepts it for publication. Following this, the accepted manuscript goes through proofreading, formatting, typesetting and copy-editing by the publisher. The final published version (also called the version of record) is the outcome of this. The whole process is illustrated below.

Identifying published versions

So the published version of a manuscript is the version… that is published? Yes and no, as sometimes manuscripts are published online in their accepted version. What we usually mean by published version is the final version of the manuscript which includes the publisher’s copy-editing, typesetting and copyright statement. It also typically shows citation details such as the DOI, volume and page numbers, and downloadable files will almost invariably be in a PDF format. Below are two snapshots of published articles, with citation details and copyright information zoomed in. On the left is an article from the journal Applied Linguistics published by Oxford University Press and on the right an article from the journal Cell Discovery published by Springer Nature.

Published versions are usually obvious to the eye and the easiest to recognise. In a way the published version of a manuscript is a bit like love: you may mistake other things for it but when you find it you just know. In order to decide if we can deposit it in our institutional repository, we need to find out whether the final version was made Open Access with a Creative Commons (CC) licence (or in rarer cases with the publisher’s own licence). This isn’t always straightforward, as we will now see.

Published Open Access with a CC licence?

When an article has been published Open Access with a CC licence, a statement usually appears at the bottom of the article on the journal website. However as we want to deposit a PDF file in the repository, we are concerned with the Open Access statement that is within the PDF document itself. Quite a few articles are said to be Open Access/CC BY on their HTML version but not on the PDF. This is problematic as it means we can’t always assume that we can go ahead with the deposit from the webpage – we need to systematically search the PDF for the Open Access statement. We also need to make sure that the CC licence is clearly mentioned, as it’s sometimes omitted even though it was chosen at the time of paying Open Access charges.

The Open Access statement will appear at various places on the file depending on the publisher and journal, though usually either at the very end of the article or in the footer of the first page as in the following examples from Elsevier (left) and Springer Nature (right).

A common practice among the Open Access team is to search the file for various terms including “creative”, “cc”, “open access”, “license”, “common” and quite often a combination of these. But even this isn’t a foolproof method as the search may retrieve no result despite the search terms appearing within the document. The most common publishers tend to put Open Access statements in consistent places, but others might put them in unusual places such as in a footnote in the middle of a paper. That means we may have to scroll through a whole 30- or 40-page document to find them – quite a time-consuming process.
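
The term search can be sketched in a few lines. This assumes the text of each page has already been extracted from the PDF by some other tool; that extraction step is not shown, and it is the point where a search can come back empty even though the words are visibly on the page (for example when a page is a scanned image). The function name and sample pages are hypothetical.

```python
import re

# Search terms taken from the list used by the team.
SEARCH_TERMS = ["creative", "cc", "open access", "license", "common"]


def find_terms(page_texts, terms=SEARCH_TERMS):
    """Map each search term to the 1-based page numbers on which it appears."""
    hits = {}
    for term in terms:
        # Match at the start of a word, ignoring case, so that "cc" finds
        # "CC BY" but not the "cc" inside "accept".
        pattern = re.compile(r"\b" + re.escape(term), re.IGNORECASE)
        pages = []
        for number, text in enumerate(page_texts, start=1):
            # Collapse line breaks so phrases split across lines still match.
            normalised = " ".join(text.split())
            if pattern.search(normalised):
                pages.append(number)
        if pages:
            hits[term] = pages
    return hits


# Hypothetical extracted text for a two-page document.
pages = [
    "Introduction. We accept this.",
    "This is an Open\nAccess article under the CC BY license (Creative\nCommons).",
]
print(find_terms(pages))
```

Reporting page numbers rather than a simple yes or no helps with statements tucked away in unusual places, since it points straight to the page that needs reading.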

Identifying accepted versions

The accepted manuscript is the version that has gone through peer-review. The content should be the same as the final published version, but it shouldn’t include any copy-editing, typesetting or copyright marking from the publisher. The file can be either a PDF or a Word document. The most easily recognisable accepted versions are files that are essentially just plain text, without any layout features, as shown below. The majority of accepted manuscripts look like this.

However sometimes accepted manuscripts may at first glance appear to be published versions. This is because authors may be required to use publisher templates at the submission stage of their paper. But whilst looking like published versions, accepted manuscripts will not show the journal/publisher logo, citation details or copyright statement (or they might show incomplete details, e.g. a copyright statement such as © 20xx *publisher name*). Compare the published version (left) and accepted manuscript (right) of the same paper below.

As we can see the accepted manuscript is formatted like the published version, but doesn’t show the journal and publisher logo, the page numbers, issue/volume numbers, DOI or the copyright statement.

So when trying to establish whether a given file is the published or accepted version, looking out for the above is a fairly foolproof method.

Identifying submitted versions

This is where things get rather tricky. Because the difference between an accepted and submitted manuscript lies in the actual content of the paper, it is often impossible to tell them apart based on visual clues. There are usually two ways to find out:

Getting confirmation from the author
Going through a process of finding and comparing the submission date and acceptance date of the paper (if available), mostly relevant in the case of arXiv files

Getting confirmation from the author of the manuscript is obviously the preferable and time-saving option. Unfortunately many researchers mislabel their files when uploading them to the system, describing their accepted/published version file as submitted (the fact that they do so when submitting the paper to us may partly explain this). So rather than relying on file descriptions, having an actual statement from the author that the file is the submitted version is better. In an ideal world this would never be necessary, as everyone would know that only accepted and published versions should be sent to us.

A common incarnation of submitted manuscripts we receive is arXiv files. These are files that have been deposited in arXiv, an online repository of pre-prints that is widely used by scientists, especially mathematicians and physicists. An example is shown below.

Clicking on the arXiv reference on the left-hand side of the document (circled) leads to the arXiv record page as shown below.

The ‘comments’ and ‘submission history’ sections may give clues as to whether the file is the submitted or accepted manuscript. In the above example the comments indicate that the manuscript was accepted for publication by the MNRAS journal (Monthly Notices of the Royal Astronomical Society). So this arXiv file is probably the accepted manuscript.

The submission history lists the date(s) on which the file (and possible subsequent versions of it) was/were deposited in arXiv. By comparing these dates with the formal acceptance date of the manuscript which can be found on the journal website (if published), we can infer whether the arXiv file is the submitted or accepted version. If the manuscript hasn’t been published and there is no way of comparing dates, in the absence of any other information, we assume that the arXiv file is the submitted version.
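
The date comparison amounts to a simple rule, sketched below. The function name and labels are hypothetical, and the result is only an inference: a file deposited in arXiv after the acceptance date is probably the accepted manuscript, but an author could also have uploaded an earlier draft late.

```python
from datetime import date


def infer_arxiv_version(deposit_date, acceptance_date=None):
    """Guess whether an arXiv file is the submitted or accepted manuscript.

    deposit_date: date the arXiv version in question was deposited.
    acceptance_date: formal acceptance date from the journal, or None if unknown.
    """
    if acceptance_date is None:
        # No way of comparing dates: assume the submitted version.
        return "submitted"
    if deposit_date >= acceptance_date:
        return "probably accepted"
    return "submitted"


# Hypothetical dates for illustration.
print(infer_arxiv_version(date(2017, 9, 1), date(2017, 8, 20)))
print(infer_arxiv_version(date(2017, 6, 1), date(2017, 8, 20)))
print(infer_arxiv_version(date(2017, 6, 1)))
```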

Conclusion

Distinguishing between different manuscript versions is by no means straightforward. The fact that even our experienced Open Access Team may still encounter cases where they are unsure which version they are looking at shows how confusing it can be. The process of comparing dates can be time-consuming itself, as not all publishers show acceptance dates for papers (ring a bell?).

Depositing a published (not OA) version instead of an accepted manuscript may infringe publisher copyright. Depositing a submitted version instead of an accepted manuscript may mean that research that hasn’t been vetted and scrutinised becomes publicly available through our repository and may be mistaken for peer-reviewed work. When processing a manuscript we need to be sure about what version we are dealing with, and ideally we shouldn’t need to go out of our way to find out.

Published 27 March 2018
Written by Dr Melodie Garnier

Next steps for Text & Data Mining

17 August 2017 | Uncategorized | ChemDataExtractor, ContentMine, FutureTDM Project, NaCTeM, PLOS, publishers, TDM, text and data mining, Wikimedia | Office of Scholarly Communication

Sometimes the best way to find a solution is to just get the different stakeholders talking to each other – and this is what happened at a recent Text and Data Mining symposium held in the Engineering Department at Cambridge.

The attendees were primarily postgraduate students and early career researchers, but senior researchers, administrative staff, librarians and publishers were also represented in the audience.

Background

This symposium grew out of a discussion held earlier this year at Cambridge to consider the issue of TDM and what a TDM library service might look like at Cambridge. The general outcome of that meeting of library staff was that people wanted to know more. Librarians at Cambridge have developed a Text and Data Mining libguide to assist.

So this year the OSC has been doing some work around TDM, including running a workshop at Research Libraries UK annual conference in March. This was a discussion about developing a research library position statement on Text and Data Mining in the UK. The slides from that event are available and we published a blog post about the discussion.

We have also had discussions with different groups about this issue including the Future TDM project which has been looking to increase the amount of TDM happening across Europe. This project is now finishing up. The impression we have around the sector is that ‘everyone wants to know what everyone else is doing’.

Symposium structure

With this general level of understanding of TDM as our base point, we structured the day to provide as much information as possible to the attendees. The Twitter hashtag for the event is #osctdm, and the presentations from the event are online.

The keynote presentation was by Kiera McNeice, from the FutureTDM Project, who gave an overview of what TDM is, how it can be achieved and what the barriers are. There is a video of her presentation (note there were some audio issues in the beginning of the recording).

The event broke into two parallel sessions after this. The main room was treated to a presentation about Wikimedia from Cambridge’s Wikimedian in Residence, Charles Matthews. Then Alison O’Mara-Eves discussed Managing the ‘information deluge’: How text mining and machine learning are changing systematic review methods. A video of Alison’s presentation is available.

In the breakout room, Dr Ben Outhwaite discussed Marriage, cheese and pirates: Text-mining the Cairo Genizah before Peter Murray Rust spoke about ContentMine: mining the scientific literature.

After lunch, Rosemary Dickin from PLOS talked about Facilitating Text and Data Mining: how an open access publisher supports TDM. PhD candidate Callum Court presented ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature. This presentation was filmed.

In the breakout room, a discussion about how librarians support TDM was led by Yvonne Nobis and Georgina Cronin. In addition there was a presentation from John McNaught – the Deputy Director of the National Centre for Text and Data Mining (NaCTeM), who presented Text mining: The view from NaCTeM.

Round table discussion

The day concluded with the group reconvening together for a roundtable (which was filmed) to discuss the broader issue of why there is not more TDM happening in the UK.

We kicked off by asking each of the people who had presented during the event to describe what they saw as the major barrier for TDM. The answers ranged from the issue of recruiting and training staff to the legal challenges and policies needed at institutional level to support TDM and the failure of institutions and government to show leadership on the issue. We then opened up the floor to the discussion.

A librarian described what happens when a publisher cuts off access, including the process the library has to go through with various areas of the University to reinstate access. (Note this was the reason why the RLUK workshop concluded with the refrain: ‘Don’t cut us off!’). There was some surprise in the group that this process was so convoluted.

However, the suggestion that researchers let the library know that they want to do TDM and the library will organise permissions was rejected by the group, on both the grounds that it is impractical for researchers to do this, and that the effort associated with obtaining permission would take too long.

A representative from Taylor and Francis suggested that researchers contact the publishers directly and let them know. Again this was rejected as ‘totally impractical’ because of the assumption this made about the nature of research. Far from being a linear and planned activity, research is iterative. Requesting access for a period of three months, and then having to go back to extend this permission if the work took an unexpected turn, would be impractical, particularly across multiple publishers.

One attendee in her blog about the event noted: “The naivety of the publisher, concerning research methodology, in this instance was actually quite staggering and one hopes that this publisher standpoint isn’t repeated across the board.”

Some researchers described the threats they had received from publishers about downloading material. There was anger about the inherent message that the researcher had done something criminal.

There was also some concern raised that TDM will drive price increases as publishers see ‘extra value’ to be extracted from their resources. This sparked off a discussion about how people will experiment if anything is made digitally available.

During the hour long session the conversation moved from high level problems to workflows. How do we actually do this? As is the way with these types of events, it was really only in the last 10 minutes that the real issues emerged. What was clear was something I have repeatedly observed over the past few years – that the players in this space including librarians, researchers and publishers, have very little idea of how the others work and their needs. I have actually heard people say: ‘If only they understood…’

Perhaps it is time we started having more open conversations?

Next steps

Two things have come out of this event. The first is that people have very much asked for some hands on sessions. We will have to look at how we will deliver this, as it is likely to be quite discipline specific.

The second is there is clearly a very real need for publishers, researchers and librarians to get into a room together to discuss the practicalities of how we move forward in TDM. One of the comments on Twitter was that we need to have legal expertise in the room for this discussion. We will start planning this ‘stakeholder’ event after the summer break.

Feedback

The items that people identified as the ‘one most important thing’ they learnt were instructive. The answers reflect how unaware people are of the tools and services available, and of how access to information works. Many of the responses listed specific tools or services they had found out about; others commented on the opportunities for TDM.

There were many comments about publishers, both the bad:

Just how much impact the chilling effect of being cut off by publishers has on researchers
That researchers have received threats from publishers
Very interesting about publishers and ways of working with them to ensure not cut off
Lots can be done but it is being hindered by publishers

and the good:

That PLOS is an open access journal
That there are reasonable publishing companies in the UK
That journals make available big data for meta analysis

Commentary about the event

There has been some online discussion and blog posts on the event:

Georgina Cronin’s blog post on the talk prepared by Georgina and her colleague Yvonne Nobis: How librarians support TDM in the Research Environment.
The libraranerrant wrote Text and Data Mining Symposium
Text + Data Mining: A Next-Gen Library Service? by Moorpheus
Online notes made by attendee Laurence Horton
Peter Murray Rust has written several blog posts on Text and Data Mining and created a poster.

Published 17 August 2017
Written by Dr Danny Kingsley

Whose money is it anyway? Managing offset agreements

30 June 2017 | Uncategorized | COAF, hybrid, offset agreements, open access, publishers, RCUK, subscriptions | Office of Scholarly Communication

Sometimes an innocent question can blow up a huge discussion, and this is what happened recently at an RCUK OA Practitioner’s Group meeting when I asked what was appropriate for institutions to do when managing money they receive as refunds from publishers through offsetting arrangements.

When an institution pays for an article processing charge (APC) in a hybrid journal, it is doing so in addition to the existing subscription. This is generally referred to as ‘double dipping’. I have written extensively about the issues with hybrid in the past, but here, I’d like to discuss the management of offset agreements.

Offset agreements are a compensation by a publisher to an institution for the extra money they are putting into the system through payment of APCs. Most large publishers have some sort of offset agreement for institutions in the UK which are negotiated by Jisc, based on the principles for offset agreements. (There is one significant publisher which is an exception because it insists there is no need for an offset agreement because it does not double dip.)

Offset agreements are not equal

While offset agreements are negotiated nationally, there is no obligation for any institution to sign up to them. Cambridge makes the decision to sign up to an offset agreement or not through a standard calculation. If we are spending RCUK and COAF funds on the offset it must show benefit to the funds first. If the numbers demonstrate that by signing up to (and sometimes investing in) the agreement, the funds will be better off at the end of the year then we sign. The fact this agreement may have a broader benefit to the wider University is a secondary consideration. The OSC has a publisher and agreements webpage listing the agreements Cambridge is signed up to.
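
As a rough illustration of that calculation, the sketch below compares what the funds would spend on APCs with and without an agreement. It is a deliberately simplified model: the function name, the flat discount rate and the figures are hypothetical, and real agreements are structured in different ways.

```python
def funds_better_off(expected_apc_spend, discount_rate, upfront_cost=0.0):
    """Return True if the funds end the year better off by signing up.

    expected_apc_spend: what the funds would pay in APCs without the agreement.
    discount_rate: fraction taken off each APC under the agreement (0.25 = 25%).
    upfront_cost: any investment the funds must make to join the agreement.
    """
    spend_with_agreement = upfront_cost + expected_apc_spend * (1 - discount_rate)
    return spend_with_agreement < expected_apc_spend


# Hypothetical figures: a large publisher and a small one, same terms.
print(funds_better_off(100000, 0.25, upfront_cost=10000))
print(funds_better_off(20000, 0.25, upfront_cost=10000))
```

With these invented numbers the same terms pay off for a publisher where the funds spend heavily and do not where the spend is small, which is why the decision is made agreement by agreement.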

In a fit of spectacular inefficiency, all offsets work slightly differently. Here’s a run down of different types:

In some instances we have a melding of the costs into one payment and there are no transactions for open access. The Springer Compact is an example of this. At Cambridge we have split the cost of this deal between the previous year’s subscription spend and a top-up from our RCUK and COAF funds, with the top-up divided in proportion to the amount we publish with Springer under each of these two funders.

Other offsets are internal – where the money does not leave the publisher’s system. The Wiley OA Agreement is this type. By signing up we receive a 25% discount on each APC that is managed through their dashboard. We also receive a 50% discount in a given year based on the number of APCs we bought the previous year. This money is calculated at the beginning of the year and the ‘money’ is put into a ‘fund’ held by Wiley. The APC payments for future articles can be made out of this credit. It is a bit like a betting app – you can’t get the money out without some difficulty, you can only ‘reinvest’ it.

There is a different kind of internal offset where the calculation is made up front based on how much you spent the previous year on APCs. These manifest as a discount on each APC paid. Taylor and Francis’ offset works this way which is a bit of a hassle because you still have to process each APC regardless of whether you spend $2000 or $200 on it. But again there is no extra money anywhere in this equation because the discount is applied before the invoice is issued.

A different kind of arrangement relates more to fully open access journals. These include a membership where you get a discount on APCs for being a member. Sometimes there is a payment associated with this (BMC, for example, where an upfront membership payment gets you a 15% discount), and sometimes there is no payment (MDPI – 10% discount for now). Alternatively you can ‘buy’ membership for researchers in exchange for the right to publish for free (PeerJ).

The last type of offset is the most straightforward – where the institution gets a cheque back based on the extra spend on APCs over the subscription. Currently IoP is the only publisher with whom Cambridge has this type of agreement.

Managing offset refunds

When Cambridge received its first IoP cheque in 2015 there were questions about what we could or could not do with it. The Open Access Project Board discussed the issue and decided that the money needed to remain within the context of open access. Suggestions included paying our Platinum membership of arXiv.org with it, because this would be supporting open access.

The minutes from the meeting on 31 March 2015 noted: “Any funds returned from publishers as part of deals to offset the cost of article processing charges should be retained for the payment of open access costs, but ring-fenced from the block grants and kept available for emergency uses under the supervision of the Project Board.” We have since twice used this money to pay for fully open access journal APCs when our block grant funds were low.

Whose money is it anyway?

When the issue of offset refunds and what institutions were doing with it was raised at a recent RCUK OA Practitioners Group meeting it became clear that practices vary considerably from institution to institution. One of the points of discussion was whether it would be appropriate to use this money to support subscriptions. The general (strong) sentiment from RCUK was that this would not be within the spirit, and indeed against the principles, of the RCUK policy.

I subsequently sent a request out to a repository discussion list to ask colleagues across the UK what they were doing with this money. To date there have only been a handful of responses.

In one instance with a medium-sized university the IoP money is placed into a small Library fund that is ring-fenced to pay for Open Access in fully Open Access journals only. This fund has the strategic aim to enable a transition to Open Access by supporting new business models and contributing to initiatives such as Knowledge Unlatched, hosting Open Journal Systems, as well as supporting authors to publish in Open Access venues when they have no other source of funding.

A large research institution responded to say they had a specific account set up into which the money was deposited, noting, as did the other respondents, that the financial arrangements of the University would mean that if it were deposited centrally it would never be seen again. This institution noted they were considering using the funds to offset the subscription to IoP in the upcoming year due to a low uptake of the deal.

Another large research institution said the IoP cheques were being ‘saved’ in the subscriptions budget.

Sussex University

In their recent paper “Bringing together the work of subscription and open access specialists: challenges and changes at the University of Sussex” there is a section on how they are managing the offset money. They note: “It seemed a missed opportunity to simply feed it back into the RCUK block grant, but equally inappropriate to use for journal subscriptions or general Library spending”.

The decision was to use the money to support APCs for postgraduate researchers (PGRs) who did not have any other access to money for gold open access, with the condition that it could only be spent on fully open access journals. They noted that this was a welcome opportunity to be able to offer something tangible and helpful in their advocacy dealings with postgraduate researchers.

Only the start of the conversation

This discussion has raised questions about the decision making process for supporting access to the literature.

Subscriptions are paid for at Cambridge through a fund that is not owned by the Library – the fund consists of contributions from all the Schools plus central funds. Representatives of the Schools, Colleges and library staff sit on the Journal Coordination Scheme committee to decide on subscriptions. However decisions about open access memberships and offsets are made by the Office of Scholarly Communication. Given the increased entanglement of these two routes to access the literature, this situation is one the University is aware needs addressing. The Sussex University paper discusses the processes they went through to merge the two decision making bodies.

This is a rich area for investigation – as we move away from subscription-only spend and into joint decision-making between the subscription team and the Open Access team we need to understand what offsets offer and what they mean for the Library. This discussion is just the beginning.

Published 30 June 2017
Written by Dr Danny Kingsley

An open letter to Blood

24 October 2016 | Uncategorized | COAF, compliance, open access, policy, publishers, RCUK, Wellcome Trust | Arthur Smith

The Office of Scholarly Communication routinely advises Cambridge authors about their publishing options, and in the vast majority of cases we can help authors comply with funder mandates. However, there are a few notable journals that offer no compliant open access options for Research Council UK (RCUK) and Charity Open Access Fund (COAF) authors. One of those journals is Blood. We’ve previously called them out on their misleading advice:

The author form for the journal Blood is grossly misleading about RCUK/WT compliance. pic.twitter.com/NWSnbHSIEQ

— Cambridge OpenAccess (@CamOpenAccess) 25 July 2016

Today we are urging Blood to offer their authors either self-archiving rights without cost and a maximum 6 month embargo or immediate open access under a Creative Commons Attribution (CC BY) licence. If Blood does not offer these options we will advise our researchers that they should publish elsewhere so as to remain compliant with their funders’ open access policies.

You can click through and read the open letter in full below:

If you would like to add your name to the list of signatories, please email info@osc.cam.ac.uk

Press embargoes – a threat from the shadows

20 May 2016 | Uncategorized | data, embargo, institutional repository, metadata, publication, publishers | Office of Scholarly Communication

Something has been rumbling under the surface in the repository world recently, at least in the UK. Over the past six months or so, the Office of Scholarly Communication has had some fraught conversations with researchers who are terrified that their papers will be ‘pulled’ from publication by the journal. The reason is that some information about the upcoming paper is publicly available.

The HEFCE policy asks us to deposit the Author’s Accepted Manuscript into a repository “as soon after the point of acceptance as possible” and to present the manuscript “in a way that allows it to be discovered by readers and by automated tools such as search engine”. So when a researcher deposits a paper we check the publisher copyright restrictions and deposit the work into the repository – shutting down the article, but making the record public. What this means is that the metadata about an article is publicly available before it is published.

Similarly, researchers also share their research data in the repository and make the metadata publicly available. Funders and universities require researchers to include a statement in their publications about the availability of research data supporting their findings. What this means in practice is that researchers deposit research data supporting their manuscripts into data repositories before the corresponding manuscripts are published and more and more frequently, even before these are accepted (to ensure that peer reviewers have access to all supporting materials).

Terminology

Before I go into detail about our challenge, let’s get a few terms straight here. When we talk about ‘data’ in this space we mean the information generated during a research project from which observations and conclusions are made in academic papers. The ‘metadata’ is the information about that data. In the case of an article, the metadata includes the authors, the title, the journal and the abstract. The metadata for datasets is less well defined, although there are data citation principles, so metadata in this case is information about what the data is, how it was generated, how to access and interpret it.

The second term that matters here is ‘embargo’ – the word of the moment for me, given my recent participation in the Open Scholarship Initiative Embargo Workgroup. Our resolution was that we need to fund a research piece to resolve the reasoning behind embargoes (bearing in mind there is no link between half life of article usage and subscription rates) and what is a reasonable period of time for them.

Some publishers impose publication embargoes on the release of the Author’s Accepted Manuscript for a period of time after publication. The business of managing embargoes falls into the laps of repository managers. Indeed I have written before about how the complexity of different publisher agreements means it is almost impossible for an individual researcher to navigate the rules. We manage embargoes by putting the article under indefinite embargo when we deposit it and then check back on a rotational basis to see if the work has been published. Once it has, we can set the embargo date. And this is time consuming – to give an idea of scale we currently have over 1700 papers in the ‘checking’ pile.
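
The embargo workflow can be sketched as follows. This is a simplified illustration rather than how the repository software works: the function names are hypothetical, and it assumes the embargo is counted in whole months from the publication date, which may not match every publisher's terms.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date a given number of months after start, clamped to month end."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    # Clamp the day so that, for example, 31 August plus 6 months gives 28 February.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(publication_date, embargo_months):
    """Return the embargo end date, or None while the paper is unpublished."""
    if publication_date is None:
        # Not yet published: the deposit stays under indefinite embargo
        # and goes back on the 'checking' pile.
        return None
    return add_months(publication_date, embargo_months)


# Hypothetical papers for illustration.
print(embargo_end(None, 12))
print(embargo_end(date(2016, 5, 20), 12))
print(embargo_end(date(2016, 8, 31), 6))
```

The calculation itself is trivial. The time goes on finding out whether and when each paper has been published, which is the manual check that has to be repeated for every item in the pile.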

There is a second type of embargo – a press embargo. This is an embargo which prohibits anyone actively discussing the content of accepted papers with the media prior to publication. The exception is that a few days before publication the journal allows journalists access to the published papers so they can prepare news stories to coincide with the publication of the work. It is this second type of embargo that is causing confusion with the research community.

The perceived problem

Our researchers are concerned that having the metadata about an article available means that publishers will consider this a breach of embargo and will pull the publication. Note that the Author’s Accepted Manuscript of the article itself (or the data files, in the case of datasets) is locked down, and the information about the volume, issue and pages is missing as the work is not yet published.

The researchers are worried because their careers depend on publication in high profile journals such as Nature, and if a work were to be pulled from publication this would have huge implications for them. This has caused a challenge for us – clearly we do not wish to threaten our researchers’ publication prospects, but we are also bound by the requirements of the HEFCE policy.

In November last year I put a query out on a couple of mail lists about this which generated a great deal of discussion. As it happens some UK repository managers are locking down the metadata about ALL articles until publication.

The actual problem

So we have decided to go to the source and are now in discussion with various publishers. A few things have come to light. First, generally publishers understand the distinction between these embargoes (details about responses are below). However there appears to be confusion at the editor level about this – and our researchers are in contact with editors.

The second issue is the level of bullying that researchers appear to be subjected to by the publishing industry. They are petrified of doing the ‘wrong thing’, and that they will be punished by having their article pulled. There is of course a question about whether any articles ever have actually been pulled because of metadata being available prior to publication or if this is yet another ghost we are boxing. No researchers to date have given us a concrete example of this happening.

Below is some of the correspondence we have had with publishers to date on this issue. The responses have been varied – from helpful and encouraging to restrictive and uninformative.

No issue – BMJ

The University of York had some information back from BMJ, which noted that the self archiving policy says that the post-print – “Final draft of manuscript: post peer-review, before the article is copyedited, typeset and published” – can be made available without embargo. They did note that the press (not the author) is subject to the rule “All material accepted for publication in any BMJ journal is under embargo until it is published online.”

Permission and concern – Nature Publishing Group

Late last year I contacted Nature Publishing Group (NPG) to ask their position on this issue and they were adamant that they do not pull papers from publication if the metadata is available prior to publication. They did: “ask that our two embargoes – self-archiving embargo and press embargo – are respected”. They clarified that “NPG deposition to all PMC repositories allows the deposits to be fully discoverable as soon as processed by the repository, and the manuscripts’ full text become accessible 6 months after publication. In practice, this also means metadata can be available upon acceptance“.

NPG also looked at the question of articles being “pulled” ahead of publication and expressed concern that this idea was being propagated. They said that “to our knowledge there are no cases in which this has happened by putting metadata in a university repository. If you have information about any cases in which this is claimed to be the case, we would be very grateful if they could be sent through to us so that we can investigate them further.”

Poor practice – Science

We recently had a Science paper where we had already deposited the data associated with that paper into our repository (we had shut the data down for release on publication date) and had generated a link that the researchers were able to include in their paper. However the issue of having information in the public domain prior to publication was raised by our researchers so we wrote to Science for clarification. Their response was:

“I discussed your question with my editorial colleagues, and our provisional response is that we would prefer you to take the cautious option of keeping the metadata in the ‘dark archive’ until the date of publication. We appreciate, though, that this is non-ideal from your point of view, and will now be discussing the question with our Office of Public Programs in Washington, to see whether we can accommodate your preferred procedure.”

Remember that this is the publisher telling us to suppress the metadata about data that is published by us in our repository.

The big problem we have (apart from the principle of this issue) is that while we can automate the turning off of an embargo in the repository we cannot automate the movement of an item from a dark archive to an open one – this must be done manually. The paper was published on Good Friday. There was no-one physically at work (indeed the Library was closed) until the following Tuesday. So for the first four days of this article being in the public domain there was a dead link to data in it. This is not just ‘non-ideal’ – it is contrary to the idea of effective and complete publication.

Communication breakdown – The Lancet

An example of the problem we are encountering happened only last week. We were contacted by a researcher who demanded we take down the details of their accepted paper that was to be published in The Lancet because an assistant editor had taken issue with us posting the metadata prior to publication. Again we contacted the publisher and asked them their position.

When we spoke to The Lancet they were helpful and positive, reassuring us they were “happy to permit the release of article metadata at this stage”. They also said they had been in contact with the assistant editor of this article who has directly contacted the researcher. The explanation was “it looks like there was a misunderstanding regarding the wording of one of our policies”.

Right back at ya – Elsevier

Our correspondence with Elsevier was particularly unhelpful. While acknowledging that press embargoes are “editorial policy rather than open access policy” and that they are designed to create press interest, they concluded that this “isn’t a matter for a corporate policy at company level”.

This means it is apparently up to us to find out individual journal positions rather than the organisation taking responsibility for what is becoming a major problem for us. We contacted Cell Press (a subsidiary of Elsevier). They replied:

“You’ll notice that our policy is that we discourage release of the metadata and abstract prior to official publication. The reason is that this would be considered breaking any press embargoes on the article. So while, like Nature, we would not prohibit release of metadata, it means that it would be unlikely that the article would be considered for a press release. If you think that your article is likely to be appropriate for a press release to top-tier media outlets, then we recommend delaying deposit to the institutional repository until the article is officially published.”

So not only do Elsevier not give us an answer, leaving us to contact journals directly, we now have to make editorial decisions on individual Cell Press articles to determine if these might be potentially worthy of a press release. Now, as it happens I have a science journalism background* and could possibly do this – but are we seriously required in the UK to employ repository managers with news reporting skills so they can concurrently meet HEFCE requirements and also ensure that the public profile of their institution’s research is protected? (*This means, ironically, that I have many years of direct experience with press embargoes.)

There are two problems here. There is the practical embargo issue repository managers face, and there is the bullying of researchers problem.

Protecting researchers

In the case with The Lancet, despite us explaining that this was a considerably larger problem for us than this particular example, the researcher was unwilling to give us any further details as to the communication they had received from the journal other than it had been an ‘assistant editor’. This ‘protection’ of journals by researchers is not uncommon in our experience. Indeed, a recent article noted the “oppression” by editors, stating for example that researchers are “afraid to say anything about the New England Journal because they’re afraid they won’t get something published there”.

Bear in mind that Nature specifically stated that they were unaware of any examples of a paper being pulled because the metadata was available prior to publication. We have not been able to uncover a single concrete example of this actually occurring. Yet we continue to have distressed and frightened researchers contacting us because of this threat.

I don’t have an answer to this bullying problem other than our need to move away from having publication in a high impact journal as the be-all and end-all for research careers. In case you are in any doubt of how destructive this situation is, consider ‘Your Right Arm for a Publication in AER?‘ where researchers indicated they would be prepared to lose half a thumb for a high profile publication. But reconfiguring the reward and recognition processes in research is a long way off. Meanwhile the least we can do is acknowledge this is happening and bring it into the open.

Clarity on press embargoes

In relation to the embargo issue, while we can continue to contact each publisher in turn and try and negotiate these issues with them, this is hardly time efficient given it is a sector-wide problem. The ‘solution’ from some (well resourced and staffed) publishers is that it is up to individual repository managers to contact each journal in turn for clarification. This is clearly not possible given that our staff time is already spent checking and complying with the myriad of different publisher requirements in place.

A different solution could be that publishers shorten the time between acceptance and publication. When journals have a quick turnover time, the period when an article’s metadata is publicly available before publication is limited to a matter of days. It is different when there is a delay of months or even years between acceptance and publication.

I should note that our team debated about whether writing this piece would actually do more harm than good – triggering publishers to suddenly introduce further restrictions on metadata. Managing repositories and juggling embargoes and funder policies is already complicated enough. Material available in institutional repositories is only a small percentage of material that is publicly available – according to the Monitoring the Transition to Open Access report, institutional repositories in the UK hold 7.9% of Author Accepted Manuscripts, compared to 56.6% in subject repositories and 19.7% in social sharing sites. Yet we are generally the ones actually adhering to the complex sets of embargo rules.

We consider this to be a sector problem – and one that should be addressed by sector-wide resources, such as the funding bodies. I should note Jisc have expressed interest in supporting us with this issue. The increased crack-down on our activities is frustrating, unhelpful and feeds the beast that is the publishing industry stranglehold on researcher behaviour.

Published 20 May 2016
Written by Dr Danny Kingsley

Watch this space – the first OSI workshop

24 April 2016 | Uncategorized | embargoes, open access, policy, publishers, repository, scholarly communication | Office of Scholarly Communication

It was always an ambitious project – trying to gather 250 high level delegates from all aspects of the scholarly communication process with the goal of better communication and idea sharing between sectors of the ecosystem. The first meeting of the Open Scholarship Initiative (OSI) happened in Fairfax, Virginia last week. Kudos to the National Science Communication Institute for managing the astonishing logistics of an exercise like this – and basically pulling it off.

This was billed as a ‘meeting between global, high-level stakeholders in research’ with a goal to ‘lay the groundwork for creating a global collaborative framework to manage the future of scholarly publishing and everything these practices impact’. The OSI is being supported by UNESCO who have committed to the full 10 year life of the project. As things currently stand, the plan is to repeat the meeting annually for a decade.

Structure of the event

The process began in July last year with emailed invitations from Glenn Hampson, the project director. For those who accepted the invitation, a series of emails from Glenn started with tutorials attached to try and ensure the delegates were prepared and up to speed. The emails gathered momentum with online discussions between participants. Indeed much was made of the (many) hundreds of emails the event had generated.

The overall areas the Open Scholarship Initiative hopes to cover include research funding policies, interdisciplinary collaboration efforts, library budgets, tenure evaluation criteria, global institutional repository efforts, open access plans, peer review practices, postdoc workload, public policy formulation, global research access and participation, information visibility, and others. Before arriving delegates had chosen their workgroup topic from the following list:

Embargos
Evolving open solutions (1)
Evolving open solutions (2)
Information overload & underload
Open impacts
Peer review
Usage dimensions of open
What is publishing? (1)
What is publishing? (2)
Impact factors
Moral dimensions of open
Participation in the current system
Repositories & preservation
What is open?
Who decides?

The 190+ delegates from 180+ institutions, 11 countries and 15 stakeholder groups gathered together at George Mason University (GMU), and after preliminary introductions and welcomes the work began immediately with everyone splitting into their workgroups. We spent the first day and a half working through our topics and preparing a short presentation for feedback on the second afternoon. There was then another working session to finalise the presentations before the live-streamed final presentations on the Friday morning. These presentations are all available in Figshare (thanks to Micah Vandegrift).

The event is trying to address some heady and complex questions and it was clear from the first set of presentations that in some instances it had been difficult to come to a consensus, let alone a plan for action. My group had the relative luxury of a topic that is fairly well defined – embargoes. It might be useful for the next event to focus on specific topics and move from the esoteric to the practical.

In addition the meeting had a team of ‘at large’ people who floated between groups to try and identify themes. Unsurprisingly, the ‘Primacy of Promotion and Tenure’ was a recurring theme throughout many of the presentations. It has been clear for some time that until we can achieve some reform of the promotion and tenure process, many of the ideas and innovations in scholarly communication won’t take hold. I would suggest that the different aspects of the reward/incentive system would be a rich vein to mine at OSI2017.

Closed versus open

In terms of outcomes there was some disquiet beforehand, by people who were not attending, about the workshop effectively being ‘closed’. This was because there was a Chatham House Rule for the workgroups to allow people to speak freely about their own experiences.

There was also some disquiet among those people who were attending about a request that the workgroups remain device-free. This was to try and discourage people checking emails and not participating. However people revert to type – in our group we all used our devices to collaborate on our documents. In the end we didn’t have much of a choice: the incredibly high-tech room we were using in the modern GMU library flummoxed us and we were unable to get the projector to work.

That all said, there is every intention to disseminate the findings of the workshops widely and openly. During the feedback and presentations sessions there was considerable Twitter discussion at #OSI2016 – there is a downloadable list of all tweets in figshare – note there were enough to make the conference trend on Twitter at one point. This networked graphic shows the interrelationships across Twitter (thanks to Micah and his colleague). In addition there will be a report published by George Mason University Press incorporating the summary reports from each of the groups.

Team Embargo

Our workgroup, like all of them, represented a wide mix of interest groups. We were:

Ann Riley, President, Association of College and Research Libraries
Audrey McCulloch, Chief Executive, Association of Learned and Professional Societies
Danny Kingsley, Head of Scholarly Communication, Cambridge University
Eric Massant, Senior Director of Government and Industry Affairs, RELX Group
Gail McMillan, Director of Scholarly Communication, Virginia Tech
Glenorchy Campbell, Managing Director, British Medical Journal North America
Gregg Gordon, President, Social Science Research Network
Keith Webster, Dean of Libraries, Carnegie Mellon University
Laura Helmuth, incoming president, National Association of Science Writers
Tony Peatfield, Director of Corporate Affairs, Medical Research Council, Research Councils, UK
Will Schweitzer, Director of Product Development, AAAS/Science

It might be worth noting here that our workgroup was naughty and did not agree beforehand on who would facilitate, so therefore no-one had attended the facilitation pre-workshop webinar. This meant our group was gloriously facilitator and post-it note free – we just got on with it.

Banishing ghosts

We began with some definitions about what embargoes are, noting that press embargoes, publication embargoes and what we called ‘security’ embargoes (like classified documents) all serve different purposes.

Embargoes are not ‘all bad’. In the instance of press embargoes they allow journalists early access to the publication in order for them to be able to investigate and write/present informed pieces in the media. This benefits society because it allows for stronger press coverage. In terms of security embargoes they protect information that is not meant to be in the public domain. However embargoes on Author’s Accepted Manuscripts in repositories are more contentious, with qualified acceptance that these are a transitional mechanism in a shift to full open access.

The causal link of green open access resulting in subscription loss is not yet proven. The September 2013 UK Business, Innovation and Skills Committee Fifth Report: Open Access stated “There is no available evidence base to indicate that short or even zero embargoes cause cancellation of subscriptions”. In 2012 the Committee for Economic Development Digital Connections Council in The Future of Taxpayer-Funded Research: Who Will Control Access to the Results? concluded that “No persuasive evidence exists that greater public access as provided by the NIH policy has substantially harmed subscription-supported STM publishers over the last four years or threatens the sustainability of their journals”.

However it is not disputed that traffic on websites for journals that rely on advertising dollars (such as medical journals) suffers when the attention is pulled to another place. This can clearly affect advertising revenue, which in turn can impact on the financial model of those publications.

During our discussions about the differences between press embargoes and publication embargoes I mentioned some recent experiences in Cambridge. The HEFCE Open Access Policy requires us to collect Author’s Accepted Manuscripts at the time of acceptance and make the metadata about them available, ideally before publication. We respect publishers’ embargoes and keep the document itself locked down until these have passed post-publication. However we have been managing calls from sometimes distressed members of our research community who are worried that making the metadata available prior to publication will result in the paper being ‘pulled’ by the journal. Whether this has ever actually happened I do not know – and indeed would be happy to hear from anyone who has a concrete example so we can start managing reality instead of rumour. The problem in these instances is the researchers are confusing the press embargo with the publication embargo.

And that is what this whole embargo discussion comes down to. Much of the discourse and arguments about embargoes are not evidence based. There is precious little evidence to support the tenet that sits behind embargoes – which is that if publishers allow researchers to make copies of their work available open access then they will lose subscriptions. The lack of evidence does not prevent the possibility it is true however – and that is why we need to settle the situation once and for all. If there is a sustainability issue for journals because of wider green open access then we need to put some longer term management in place and work towards full open access.

It is possible the problem is not repositories, institutional or subject-based. Many authors are making the final version of their published work available in contravention of their Copyright Transfer Agreement in ResearchGate or Academia.edu. It might be that this availability of work is having an impact on researchers’ usage of work on the publishers’ sites. Given that in institutional repositories repository managers make huge efforts to comply with complicated embargoes it is quite possible that repositories are not the problem. Indeed, only a small proportion of work is made available through repositories according to the August 2015 Monitoring the Transition to Open Access report (look at ‘Figure 9. Location of online postings (including illicit postings)’ on page 38). If this is the case, requiring institutions to embargo the Author’s Accepted Manuscripts they hold in their repositories for long periods will not make any difference. They are not the solution.

Our conclusion from our preliminary discussions was that there needs to be some concrete, rigorous research into the rationale behind embargoes to inform publishers, researchers and funders.

Our proposal – research questions

In response to this the Embargo workgroup decided that the most effective solution was to collaborate on an agreed research process that will have the buy-in of all stakeholders. The overarching question that we want to try and answer is ‘What are the impacts of embargoes on scholarly communication?’ with the goal to create an evidence base for informed discussion on embargoes.

In order to answer that question we have broken the big issue into a series of smaller questions:

How are embargoes determined?
How do researchers/students find research articles?
Who needs access?
Impact of embargoes on researchers/students?
Effect of embargoes on other stakeholders?

We decided that if the research found there was a case for publication embargoes then agreement on the metrics that should be used to determine the length of an embargo would be helpful. We are hoping that this research will allow standards to be introduced in the area of embargoes.

Discoverability and the issue of searching behaviour is extremely relevant in this space. Our hypothesis is if people are following publishers’ journal pages to find material then the fact that some of the same information is dispersed amongst lots of repositories means that the publisher arguments that embargoes threaten their finances are weakened. However if people are primarily using centralised search engines such as Google Scholar (which favours open versions of articles over paid ones) then that strengthens the publisher argument that they need embargoes to protect revenue.

The other question is whether access really is an issue for researchers. The March 2015 STM Report looked at the research in this area, which indicates that well over 90% of researchers surveyed in separate studies said research papers were easy or fairly easy to access. On the face of it this appears to suggest little problem in the way of access (look for the ‘Researchers’ access to journals’ section starting p83). Rather than repeating these surveys, indicators for how much embargoes restrict access to researchers could include:

The usage of Request a Copy buttons in repositories
The number of ‘turn-aways’ from publishers’ platforms
The take-up level of Pay Per View options on publisher sites
The level of usage of ‘Get it Now’ – where the library obtains a copy through interlibrary loan or document delivery and absorbs the cost.

Our proposal – Research structure

The project will begin with a Literature Review and an investigation into the feasibility of running some Case Studies.

Two clear Case Studies could provide direct evidence if the publishers were willing to share what they have learned. In both cases, there has been a move from an embargo period for green OA to removing embargoes completely. In the first instance, Taylor and Francis began a trial in 2011 to allow immediate green OA for their library and information science journals, meaning that authors published in 35 library and information science journals have the right to deposit their Accepted Manuscript into their institutional repository and make it immediately available. Authors who choose to publish in these journals are no longer asked to assign copyright. They now sign a license to publish, which allows Taylor & Francis to publish the Version of Record. Additionally, authors can choose to make their work green open access with no embargoes applied. In 2014 the pilot was extended for ‘at least a further year’.

As part of the pilot, Taylor and Francis say a survey was conducted by Routledge to canvass opinions on the Library & Information Science Author Rights initiative and also investigated author and researcher behaviour and views on author rights policies, embargoes and posting work to repositories. The survey elicited over 500 responses, and reported: “Having the option to upload their work to a repository directly after publication is very important to these authors: more than 2/3 of respondents rated the ability to upload their work to repositories at 8, 9, or 10 out of 10, with the vast majority saying they feel strongly that authors should have this right”. There are no links to this survey that I have been able to uncover. It would be useful to include this survey in the Literature Review and possibly build on it for other stakeholders.

The second Case Study is Sage, which in 2013 decided to move to an immediate green policy. Both examples would have enough data by now to indicate if these decisions have resulted in subscription cancellations. I have proposed this type of study before, to no avail. Hopefully we might now have more traction.

The Literature Review and Case Studies will then inform the development of a Survey of different stakeholders – which may have to be slightly altered depending on the audience being surveyed. This is an ambitious goal – because the intention is to have at least preliminary findings available for discussion at the next OSI in 2017.

There was some lively Twitter discussion in the room about our proposal to do the study. Some were saying that the issue is resolved. I would argue that anyone who is negotiating the embargo landscape at the moment (such as repository managers) would strongly disagree with the position. Others referred to research already done in this space, for example the Publishing and Ecology of European Research (PEER) project. This study does discuss embargoes but approached the question with a position that embargoes are valid. The study we are proposing is asking specifically if there is any evidence base for embargoes.

Next steps

We will be preparing a project brief and our report for the OSI publication over the next couple of weeks.

The biggest issue for the project will be for us to gather funding. We have done a preliminary assessment of the time required to do the work so we could work out a ballpark figure for the fundraising goal. Note that our estimation of the number of workdays required for the project was deemed as ‘ludicrously low’ by a consultant in discussion later.

It was noted by a funder in casual discussions that because publishers have a vested interest in embargoes they should fund research that investigates their validity. Indeed Elsevier have already offered to assist financially for which we are grateful, but for this work to be considered robust and for it to be widely accepted it will need to be funded from a variety of sources. To that end we intend to ‘crowd fund’ the research in batches of $5000. The number of those batches will depend on the level of our underestimation of the time required to undertake the work (!).

In terms of governance, Team Embargo (perhaps we might need a better name…) will be working together as the steering committee to develop the brief, organise funding and choose the research team to do the work. We will need to engage an independent researcher or research group to ensure impartiality.

Wrap up summary of the workshop

There were a few issues relating to the organisation of the workshop. Much was made of the many hundreds of emails that were sent both from the organising group and also amongst the delegates beforehand. This level of preliminary discussion was beneficial but using another tool might help. It was noted that the level of email was potentially the reason why some of the delegates who were invited did not attend.

There was a logistic issue in having 190+ delegates staying in a hotel situated in the middle of a set of highways that was a 30 minute bus ride away from the conference location at George Mason University (also situated in an isolated location). The solution was a series of buses to ferry us each way each day, and to and from the airport. We ate breakfast, lunch and dinner together at the workshop location. This combined with the lack of alcohol because we were at an undergraduate American campus (where the legal drinking age is 21) gave the experience something of a school camp feel. Coming from another planned capital city (Canberra, Australia) I am sure that Washington is a beautiful and interesting place. This was not the visit to find that out.

These minor gripes aside, as is often the case, the opportunity to meet people face to face was fantastic. Because there was a heavy American flavour to the attendees, I have now met in person many of the people I ‘know’ well through virtual exchanges. It was also a very good process to work directly with a group of experienced and knowledgeable people who all contributed to a tangible outcome.

OSI is an ambitious project, with plans for annual meetings over the next decade. It will be interesting to see if we really can achieve change.

Published 24 April 2016
Written by Dr Danny Kingsley

‘It is all a bit of a mess’ – observations from Researcher to Reader conference

18 February 2016 | Uncategorized | articles, conference, funders, Libraries, open access, peer review, Policies, publishers, research, Researcher to Reader | Office of Scholarly Communication

“It is all a bit of a mess. It used to be simple. Now it is complicated.” This was the conclusion of Mark Carden, the coordinator of the Researcher to Reader conference, after two days of discussion, debate and workshops about scholarly publication.

The conference bills itself as: ‘The premier forum for discussion of the international scholarly content supply chain – bringing knowledge from the Researcher to the Reader.’ It was unusual because it mixed ‘tribes’ who usually go to separate conferences. Publishers made up 47% of the group, Libraries were next with 17%, Technology 14%, Distributors were 9% and there were a small number of academics and others.

In addition to talks and panel discussions there were workshop groups that used the format of smaller groups that met three times and were asked to come up with proposals. In order to keep this blog to a manageable length it does not include the discussions from the workshops.

The talks were filmed and will be available. There was also a very active Twitter discussion at #R2RConf. This blog is my attempt to summarise the points that emerged from the conference.

Suggestions, ideas and salient points that came up

Journals are dead – the publishing future is the platform
Journals are not dead – but we don’t need issues any more as they are entirely redundant in an online environment
Publishing in a journal benefits the author not the reader
Dissemination is no longer the value added offered by publishers. Anyone can have a blog. The value-add is branding
The drivers for choosing research areas are what has been recently published, not what is needed by society
All research is generated from what was published the year before – and we can prove it
Why don’t we disaggregate the APC model and charge for sections of the service separately?
You need to provide good service to the free users if you want to build a premium product
The most valuable commodity as an editor is your reviewer time
Peer review is inconsistent and systematically biased.
The greater the novelty of the work the greater likelihood it is to have a negative review
Poor academic writing is rewarded

Life After the Death of Science Journals – How the article is the future of scholarly communication

Vitek Tracz, the Chairman of the Science Navigation Group, which produces the F1000Research series of publishing platforms, was the keynote speaker. He argued that we are coming to the end of journals. One of the issues with journals is that their essence is selection. The referee system is secret – the editors won’t usually tell the author who the referee is because the referee is working for the editor not the author. The main task of peer review is to accept or reject the work – there may be some idea to improve the paper. But that decision is not taken by the referees, but by the editor who has the Impact Factor to consider.

This system allows for information to be published that should not be published – eventually all publications will find somewhere to publish. Even in high level journals many papers cannot be replicated. A survey by PubMed found there was no correlation between impact factor and likelihood of an abstract being looked at on PubMed.

Readers can now get papers they want by themselves and create their own collections that interest them. But authors need journals because IF is so deeply embedded. Placement in a prestigious journal doesn’t increase readership, but it does increase likelihood of getting tenure. So authors need journals, readers don’t.

Vitek noted F1000Research “are not publishers – because we do not own any titles and don’t want to”. Instead they offer tools and services. It is not publishing in the traditional sense because there is no decision to publish or not publish something – that process is completely driven by authors. He predicted that the future of science publishing will shift from journals to services (there will be more tools and more publishing directly on funder platforms).

In response to a question about impact factor and author motivation change, Vitek said “the only way of stopping impact factors as a thing is to bring the end of journals”. This aligns with the conclusions in a paper I co-authored some years ago, ‘The publishing imperative: the pervasive influence of publication metrics’.

Author Behaviours

Vicky Williams, the CEO of research communications company Research Media, discussed “Maximising the visibility and impact of research” and talked about the need to translate complex ideas in research into understandable language.

She noted that the public does want to engage with research. A large percentage of the public wants to know about research while it is happening. However, they see communication about research as poor. There is low trust in science journalism.

Vicky noted the different funding drivers – now funding is very heavily distributed. Research institutions have to look at alternative funding options. Now we have students as consumers – they are mobile and create demand. Traditional content formats are being challenged.

As a result institutions are needing to compete for talent. They need to build relationships with industry – and promotion is a way of achieving that. Most universities have a strong emphasis on outreach and engagement.

This means we need a different language, different tone and a different medium. However academic outputs are written for other academics. Most research is impenetrable for other audiences. This has long been a bugbear of mine (see ‘Express yourself scientists, speaking plainly isn’t beneath you’).

Vicky outlined some steps to showcase research – have a communications plan, network with colleagues, create a lay summary, use visual aids, and engage. She argued that this acts as a research CV.

Rick Anderson, the Associate Dean of the University of Utah, talked about the Deeply Weird Ecosystem of publishing. Rick noted that publication is deeply weird, with many different players – authors (send papers out), publishers (send out publications), readers (demand subscriptions), libraries (subscribe or cancel). All players send signals out into the scholarly communications ecosystem, and when we send signals out we get partial and distorted signals back.

An example is that publishers set prices without knowing the value of the content. The content they control is unique – there are no substitutable products.

He also noted there is a growing prevalence of funding with strings. Funders are now imposing conditions on how you publish, covering not just the narrative of the research but also the underlying data. In addition the institution you work for might have rules about how to publish in particular ways.

Rick urged authors to answer the question ‘what is my main reason for publishing’ – not for writing. In reality it is primarily to have high impact publishing. By choosing to publish in a particular journal an author is casting a vote for their future. ‘Who has power over my future – do they care about where I publish? I should take notice of that’. He said that ‘If I publish with Elsevier I turn control over to them, publishing in PLOS turns control over to the world’.

Rick mentioned some journal selection tools. JANE is a system (oriented to biological sciences) where authors can paste an abstract into a search box; it analyses the language and comes up with a suggested list of journals. The Committee on Publication Ethics (COPE) member list provides a ‘white list’ of publishers. Journal Guide helps researchers select an appropriate journal for publication.

A tweet noted that “Librarians and researchers are overwhelmed by the range of tools available – we need a curator to help pick out the best”.

Peer review

Alice Ellingham, who is Director of Editorial Office Ltd, which runs online journal editorial services for publishers and societies, discussed ‘Why peer review can never be free (even if your paper is perfect)’. Alice discussed the different processes associated with securing and chasing peer review.

She said the unseen cost of peer review is communication: the editorial office provides assistance to all participants. She estimated that it takes about 45-50 minutes per submission to manage the peer review.

Editorial Office tasks include looking for scope of a paper, the submission policy, checking ethics, checking declarations like competing interests and funding requests. Then they organise the review, assist the editors to make a decision, do the copy editing and technical editing.

Alice used an animal analogy – the cheetah represented the speed of peer review that authors would like to see, but the tortoise represented what they experience. This was very interesting given the Nature news piece that was published on 10 February “Does it take too long to publish research?”

Will Frass is a Research Executive at Taylor & Francis and discussed the findings of a T&F study “Peer review in 2015 – A global view”. This is a substantial report and I won’t be able to do his talk justice here. There is some information about the report here, and a news report about it here.

One of the comments that struck me was that researchers in the sciences are generally more comfortable with single blind review than in the humanities. Will noted that because there are small niches in STM, double blind often becomes single blind anyway as they all know each other.

A questioner from the floor argued that reviewers spend eight hours on a paper and that their time is more important than publishers’, and asked what publishers can do to support peer review. While this was not really answered on the floor* it did cause a bit of a flurry on Twitter with a discussion about whether the time spent is indeed five hours or eight hours – quoting different studies.

*As a general observation, given that half of the participants at the conference were publishers, they were very underrepresented in the comment and discussion. This included the numerous times when a query or challenge was put out to the publishers in the room. As someone who works collaboratively and openly, this was somewhat frustrating.

The Sociology of Research

Professor James Evans, who is a sociologist looking at the science of science at the University of Chicago, spoke about ‘How research scientists actually behave as individuals and in groups’.

His work focuses on the idea of using data from the publication process that tell rich stories about the process of science. James spoke about some recent research results relating to the reading and writing of science including peer reviews and the publication of science, research and rewarding science.

James compared the effect of writing styles to see what is effective in terms of reward (citations). He pitted ‘clarity’ – using few words and sentences, the present tense, and maintaining the message on point against ‘promotion’ – where the author claims novelty, uses superlatives and active words.

The research found writing with clarity is associated with fewer citations and writing in promotional style is associated with greater citations. So redundancy and length of clauses and mixed metaphors end up enhancing a paper’s searchability. This harks back to the conversation about poor academic writing the day before – bad writing is rewarded.

Scientists write to influence reviewers and editors in the process. Scientists strategically understand the class of people who will review their work and know they will be flattered when they see their own research. They use strategic citation practices.

James noted that even though peer review is the gold standard for evaluating the scientific record, his research shows that in terms of determining the importance or significance of scientific works it is inconsistent and systematically biased. Greater reviewer distance results in more positive reviews. This is possibly because if a person is reviewing work close to their speciality, they can see all the criticism. The greater the novelty of the work, the greater the likelihood of a negative review. It is possible to ‘game’ this by driving the peer review panels. James expressed his dislike of the institution of suggesting reviewers. Suggested reviewers provide more positive, influential and worse reviews (according to the editors).

Scientists understand the novelty bias so they downplay the new elements relative to the old elements. James discussed Thomas Kuhn’s concept of the ‘essential tension’ between the classes of ‘career considerations’ – which result in job security, publication, tenure (following the crowd) and ‘fame’ – which results in Nature papers, and hopefully a Nobel Prize.

This is a challenge because the optimal question for science becomes a problem for the optimal question for a scientific career. We are sacrificing pursuing a diffuse range of research areas for hubs of research areas because of the career issue.

The centre of the research cycle is publication rather than the ‘problems in the world’ that need addressing. Publications bear the seeds of discovery and represent how science as a system thinks. Data from the publication process can be used to tune, critique and reimagine that process.

James demonstrated his research that clearly shows that research today is driven by last year’s publications. Literally. The work takes a given paper and extracts the authors, the diseases, the chemicals etc and then uses a ‘random walk’ program. The result ends up predicting 95% of the combinations of authors and diseases and chemicals in the following year.

However scientists think they are getting their ideas, the actual origin is traceable in the literature. This means that research directions are not driven by global or local health needs for example.

Panel: Show me the Money

I sat on this panel discussion about ‘The financial implications of open access for researchers, intermediaries and readers’ which made it challenging to take notes (!) but two things that struck me in the discussions were:

Rick Anderson suggested that when people talk about ‘percentages’ in terms of research budgets they don’t want you to think about the absolute number, noting that 1% of the Wellcome Trust research budget is $7 million and 1% of the NIH research budget is $350 million.

Toby Green, the Head of Publishing for the OECD, put out a challenge to the publishers in the audience. He noted that airlines have split up the cost of travel into different components (you pay for food or luggage etc, or can choose not to), and suggested that publishers split APCs to pay for different aspects of the service they offer and allow people to choose different elements. The OECD has moved to a Freemium model where the payment comes from a small number of premium users – that funds the free side.

As – rather depressingly – is common in these kinds of discussions, the general feeling was that open access is all about compliance and is too expensive. While I am on the record as saying that the way the UK is approaching open access is not financially sustainable, I do tire of the ‘open access is code for compliance’ conversation. This is one of the unexpected consequences of the current UK open access policy landscape. I was forced to yet again remind the group that open access is not about compliance, it is about providing public access to publicly funded research so people who are not in well resourced institutions can also see this research.

Research in Institutions

Graham Stone, the Information Resources Manager at the University of Huddersfield, talked about work he has done on the life cycle of open access for publishers, researchers and libraries. His slides are available.

Graham discussed how to get open access to work to our advantage, saying we need to get it embedded. OAWAL is trying to get librarians who have had nothing to do with OA into OA.

Graham talked the group through the UK Open Access Life Cycle which maps the research lifecycle for librarians and repository managers, research managers, for authors (who think magic happens) and publishers.

My talk was titled ‘Getting an Octopus into a String Bag’. This discussed the complexity of communicating with the research community across a higher education institution. The slides are available.

The talk discussed the complex policy landscape, the tribal nature of the academic community, the complexity of the structure in Cambridge and then looked at some of the ways we are trying to reach out to our community.

There was nothing really new from my perspective: it is well known in research management circles that communicating with the research community, as an independent and autonomous group, is challenging. This is of course further complicated by the structure of Cambridge. But in preliminary discussions about the conference, Mark Carden, the conference organiser, assured me that this would be news to the large number of publishers and others in the audience who are not in a higher education institution.

Summary: What does everybody want?

Mark Carden summarised the conference by talking about the different things different stakeholders in the publishing game want.

Researchers/Authors – mostly they want to be left alone to get on with their research. They want to get promoted and get tenure. They don’t want to follow rules.

Readers – want content to be free or cheap (or really expensive as long as someone else is paying). Authors (who are readers) do care about the journals being cancelled if it is one they are published in. They want a nice clear easy interface because they are accessing research on different publishers’ webpages. They don’t think about ‘you get what you pay for.’

Institutions – don’t want to be in trouble with the regulators, want to look good in league tables, don’t want to get into arguments with faculty, don’t want to spend any money on this stuff.

Libraries – Hark back to the good old days. They wanted manageable journal subscriptions, wanted free stuff, expensive subscriptions that justified ERM. Now libraries are reaching out for new roles and asking should we be publishers, or taking over the Office of Research, or a repository or managing APCs?

Politicians – want free public access to publicly funded research. They love free stuff to give away (especially other people’s free stuff).

Funders – want to be confusing, want to be bossy or directive. They want to mandate the output medium and mandate copyright rules. They want possibly to become publishers. Mark noted there are some state controlled issues here.

Publishers – “want to give huge piles of cash to their shareholders and want to be evil” (a joke). Want to keep their business model – there is a conservatism in there. They like to be able to pay their staff. Publishers would like to realise their brand value, attract paying subscribers, and go on doing most of the things they do. They want to avoid Freemium. Publishers could be a platform or a mega journal. They should focus on articles and forget about issues and embrace continuous publishing. They need to manage versioning.

Reviewers – apparently want to do less copy editing, but this is a lot of what they do. Reviewers are conflicted. They want openness and anonymity, slick processes and flexibility, fast turnaround and lax timetables. Mark noted that while reviewers want credit or points or money or something, you would need to pay peer reviewers a lot for it to be worthwhile.

Conference organisers – want the debate to continue. They need publishers and suppliers to stay in business.

Published 18 February 2016
Written by Dr Danny Kingsley

What does a researcher do all day?

1 February 2016Uncategorizedacademic, administration, career, College, disciplinary differences, lecturer, Principal Investigator, publishers, researcher, teachingOffice of Scholarly Communication

Recently, Paul Jervis-Heath* came to speak to Cambridge Libraries staff about work he had done as part of the Cambridge Libraries user centred design programme during the previous academic year.

This project was trying to establish how Cambridge University administrative services would manage the RCUK block grant provided to the University to support the RCUK Open Access policy. The end goal of the project was to design products and services, so the team of six working on the programme needed to start by trying to understand what academics did and what services they needed.

Information gathering process

During the project the team worked with 56 academics including contextual interviews with 34 academics. Paul noted however that it was also important to see the environments they were working in to ‘get into the headspaces’ of who they were designing for.

To this end the team shadowed 10 academics over a 48-hour period. They followed them through their day, literally sitting next to them. They watched lectures, sat in supervisions and took notes. As researchers did tasks the team asked questions about how they felt about the task – whether it was worth their time for example. The number was small because of the time intensity of this approach, however the process revealed good insights. Paul mentioned that they looked at the workarounds academics have for tasks and were able to determine how academics know what is succeeding and what they ought to be doing.

The information gathering phase also included 12 co-design sessions looking at research and publishing tools, where they invited participants to act as designers. These were one-on-one co-design sessions. The academics were asked to design the journal they would like to publish in. As part of the process the team took notes about how the participants talked about the publishing process.

This process is referred to as ‘bootstrapping’. The project was not pretending to have the full picture of what academic life is like. However the findings are robust enough to form an idea of what academics are doing to then create something and take it back to the participants to be refined based on feedback.

Wearing lots of hats

Academics have lots of roles and they get split both between the University and their College and between their teaching and research roles. Paul noted that being an academic is really three or four jobs – each person needs to decide what they will be very good at. He observed that academics have to discover things that are new to the world as well as all of their other administration and work.

Many of the academics observed had between six and eight, sometimes 10 different roles. Some of these come with a job title, and others are unofficial because the academic wants to be a good supervisor, tutor, or a good colleague. The longer someone is around, the more roles they collect. The team started trying to graph people’s job titles as part of the project but this proved challenging because academia is not like a company where people have a fixed job title. Paul described it as more like a series of badges where an academic gets new things ‘pinned on’.

Academics are both teachers and researchers. Paul noted it is always interesting to see which one the participants mentioned first, their teaching role or their research role.

Teaching

Teaching takes up most of the term time and there is no time for research other than, say, putting together reading lists. For most researchers, about 20 minutes is the time length they have available for anything. This is how they carve up their day.

Everybody teaching at Cambridge is a University Teaching Officer – which has four levels. People start off as a Lecturer, then Senior Lecturer, then Reader, then Professor. There are additional roles like the Head of Department, which typically rotates as a two year position. Then there are people who are Director of Studies both within a department and in the Colleges. Tutors look after the pastoral element of life in the College. And that’s just teaching roles.

Research

The other side of the coin is the research roles. People start as Research Associates where they are hired for a specific research project which means there is nothing to move onto, so the person might have to move to a new university. Postdocs often don’t have anywhere to go, so they tend to use libraries and coffee shops or work from home. For many people the College is their office.

Gaining a Junior Research Fellowship is an important step because the University is funding the research in some way; however, most positions are of a fixed length. Having a JRF means researchers know where they are going to be. The next step is a Senior Research Fellow, then Principal Investigator. In science, research happens in groups and the Principal Investigator leads the project.

Many people likened running a research group to running a small company while remaining research active. The Principal Investigator is similar to the managing director of a small company. Some of these activities they don’t have any real training for. No-one has told them how to manage expenditure of a research project, or how to interview people. Several people noted that the hardest thing is recruitment not least because often candidates are abroad and interviews happen over Skype and Google Hangout. There is a big element of doubt about who they have employed.

Often collaborations are across time zones so researchers are fitting in calls in the early morning and evening to allow for time zones.

Academic roles in detail

The academic roles tended to fall into the following areas:

College role – Supporting students, public relations, administration, research, consultancy, teaching
Personal administration – Travel arrangements, updating diary, updating CV and publication lists
College administration – committee meetings and reading papers, reviewing and interviewing candidates for the college, selecting the admissions.
Supporting students – both academic and pastorally, for example providing information about the college or problems with students not coping with work or taking students to hospital.
Teaching –
- Lectures (including preparation and planning curriculum, getting lecture rooms, sorting out timetables).
- Putting slides and demos and reading list up in the course Moodle.
- Writing the exam papers, preparing materials they will need.
- Final issues like meeting the lab technicians, marking the exams.
Research –
- Applying for grant funding involves obtaining quotes from suppliers and partners to go into applications, creating budgets, meeting funders, writing applications, research project management.
- Setting up experiments, and gathering data and analysing results.
- A large amount of writing to tell people about it and get it published – it doesn’t count unless it is published in a good journal. Lots of work in formatting and editing and the reviewing.
- There is informal work – peer reviews. For journals official peer review is usually preceded by informal peer review – people will review each other’s papers to increase chances of getting accepted.
- Managing research groups – running meetings, setting goals, managing expenditure, writing job descriptions, recruitment, approving leave
- Once published all the outreach – including listing the work in Symplectic, seminars, going to conferences and doing speaking engagements. Going to London to be interviewed.
Consultancy – meeting collaborators

Disciplinary differences in research

Disciplines differ immensely from one another but not necessarily in the ways traditionally thought of. Rather than there being a Science versus Humanities divide, a more accurate way of thinking about types of research relates to whether the work is being done in a group or by a solo researcher.

The size of the research group is partly determined by the expense of the equipment. Research such as that done by CERN is very expensive and requires grants. In AHSS there is less of a need for external funding (or possibly less funding available). Note that Junior or Senior Research Fellows tend to be funded by the University but Principal Investigators are often funded through grants.

The pace of the discipline changes how people publish – in fast disciplines there are shorter units of publication, and slower disciplines have longer ones. Physics is a very fast discipline so they upload pre-prints to arXiv.org. For example the role of journals in physics is not as important as in biology.

Transparency changes across disciplines as well. For example physics is very open and biology is secretive – even colleagues often don’t know what others are working on. Transparency can be measured by the competitiveness of the discipline. It can affect the discipline of the research groups – some are open, others are secretive.

The structure of research groups

Research groups were a surprise to Paul. Members do not work together like you do on a project team. Research groups manifest as a set of researchers following their own interests but generally working in the same area. The researchers share methods and equipment but otherwise they are doing their own thing.

Some groups are supportive with mentoring but others are really competitive. Sometimes this comes from the research group and other times it comes from the people in the group. This appears to be led by the discipline culture of where they come from. It is worth noting that while anecdotally Cambridge people have more freedom, in Cambridge there is a cultural tendency not to show any weakness.

Day in the life graphics

Paul then took the group through the ‘day in the life’ diagrams created out of the shadowing done in Michaelmas Term 2013 (October to December). The graphics he discussed included:

The vertical axis reflects how happy the academic was over the day. High points tend to coincide with having contact with people and talking about their discipline such as discussions with PhD students, or with a research group. However lecturing is not a high point because there is no two-way communication – all the students sit at the back, and the lecturer only gets feedback at the end.

What causes one of the greatest emotional lows for a researcher is being rejected for a paper. They have often put all of their effort and knowledge into a journal paper. If it is rejected after peer review they are being told they have wasted two years of their life. Paul noted that some reviewing boards are brutal and the feedback given is, frankly, rude.

There is a similar low point if an application for grant funding is unsuccessful – it is similar to a rejection. Grant funding applications are worse than a paper as the researcher has to argue why the work is important and why the funder should fund it. Generally funding bodies are not as brutal but they are awarding funding to competitors – so it is a double blow.

Research and publishing experience map

Paul also talked the group through the Research and Publishing Experience Map. As part of the project the team was looking to see if the University was involved in the publishing process in terms of helping it. However the team found that there is no contact with the University during the process of research and publishing. There was no official checkpoint where academics had to tell the University about what they were doing. While there might be a discussion between the person and their supervisor, it is not recorded anywhere.

The research group will know where articles have been submitted, but the information is not captured anywhere – except in their inbox. But in research groups people move on so even a shared memory is lost. So there is no way to collect data, and no place to archive the administration for researchers. While the Research Office knows about the research grant, what a researcher does with the money is up to them. There are not many official touch points with the University.

The result of this work was a need to artificially engineer a touch point with the academics to ensure that they are able to meet their compliance requirements. The www.openaccess.cam.ac.uk upload system is the result.

* Paul now works for the consulting company Modern Human

Published 1 February 2016
Written by Dr Danny Kingsley