Monthly Archives: October 2015

Half-life is half the story

This week, according to a story in The Bookseller, the STM Frankfurt Conference was told that a shift away from gold Open Access towards green would mean some publishers would not be ‘viable’. The argument was that support for green OA in the US and China would mean some publishers would collapse and the community would ‘regret it’.

It is not surprising that the publishing industry is worried about a move away from gold OA policies. Those policies have proved extraordinarily lucrative in the UK, with Wiley and Elsevier each pocketing an extra £2 million thanks to the RCUK block grant funds that support the RCUK policy on Open Access.

But let’s get something straight. There is no evidence that permitting researchers to make a copy of their work available in a repository results in journal subscriptions being cancelled. None.

The September 2013 UK Business, Innovation and Skills Committee Fifth Report: Open Access stated that “There is no available evidence base to indicate that short or even zero embargoes cause cancellation of subscriptions”. In 2012 the Committee for Economic Development Digital Connections Council, in The Future of Taxpayer-Funded Research: Who Will Control Access to the Results?, concluded that “No persuasive evidence exists that greater public access as provided by the NIH policy has substantially harmed subscription-supported STM publishers over the last four years or threatens the sustainability of their journals”.

I am the first to say that we should address questions about how the scholarly publishing landscape is shifting with systematic data gathering, analysis and discussion. We need to look at trends over time and establish what they mean for the ongoing stability of the corpus of scholarly literature. But consistently invoking the ‘green open access equals cancellation, so we should have longer embargoes’ argument is not the solution.

Let’s put this myth to bed once and for all.

The half-life argument

Publishers have been trying to use the half-life argument for some time to justify extending their embargo periods on the author’s accepted manuscript. An embargo is the period after publication before the manuscript (the author’s Word or LaTeX document, usually saved as a PDF) can be made available in the author’s institutional repository or a subject-based repository.

The half-life of an article is the time it takes for the article to reach half of its total number of downloads.
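
To make the metric concrete, here is a minimal, purely illustrative Python sketch of how a usage half-life could be calculated from a series of monthly download counts (the figures are invented for the example):

    # Purely illustrative: estimate a usage half-life from monthly download counts.
    # The monthly figures below are invented for the example.
    monthly_downloads = [120, 90, 70, 55, 40, 30, 25, 20, 15, 12, 10, 8, 6, 5, 4]

    total = sum(monthly_downloads)
    cumulative = 0
    for month, count in enumerate(monthly_downloads, start=1):
        cumulative += count
        if cumulative >= total / 2:
            print("Usage half-life is roughly {} months "
                  "({} of {} downloads reached)".format(month, cumulative, total))
            break

In this invented example the half-life comes out at about three months; the publishers’ argument rests on the claim that for many journals the equivalent figure is considerably longer.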

The argument goes along the lines of ‘if articles have a longer half-life then they should be kept under embargo for longer’ because, according to a blog post published at the beginning of this year by Alice Meadows, Open access at Elsevier 2014 in retrospect and a look at 2015: “If an embargo period falls too far below the period it takes for a journal to recoup its costs, then the journal’s survival will be jeopardized.”

The problem with this argument is that there has been, and continues to be, no evidence that permitting authors to make work available in a repository leads to journal cancellations. It is ironic that the consistent line on this issue from the publishers has been that the half-life argument is helping ‘set evidence-based policy settings of embargo periods’.

The half-life spectre was raised again at this week’s STM meeting by Philip Carpenter, executive vice president of research at Wiley, who noted that only 20% of Wiley journal usage occurred in the first 12 months after publication and referred to a 12-month embargo as offering only ‘limited protection’, according to The Bookseller.

Evidence for the green = cancellation argument

The need for longer embargoes – 1

The way the ‘evidence’ for this argument has been presented is telling. There is a particular paragraph in Meadows’ blog that is worth republishing in full:

How long those embargo periods should be before manuscripts become publicly accessible is a key issue. To help set evidence-based policy settings of embargo periods, we have contributed to growing industry data. Findings of a recent usage study demonstrated that there is variation in usage half-lives both within and between disciplines. This finding aligned with a study by the British Academy, which also found variation in half-lives between disciplines – and half-lives longer than those previously suggested.

Despite looking like links to two separate items (which gives the impression of more ‘evidence’), the first two links in the paragraph above, to ‘industry data’ and to a ‘recent usage study’, both lead to the SAME 25 November 2013 study by Phil Davis into journal usage half-lives that started the whole shebang off. The study looked at the usage patterns of over 2,800 journals and found that only 3% of the journals had half-lives of 12 months or less. The fewest journals with this short half-life were in the Life Sciences (1%) and the most in engineering (6%).

This is in no way a criticism of the findings of that study, but it should be pointed out that the author clearly states that the study was funded by the Professional & Scholarly Publishing (PSP) division of the Association of American Publishers (AAP). The work has not been peer reviewed or published in the literature.

The British Academy report Open Access Journals in the Humanities and Social Sciences does not appear to be available online any longer.

Now, there is no dispute that there are differences in usage patterns of articles between disciplines. This is a reflection of differing communication norms and behaviours. But it is a huge logical jump to conclude that we therefore need to increase embargo periods. Peter Suber went into some detail on 11 January 2014 (yes, we have been swinging around on this one for a while now) explaining the logical flaw in the argument. At the time, Kevin Smith also noted in a blog post, “Half-lives, policies and embargoes”, that “we should not accept anything that is presented as evidence just because it looks like data; some connection to the topic at hand must be proved”.

The need for longer embargoes – 2

Meadows’ blog went on to say:

There are real-world examples where embargo periods have been set too low and the journal has become unviable. For example, as published in The Scholarly Kitchen, the Journal of Clinical Investigation lost about 40 percent of its institutional subscriptions after adopting a 0-month embargo period in 1996, so it was forced to return to a subscription model in 2009. Similar patterns have been seen with other journals.

The issue referred to here has nothing to do with the half-life of research papers that are being made available open access through a repository. It refers to a journal that moved to a gold Open Access model in 1996 (publishing open access and relying on non-subscription revenue sources), but eventually decided it needed to reimpose a subscription in 2009. Not only is this example entirely unrelated to the embargo issue for green Open Access, it happened six years ago. Note the blog does not link to other ‘similar patterns’. They do not exist.

‘Green policies mean cancellations’

The half-life argument has replaced previous, even less substantial ‘evidence’ provided by the publishing industry in 2012: a study that was cited in support of the argument that “short embargo periods are likely to lead to significant cancellations” by Wiley in a 2013 blog post Open Access – Keeping it Real and by Springer in an interview published as Open Access – Springer tightens rules on self archiving.

The study was conducted by the Association of Learned and Professional Society Publishers (ALPSP). However the study, which was written up and published online, had some major methodological issues. It consisted of a single, poorly worded question:

“If the (majority of) content of research journals was freely available within 6 months of publication, would you continue to subscribe? Please give a separate answer for a) Scientific, Technical and Medical journals and b) Humanities, Arts and Social Sciences Journals if your library has holdings in both of these categories.”

An analysis of the study highlighted these methodological criticisms. The work was not peer reviewed. But there are deeper questions about the motivation behind the survey. The researcher was the Chair of the ALPSP Research Committee and sat on the steering committee for the Publishers Research Coalition, raising questions about her (and the study’s) objectivity. There are several other issues relating to the independence of the researcher.

What is the real problem?

There is no doubt that open access policies are causing disruption to publishers’ funding models. That is hardly surprising and in some cases may well be the intent of the policy. But presenting spurious arguments to try to maintain the status quo is not moving this discussion forward.

The point is we do need evidence. If green OA is causing cancellations, then let’s collect some numbers and talk about the issues:

  • How does this affect the scholarly communication system?
  • What are the implications?
  • Does this mean publishers will fold (unlikely in the short term)?
  • Will some journals close (possibly)?
  • Is that a problem?
  • Do we need to consider issues relating to the reward system and what is valued?

But I will give the last word to the person who caused me to write this blog in the first place, Philip Carpenter, executive vice-president of research at Wiley, who, according to The Bookseller, said at the STM meeting: “We’ll need to think hard about what factors influence library purchasing decisions; we don’t know enough [about that]”.

Hear, hear.

Published 16 October 2015
Written by Dr Danny Kingsley
Creative Commons License

Openness, integrity & supporting researchers

Universities need to open up research to ensure academic integrity, adjust to support modern collaboration and scholarship tools, and begin rewarding people who have engaged in certain types of process rather than relying on traditional assessment schemes. This was the focus of Emeritus Professor Tom Cochrane’s* talk on ‘Open scholarship and links to academic integrity, reward & recognition’ given at Cambridge University on 7 October.

The slides from the presentation are available here: PRE_Cochrane_DisruptingDisincentives_V1_20151007

Benefits of an open access mandate

Tom began with a discussion about aspects of access to research and research data and why it should be as open as possible. Queensland University of Technology introduced an open access mandate 12 years or so ago, and has since been able to observe a number of effects on bibliometric citation rates, such as the way its authors show up in Scopus.

Another is the way consulting opportunities arise because someone’s research is exposed to reading audiences that do not have access to the toll-gated literature. A further benefit is in the recruitment of higher degree by research (HDR) students.

Tom outlined six areas of advantage for institutions with a mandate, including researcher identity and exposure and advantage to the institution. He noted that they cannot argue causation, but can argue correlation, with the university’s improvement in research performance. Many institutions have been able to gain some advantage from having an institutional repository that reflects the output of the institution.

However, in terms of public policy, the funders have moved the game on anyway. This started with private funders like the Wellcome Trust, and has extended to the publicly funded research councils. This is the government taxpayer argument, which is also playing out in the US.

Tom noted that when he began working on open access policy he had excluded books, because there are challenges with open access when there is a return to the author, but there has long been a problem with publishing in the humanities and the social sciences. He said there was an argument that a vicious downward spiral has oppressed these disciplines by making quality scholarship susceptible to judgements about the sales appeal of titles in the market, assessments which may be unrelated to scholarly merit. Now there is a new model called Knowledge Unlatched which is attempting to break this cycle and improve the number of quality long-form outputs in the Humanities and Social Sciences.

Nightmare scenarios

Tom then turned to the relationship between academic integrity and research fraud by discussing the disincentives in the system. What are the potential ‘nightmare’ scenarios?

For an early career researcher, nightmares include a failing PhD, rejection of a job or promotion application, a failed grant application, industry or consultancy protocols falling through, or a paper not being accepted.

However, a worse nightmare is when a published or otherwise proclaimed finding is found to be at fault, either through a mistake or because something more deliberate is at play. This is a nightmare for the individual.

It is also very bad news for an institution to end up on the front page; that kind of damage is very difficult to rectify.

Tom spoke about Jan Hendrik Schon’s deception. Schon was a physicist who qualified in Germany and went to work at Bell Labs in the US, where he claimed breakthrough discoveries in ‘organic semiconductors’. The reviewers were unable to replicate the results because they did not have access to the original data: lab books had been destroyed and samples were damaged beyond recovery. The time taken to investigate and eventually withdraw the research was 12.5 years, and the effort involved was extraordinary.

Incentives for institutions and researchers

Academics work towards recognition and renown, respect and acclaim. This is based on a system of dissemination and publication, which in turn is based on peer review and co-authorship using understood processes. Financial reward is mostly indirect.

Tom then discussed what structures universities might have in place. Most will have some kind of code of conduct to advise people about research misconduct. There are questions about how well understood, and how well implemented, this advice actually is.

Universities also often provide teaching about authorship and the attribution of work – there are issues around the extent to which student work gets acknowledged and published. Early career researchers are, or should be, advised about the requirements of attribution, including the problem of attributing work to people who have not contributed to it, as well as developing a good understanding of plagiarism and ethical conduct.

How does openness help?

Tom noted that we are familiar with the idea of open data and open access. But another aspect is ‘open process’. Lab workbooks, for example, showing progress in thinking, approaches and experiments, can be made open, though there may be some variation in the timing of when this occurs.

The other pressing issue is that the nature of research itself is changing profoundly. This includes an extraordinary dependence on data, and a complexity requiring intermediate steps of data visualisation. In Australia this is called eResearch; in the UK it is called eScience. These eResearch techniques have been growing rapidly, and in a way that may not be understood or well led by senior administrators.

Using data

Tom described a couple of talks by early or mid career researchers at different universities. They said that when they started they were given access to the financial system and to IT and Library privileges. But what they want to know is: what data services can I get from the University? This is particularly acute in the Life Sciences. Where is the support for the tools? What is the University doing by way of scaffolding the support services that will make those tools more effective for me? What sort of help and training will you provide in new ways of disseminating findings and new publishing approaches?

Researchers are notoriously preoccupied with their own time; they consider they should be better supported with these emerging tools. We need more systematic leadership in understanding these tools, with deliberate attention by institutional leadership to overcoming inertia.

The more sustained argument for openness relates to questions of integrity and trust, where arguments are disputes about evidence. What is true for the academy in terms of more robust approaches to preventing or reducing inaccuracy or fraud is also true for the broader public policy need for evidence-based policy.

Suggestions for improvement

We need concerted action by people at certain levels: Vice-Chancellors, heads of funding councils and senior government bureaucrats. Suggested actions for institutions and research systems at national and international levels include concerted efforts to:

  • develop and support open frameworks
  • harmonise supporting IP regimes
  • reframe researcher induction
  • improve data and tools support services
  • reward data science methods and re-use techniques
  • rationalise research quality markers
  • foster impact tracking in diverse tools

Discussion

Friction around University tools

One comment noted that disincentives at Cambridge University manifest as friction around the ways researchers use University tools, given that they do not want to waste time.

Tom responded that creating a policy is half the trick; implementing it in a way that makes sense to someone is the other half. What does a mandate actually mean in a university, given that universities are places where one does not often successfully tell someone else what to do?

However, research and support tools are getting more efficient; it is a matter of marshalling the right expertise in the right place. One of the things that is happening is that new ideas are being taken up in diverse ways. This relies on the talent of the leadership or the team in place, and progress could be held back by a couple of reactionary or unresponsive senior leaders. Conversely, the right leadership can make striking progress.

Openness and competition

Another comment asked how openness squares with researchers being worried about others finding out what they are doing in a competitive environment.

Tom noted that, depending on the field, there may indeed need to be decision points or “gating” that govern when the information is available. The important point is that it is available for review, for the reasons of integrity explored earlier. Exceptions will always apply, as in the case of contract research being done for a company by an institution, which is essentially “black box”. There would always have to be decisions about openness, which would be part of working out the agreement in the first place.

Salami slicing publication

A question arose about the habit of salami slicing research into small publications for the benefit of the Research Excellence Framework, and how this squares with openness.

Tom agreed that research assessment schemes need to be structured to encourage or discourage certain types of scholarly output in practice. The precursor to this practice was the branching of journal titles in the 1970s, which at the time offered an opportunity for advantage to research groups and publishers. There has to be a leadership view from institutional management on what kind of practical limits there can be on that behaviour.

This sparked a question about the complexity of changing the reward system because researchers are judged by the impact factor, regardless of what we say to them about tweets etc. How could the reward system be changed?

Tom said the change needed is to recognise that rewarding only research outputs is insufficient; other research productivity also needs to be rewarded. This has to be led. It cannot be a half-baked policy put out by a committee; it needs to be trusted by the research community.

Open access drivers

A question was asked about the extent to which the compliance agenda taken up by the funders has run its course, and whether this agenda is going to be taken up by the institutions.

Tom said that he has thought about this for a long time. He originally thought OA would be led by the disciplines, because of the example of the High Energy Physics community, which built a repository more than 20 years ago. Then there was considerable discussion, for example in the UK in the early 2000s, about aligning OA with institutional profile. But institutional take-up was sporadic. In Australia in 2012 there were only six or seven universities with policies (which does not necessarily mean there had been completely satisfactory take-up in each of those).

Through that time the argument for a return on taxpayer investment has become the prevalent government one. Tom does not think governments will move away from that, even though there has been a level of complexity relating to the position that might not have been anticipated, with large publishers keen to be embedded in the process.

This led to a question of whether this offers an opportunity for the institution beyond the mandate.

Tom replied that he has always thought there was an advantage at an institutional and individual level: you would be better off if you made work open. The main commercial reaction has been for the large publishers to seek to convert the value that exists in the subscription market into the same level of value in input fees, i.e. Article Processing Charges.

Finally, it should be understood that academic publishing and the quality certification of research does have a cost; the question is what that level of cost should really be.

About the speaker

*Emeritus Professor Tom Cochrane was briefly visiting Cambridge from Queensland University of Technology in Australia. During his tenure as Deputy Vice-Chancellor (Technology, Information and Learning Support), Professor Cochrane introduced the world’s first university-wide open access mandate, in January 2004. Amongst his many commitments, Professor Cochrane serves on the Board of Knowledge Unlatched (UK), is a member of the Board of Enabling Open Scholarship (Europe) and was co-leader of the project to port Creative Commons into Australia.

Published 12 October 2015
Written by Dr Danny Kingsley
Creative Commons License

Archiving webpages – securing the digital discourse

We are having discussions around Cambridge about the research activity that occurs through social media. These digital conversations are the ephemera of the 21st century, the equivalent of the Darwin Manuscripts that the University has spent considerable energy preserving and digitising. However, we are not currently archiving or preserving this material.

As a starting point, we are sharing here some of the insights Dr Marta Teperek gained from attending the DPTP workshop on Web Archiving on 12 May 2015, led by Ed Pinsent and Peter Webster.

Digital dissemination

Increasingly, researchers are realising that online resources are important for disseminating their findings – the subject of our recent blog ‘What is ‘research impact’ in an interconnected world?’. It is common to use blogs and Twitter to share discoveries.

Some researchers even have dedicated websites to publish information about their research. In the era of Open Science, webpages are also used to share research data, especially by programmers, who often use them as powerful tools for providing rich metadata descriptions of their software. It is not uncommon to include a link to a webpage in a publication as the source of additional information supporting the paper. In these cases, other researchers need to be able to cite the webpage as it was at the time of publication. This requires the content to be stable – be it information, a dataset, or a piece of software.

The question then arises of how to prevent ‘linkrot’ and preserve webpages – to ensure the content of a webpage is still going to be accessible (and unaltered) in several years’ time.
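
As an aside, detecting linkrot in a list of cited URLs can be automated. Below is a minimal sketch (not from the workshop) using the third-party Python requests library; the URLs are examples only, and the check only spots dead links, not content that has silently changed:

    # Illustrative sketch: check a list of cited URLs for linkrot.
    # Requires the third-party 'requests' library (pip install requests).
    import requests

    cited_urls = [
        "https://www.data.cam.ac.uk/",            # example URLs only
        "https://example.com/a-page-that-moved",
    ]

    for url in cited_urls:
        try:
            response = requests.head(url, allow_redirects=True, timeout=10)
            if response.status_code == 200:
                note = "OK"
            else:
                note = "check manually (HTTP {})".format(response.status_code)
        except requests.RequestException as error:
            note = "unreachable ({})".format(error.__class__.__name__)
        print("{}: {}".format(url, note))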

What does it mean to archive a webpage?

Archiving is preserving the exact copy of a webpage as it is at a given moment in time. The most commonly used format for webpage archives is the .warc file. These files contain all the information about the page: its content, layout, structure, interactivity and so on. They can easily be re-played to re-create the exact content of the archived webpage as it was at the time of recording. These .warc files can be shared with colleagues or with the public by various means, for example by depositing a copy in a data repository.
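
For readers comfortable with a little code, the contents of a .warc file can also be inspected programmatically. The sketch below uses the open-source warcio library and a hypothetical file name; it simply lists the URLs captured in the archive:

    # Illustrative sketch: list the pages captured in an existing .warc file.
    # Requires the open-source 'warcio' library (pip install warcio).
    from warcio.archiveiterator import ArchiveIterator

    with open("my-archive.warc.gz", "rb") as stream:   # hypothetical file name
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":          # the captured HTTP responses
                url = record.rec_headers.get_header("WARC-Target-URI")
                content_type = record.http_headers.get_header("Content-Type")
                print(url, content_type)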

The right to archive

One of the most interesting topics emerging from almost every talk was who has the right to archive a webpage. The answer would seem simple – the webpage creator. However, webpages often contain information with reference to, or with input from various external resources. Most pages nowadays have feeds from Twitter, allow comments from external users, or have discussion fora. Does the website creator have the rights to archive all these?

In general, anyone can archive the page. Problems start if there are intentions to make the archive available to others – which is typically the driver for archiving the page in the first place. In theory, in order to disseminate the archived page, the archiver should ask all copyright owners of the content of that page for their consent. However, obtaining consent from all copyright owners might be impossible – imagine trying to approach authors of every single tweet on a given webpage.

The recommendation is that people should obtain consent for all elements of the webpage for which it is reasonably possible to do so. When making the archive available, there should also be a statement that best efforts were made to obtain consent from all copyright owners. It is good practice to ask any webpage contributors to sign a consent form for archiving and sharing of their contributed content.

Alternative approach to copyright

Some websites have decided to take an alternative approach to dealing with copyright. The Internet Archive simply archives everything, without worrying about copyright. Instead, it has a takedown policy if someone asks it to remove a shared archive. As a consequence of this approach, it is currently the biggest website archive in the world, which as of August 2014 used 50 petabytes of storage.

Anyone can archive their website on the Internet Archive: simply create an account, enter the URL of the webpage to be archived and click a button to archive the page. That is it – the archive will be created and shared.
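
The same ‘save a page’ request can also be scripted. The sketch below is an assumption based on the Internet Archive’s public ‘Save Page Now’ form at web.archive.org/save/ (the exact behaviour of the service may change); it uses the Python requests library and an example URL:

    # Illustrative sketch: ask the Internet Archive's 'Save Page Now' service
    # to capture a page. The endpoint is an assumption based on the public
    # https://web.archive.org/save/ form; check the current service before relying on it.
    import requests

    page_to_archive = "https://www.data.cam.ac.uk/"   # example URL only

    response = requests.get("https://web.archive.org/save/" + page_to_archive, timeout=60)
    if response.ok:
        print("Capture requested; look for the snapshot at:")
        print("https://web.archive.org/web/*/" + page_to_archive)
    else:
        print("Request failed with HTTP {}".format(response.status_code))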

The workshop inspired us at Cambridge to archive our data website, which is now available on the Internet Archive. Snapshots from each of the archiving events can easily be replayed simply by clicking on them.

Can a non-specialist archive a website?

But what if you would like to archive a website yourself – store and share it on your own terms, perhaps using a data repository? Various options for website preservation were discussed during the workshop.

For a non-specialist, the best option is one which does not require any specialist knowledge or specialist software installation. A startup company called WebRecorder has created a website which allows anyone to easily archive any page. There is no need to create an account: the user can simply copy in the URL of the page to be archived and press ‘record’. This will generate a .warc file of the page. The disadvantage is that this needs to be done for every page of the website separately. WebRecorder allows free downloads of .warc files – the files can be downloaded and archived or shared however the user chooses.

If anybody then wants to re-run the website from a .warc file, there are plenty of free software options available to re-play the webpage. Again, an easy solution for a non-specialist is to go to WebRecorder, which allows one to upload a .warc file and will then replay the webpage with a single click on the ‘Replay’ button.

A bouquet for the DPTP workshop

This was an excellent and extremely efficient one-day workshop, thanks to its dynamic organisation. The workshop was broken down into six main parts, and each of these parts consisted of several very short (usually 10 minute) presentations and case studies directly related to the subject (no time to drift away!). After every short talk there was time for questions. Furthermore, there were breaks between the main parts of the workshop to allow focused discussions on the subject. This dynamic organisation ensured that every question was addressed and that all issues were thematically grouped – which in turn helped deliver powerful take-home messages from each section.

Furthermore, the speakers (who, by the way, had expert knowledge of the subject) did not recommend any particular solutions, but instead reviewed the types of solutions available, discussing their major advantages and disadvantages. This provided attendees with enough guidance to make informed decisions about the solutions most appropriate to their particular situations.

What also greatly contributed to the success of the workshop was the diverse background of the attendees: from librarians and other research data managers, to researchers, museum website curators, and European Union project archivists. All these people had different approaches to, and different needs for, web archiving. Perhaps this is why the breakout sessions were so valuable and deeply insightful.

Published 3 October 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License