
Open Research in the Humanities: The Future of Scholarly Communication

Authors: Emma Gilby, Matthias Ammon, Rachel Leow and Sam Moore

This is the second of a series of blog posts, presenting the reflections of the Working Group on Open Research in the Humanities.  Read the opening post here. The working group aimed to reframe open research in a way that was more meaningful to humanities disciplines, and their work will inform the University of Cambridge approach to open research.  This post considers the future of scholarly communication from a humanities perspective. 

PILLAR ONE: THE FUTURE OF SCHOLARLY COMMUNICATION 

This first pillar deals with ‘open access’ narrowly understood: the future of the publication landscape, and the question of the sustainability and viability of different publication models in an open access world.  

Opportunities 

The open access initiative in general values a wide range of contributions to academic life. The arts and humanities thrive on long-term, multi-scale, conversational, collaborative, interdisciplinary projects; all cultural work can be so defined. Any move towards research diversity therefore works in the favour of the arts and humanities.  

Open Research aims first at opening out ‘traditional’ research content, such as that published in journals and monographs. Thus it aims also to demystify the existing publication process. In general, it prioritizes the wide dissemination of public-facing research. Further, it allows us to envisage new forms of publication, such as the use of dynamic images and data visualisation as already undertaken in investigative journalism.1 Other examples of new Open Access formats include semi-public peer-to-peer review and the opportunity for readers to highlight passages and contribute to a crowd-sourced index of terms.2

Support required 

In the immediate and short term, A&H colleagues require institutional support to understand and get to grips with the current routes to open access within academic publishing, which present various advantages and challenges. For more detail see Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper https://royalhistsoc.org/policy/publication-open-access/plan-s-and-history-journals/ 

Current routes to OA in scholarly publishing include:  

  1. Paying directly for article or book processing charges levied by publishers. This is easy if one’s research falls among the very small percentage of A&H research that is funded by the research councils, who allow for such fees, but otherwise challenging.
  2. Taking advantage of a ‘read and publish’ deal set up between a publisher and an institution. This is easy if one is at the right institution at the right time, but otherwise challenging. There is also confusion amongst colleagues about what happens when these time-limited, transitional deals expire: will publishers revert to simple processing charges (see above)? Or will all published material by then be fully OA (see below)?
  3. The self-deposit in an OA institutional repository of a manuscript that is accepted for publication and peer reviewed but that has not been edited or typeset by the publisher in any way. This is easy with the right systems in place, but problematic because it neglects the import of the editing process in A&H research. Without undergoing this process, ‘accepted manuscripts’ are very vulnerable to errors, especially in the case of the very many scholars who regularly work in languages that are not their first, or in the case of early career scholars who are less familiar with critical processes and how to evidence them, or in the case of colleagues with various kinds of disabilities such as dyslexia. Other issues also abound with the deposit of manuscripts in repositories. In cases where scholars receive an acceptance that is subject to improvement, the final ‘date of acceptance’ is ambiguous for legal purposes. And in cases where the work in question uses copyrighted material, further legal issues emerge about when and how it may be possible to circulate this. In all these senses, then, many A&H colleagues simply dislike the thought of their ‘accepted manuscript’ circulating. In the case of institutional repositories, there seems to be a direct and obvious tension between the goals of open research and quality control.
  4. Publishing with a fully OA journal or academic publisher that does not require a processing charge. This is obviously the most straightforward and therefore best route to OA, but raises the fundamental question of how such work is conducted and funded. The notion of the ‘scholar-led’ press, established and monitored by scholars themselves, presupposes that academics can somehow fit the work of the professional editor, copy editor, translator, typesetter, etc. into their spare time. In addition, many OA journals rely on charitable donations. Fundraising is also a skilled business: will universities’ development directors and offices be diverted to do the work of seeking these charitable donations? Is it possible for existing publishing houses and presses to construct a sustainable business model that allows for free and open publishing, while overlaying their own professional services onto the scholarly work provided by academics? Can already successful enterprises such as Open Book Publishers in Cambridge3 be ‘scaled up’? The members of the working group have not seen any impact assessments or pilot studies considering which of the current forms of scholarly communication will simply die out in the absence of subscription and royalty income. We would like to see evidence-based impact assessments as a matter of priority. In general, it is unclear whether even the largest and most prestigious scholarly societies will survive the loss of income that will result from a move to OA. As one member of our group put it, ‘the research is not open if it is dead’.

Many questions remain, above and beyond those already evoked:  

  • The situation with respect to the goal of publishing all academic monographs freely and openly remains extremely fluid, and all the enquiries we were able to make in the working group confirmed that this is an area of great uncertainty. Academic books require considerable up-front investment by publishers, and it is vital that this labour and expertise is properly supported in an open access model. How can we ensure that open access books do not entail a race to the bottom in terms of editorial and production standards?
  • Researchers and publishers will also have to think carefully about content such as book reviews, notices, short discussion pieces, author interviews and so on: content that is useful to the discipline, but peripheral to the article form and that would not generally appear in a repository, for example.   
  • The place of UK debates in the global publishing industry is unclear. Like all scholarly publishing, A&H publishing is international in nature and most journals and presses will draw from as wide an international field as possible. How will the editor of a UK-based journal, responding to the OA requirements of UK decision-making bodies, deal with international authors who are not subject to the same requirements or set of priorities? How will an international editor deal with UK academics?4 These questions come up repeatedly in conversations with colleagues.
  • Scholarly societies in the arts and humanities do not charge a fortune for their journals, and also offer conferences, communities and support (financial and otherwise) for early-career scholars. To analyse the costs and benefits of access to their publications, it will be necessary to look across cost centres within any given institution. To offer a worked example of library costs from 2019, ‘the bundled UK cost for 2020 the RHS’s Transactions and its Camden book series is £205 (this is a maximum figure, excluding all discounts). In the financial year 1 July 2018-30 June 2019, RHS awarded (for example) £2,781.56 to support ECR researchers at York University and £3,177.16 to support ECR researchers at Oxford.’5 So it would be useful to see studies of the rate of institutional return on investment in publications by university libraries.
  • Concerns about licensing were already well documented and summarized by Peter Mandler in 2014: ‘For one thing, we do not have full ownership of our texts ourselves – we use others’ words and images, often by permission. For another, we have our own norms of how best to incorporate one work within another – e.g. by quotation – which derivative use denies. Most important is our moral right (long acknowledged in law and ethics) to protect the integrity of our work. By all means read and disseminate our work free of charge, but do not change it as you are doing so – write your own work.’6  
  • Concerns about distortions allowed by CC BY in the reuse of oral history interviews and other sensitive/polemical content are important for many A&H colleagues as they are for our colleagues in the social sciences. 
  • Evidence of predatory publishers simply reusing content from repositories is starting to emerge, seemingly justifying concerns about CC BY as opposed to CC BY-NC-ND or CC BY-ND.7
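
The worked example of library costs quoted above can be put into a rough calculation. The following is only a minimal sketch: the figures are those quoted from the RHS guidance paper, but the `return_ratio` measure is a hypothetical illustration, and it assumes (which the source does not state) that each institution paid the maximum bundled price.

```python
# Figures quoted in the post from the RHS guidance paper (all in GBP).
BUNDLED_SUBSCRIPTION_COST = 205.00  # maximum 2020 UK price, excluding discounts

# ECR awards made by the RHS in the financial year 1 July 2018 to 30 June 2019.
ECR_AWARDS = {
    "York": 2781.56,
    "Oxford": 3177.16,
}


def return_ratio(award: float, subscription: float) -> float:
    """Hypothetical measure: pounds of ECR support per pound of subscription.

    Assumes the institution paid the full bundled price, which is an
    illustrative assumption rather than a figure from the source.
    """
    return award / subscription


for institution, award in ECR_AWARDS.items():
    ratio = return_ratio(award, BUNDLED_SUBSCRIPTION_COST)
    print(
        f"{institution}: GBP {award:,.2f} awarded against "
        f"GBP {BUNDLED_SUBSCRIPTION_COST:,.2f} paid (ratio {ratio:.1f})"
    )
```

A fuller study of the kind the working group asks for would also need to count income and costs held in other cost centres, which this sketch leaves out.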

Footnotes

1See for instance a project on the takeover of real estate by the Church of Scientology in Clearwater, Florida: https://projects.tampabay.com/projects/2019/investigations/scientology-clearwater-real-estate, or a series of investigative articles on the post-9/11 burgeoning of the US intelligence services collected here: https://www.washingtonpost.com/people/william-m-arkin/

2Matthew Gold & Lauren Klein, eds. Debates in the Digital Humanities (2012), https://dhdebates.gc.cuny.edu

3 ‘We are a nonprofit independent publisher with no institutional backing. Open Book relies on sales and donations to continue publishing high-quality and free to read titles. We gratefully acknowledge the generous support of The Polonsky Foundation, the Thriplow Charitable Trust, the Jessica E. Smith and Kevin R. Brine Charitable Trust, The Progress Foundation and the Dutch Research Council (NWO).’ https://www.openbookpublishers.com

4 See the following testimony: ‘The bi-lingual, topic-specific journal I edit…draws articles from authors across the world and is published in Switzerland. Hence, specific OA requirements pertaining to UK-based authors will be considered in setting OA policy but will probably not be a determining factor. Hence, if strict requirements are introduced around OA in relation to UK funders, this may serve to reduce the possibility for UK-based authors to submit articles to my journal. This would obviously be an issue for the journal but would also be one for UK academics also, as it would result a more limited range of potential publication outlets.’ Margot Finn, Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, pp. 47-8. 

5 Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, p. 69, n. 110. 

6 Peter Mandler, ‘Open Access: a Perspective from the Humanities’, Insights 27 (2), 2014, http://doi.org/10.1629/2048-7754.89 

7 Guy Lavender, Jane Secker and Chris Morrison, ‘What happens when you find your open access PhD thesis for sale on Amazon?’, 8th July 2021, https://blogs.lse.ac.uk/impactofsocialsciences/2021/07/08/what-happens-when-you-find-your-open-access-phd-thesis-for-sale-on-amazon/

Book Review: Scholarly Communication – what everyone needs to know®

As we wind down towards the last days of 2018, thoughts go to gifts for family and friends. Here, as our last minute gift idea to you, is a book that should be under the tree of every scholarly communication aficionado.

The following book review appeared in Research Fortnight on 15th September 2018 with the title ‘New readers start here’. It was edited by John Whitfield and is reproduced here with permission.

Book Review

It is odd to be reviewing a book that stresses the importance of “positive reviews in…prestigious publications” to potential sales and publishers’ reputations. Nonetheless, it is safe to say that Scholarly Communication: What everyone needs to know, by Rick Anderson, is excellent.

Scholarly communication is a complex and fast-growing area. Even those working in it find keeping up to date a challenge. The challenge is much greater for those working more widely in research and academia, let alone the general public. The market is ripe for an understandable, generalist overview that explains what scholarly communication is and why anyone should care.

To address the latter point, Anderson notes in his introduction that “there are issues related to scholarly communication about which it would make sense for all of us to know something”. His argument is that decisions made worldwide on health, environment, economics and so on are all underpinned by academic research, reported through the scholarly communication system.

Anderson does a masterful job of distilling the stakeholders, issues and facts into an understandable whole. The discussions about open access and controversies and problems are handled sensitively; a challenge given the wide range of perspectives in this area.

The chapter on copyright is particularly helpful. This is a fundamental aspect of almost all scholarly communication and an area where many people are unsure. In clear language, Anderson explains fair use and fair dealing, licensing, the Creative Commons licences used in open-access publication, orphan works and patents. To do so without overwhelming or boring the reader is something of an achievement.

Anderson’s writing is eloquent and his explanations are clear and precise. Other highlights include discussions of how researchers use e-books, the projects from Google and the HathiTrust repository to digitise books, and an excellent description of how digitisation is allowing libraries to share their special and rare collections with a wider audience.

The book is structured as a series of questions with short answers of one to five pages. This format invites readers to dip in and out of the sections they are interested in. Mostly this approach is successful, although it does result in some repetition.

Anderson is also a victim, in a small way, of the very dynamism that he aims to capture. The volatility of scholarly communication means that most of the specialist discussion tends to occur in outlets that publish quickly, such as mega journals and blogs. The timescale for a regular journal article, which can take a couple of years to get from submission to publication, is too long and risks the contents losing relevance. Books have similarly long lead times; coupled with the dynamic nature of scholarly communication, this makes some out-of-dateness inevitable.

For example, in his chapter on metrics Anderson notes that “the universe of altmetrics is a highly dynamic one, and products and services…seem to be born and die nearly every month”. This is evidenced within that very chapter: among the companies offering research metrics, it mentions Thomson Reuters (whose intellectual property and science business was sold to private equity and changed its name to Clarivate Analytics in October 2016), Delicious (bought by Pinboard in June 2017) and Plum Analytics, which has kept its name but was bought by Elsevier in February 2017.

As the associate dean for collections and scholarly communication at the University of Utah, Anderson makes for a well-qualified author, although the text does reflect his North American perspective. Generally this is not a problem, although a statement such as “There is a professional organisation for university press publishing: the American Association of University Presses” implies, inaccurately, that the rest of the world lacks such organisations.

This is a vast topic, and clearly decisions needed to be made over what to include and omit. Some omissions are easier to justify than others. I would have liked a deeper exploration of the commercial academic publishing market, as this drives much of the activity in the open-access space. The lack of this might reflect the level of disagreement over even basic definitions in scholarly communication, something Anderson acknowledges.

But that’s a minor quibble. Given the need for a book such as this, it would not be surprising if it became compulsory reading for training courses in scholarly communication.

Published 18 December 2018
Written by Dr Danny Kingsley

Relax everyone, Plan S is just the beginning of the discussion

If you are working (or even vaguely interested) in the scholarly communication space then you will not have failed to hear about the release of ‘Plan S’ last week. There has been a slew of reports and commentary (at the end of the sister blog “Most Plan S principles are not contentious”). Here’s another (hopefully useful) addition to the mix.

The document identifies the key target as being: “After 1 January 2020 scientific publications on the results from research funded by public grants provided by national and European research councils and funding bodies, must be published in compliant Open Access Journals or on compliant Open Access Platforms.” There are 10 supporting principles to this statement.

The plan is specifically engineered to force the hand of publishers and academics to really embrace (begrudgingly adopt?) change. Personally I welcome a bit of disruption. It will be no surprise to anyone that I consider the policies that arose from Finch to have failed. But this new development has, understandably, given a few people the jitters.

First up, and if this is all you read, remember this: Plan S is a statement of principle. Until we see the actual policies for our funding bodies, everything is speculation. And while UKRI is one of the 13* funding bodies that have signed up to Plan S, it has said that the report from the review of the OA policy is unlikely to appear before the second half of next year.

[*changed on 14 October]

The reassuring part

So the first thing to say is – don’t panic. We have some time. The second is that fully half of the 10 principles are not contentious – see the sister blog. A further two may have some implications for institutional administration and possibly for managing budgets, but are again fairly non-contentious from an academic, and mostly even from an institutional, perspective.

And then there were three

So we are down to three principles that need a little more unpacking. They relate to the retention of copyright and the ability to choose where to publish. It is worth looking at these in more detail, and considering the information contained in the accompanying document “cOAlition S: Making Open Access a Reality by 2020: A Declaration of Commitment by Public Research Funders”. As it happens, we are already well on our way with many of these principles in the UK anyway. Let’s take a closer look.

Retaining copyright

Authors retain copyright of their publication with no restrictions. All publications must be published under an open license, preferably the Creative Commons Attribution Licence CC BY. In all cases, the license applied should fulfil the requirements defined by the Berlin declaration.

With my OA advocacy hat on I agree with this statement. There is no need for a publisher to hold full copyright over a work. They are able to operate in a commercial environment with a first publication right. Currently the system means that researchers must apply for permission to reuse work of their own if writing a new piece of work. There is a significant side income stream for publishers in relation to copyright ‘management’. Publishers claim they need copyright so they can protect authors’ rights, but there appear to be few examples of a publisher protecting, say, the integrity of an author’s work rather than the income stream from the work.

And this is not the first statement of this kind. The University of California released on 21 June their Declaration of Rights and Principles to Transform Scholarly Communication which states as one of the principles: “No copyright transfers. Our authors shall be allowed to retain copyright in their work and grant a Creative Commons Attribution license of their choosing”.

However as a person responsible for implementing policy within a large research institution I can see some issues that will need to be managed.

For a start, currently, in the vast majority of cases, while researchers own the copyright of their work, they sign it over to the publisher of their articles. As it happens the retention of copyright is a fundamental principle of the UK Scholarly Communications Licence (UK-SCL) which allows institutions to provide a REF compliant green OA route while allowing authors to retain their rights.

The alternative is to negotiate (as the sector) with the publishing industry to ensure that the publishing agreements that each researcher signs retain the author’s copyright. This would also require a huge advocacy and education programme amongst our community. For an excellent analysis of why there remains such a high level of confusion and misunderstanding about copyright amongst our academic community, I strongly recommend Dr Lizzie Gadd’s guest post to the Scholarly Kitchen, ‘Academics and Copyright Ownership: Ignorant, Confused or Misled?’

The requirement for an open license is also potentially an issue for some disciplines. While many science-based disciplines are not concerned with a requirement to publish under a Creative Commons Attribution (CC-BY) licence, there are members of our Arts, Humanities and Social Science communities who only feel comfortable with a CC-BY-NC-ND license. It is the No Derivatives (ND) aspect of the license that is of greatest concern and has been the subject of considerable discussion.

Restriction on ability to publish in a hybrid journal

The “hybrid” model of publishing is not compliant with the above requirements.

The nuclear interpretation of this statement is that funders won’t pay for hybrid at all. There are several precedents for this. Several UK institutions have stopped supporting payment for hybrid. The London School of Hygiene & Tropical Medicine is now restricted to fully open access journals only. The University of St Andrews will no longer be able to pay APCs for articles via the ‘gold’ route in hybrid (subscription-based) journals. Their normal criterion is whether the journal is listed in DOAJ. A 2016 analysis showed this is a common position.

I have written extensively about hybrid, mostly arguing against it. But I do support the position that we need to walk carefully here. In our analysis at Cambridge of what might be seen as a ‘progressive’ publisher, we noted there is an extremely long tail of society and smaller journals that we don’t publish in much but that collectively account for a not insignificant number of papers. Let’s just say that learned societies have some way to go on their open access journey. But if we were to prevent our researchers from being able to publish in these journals, this could well deeply affect the learned societies.

That’s why I welcome the statement in the preamble document that ‘transformative’ type of agreements which include offsetting arrangements will be acceptable under certain circumstances. The interpretation of this statement by UKRI into their policy will determine which publishers will be acceptable or otherwise.

Restriction on choice of publication outlet

In case such high quality Open Access Platforms or journals do not yet exist, the Funders will jointly provide incentives and support to establish these.

This one is potentially problematic because of the perception there will be a restriction on choice of publication options. But that is not necessarily the case.

The publishing sector adopted the language of ‘a threat to academic freedom’ this year in relation to the question of funders refusing to pay for hybrid open access. Academic freedom refers to freedom of expression not freedom of choice of publication outlet. This language is again being used by  the publishing sector in light of Plan S.

This language is now also being used by the academic sector. In an impassioned post European scientists state that Plan S means researchers are “forbidden to publish in subscription journals, including in hybrid ones, where OA option is available at an extra cost.” This is simply not the case. As described above, not all hybrid is necessarily off the table.

The other point that seems to be missed is under Plan S, authors can publish wherever they choose if they deposit the Author’s Accepted Manuscript in an institutional repository under a CC-BY license with a zero month embargo. We are halfway there already in the UK where authors generally are already depositing their work to an institutional repository for REF compliance. The part that requires attention then goes back to the question of the authors retaining copyright over their own work.

The question of access to open access publishing options is more complicated. There are many disciplines in which there are very few open access journals at all. These will need specific support, especially initially, in relation to these policies. Even then this is going to be tricky because establishing a new journal takes time. There are a few precedents: the Wellcome Trust launched Wellcome Open Research in 2016 based on the F1000 platform, and the Bill and Melinda Gates Foundation followed suit using the same platform in 2017. But these are unlikely to reassure many of our researchers.

The elephant in the room

There are some serious concerns with Plan S which relate to the equity issue of moving to a pay to publish ecosystem. These are valid and need to be discussed in the broader context of the open research debate. But that is not the theme of the majority of concerns from the academic sector. Those worries about freedom of choice to publish point to the real problem – what is attached to publication.

The problem is not Plan S, or open access per se. Publishing in specific journals or with specific publishers is primarily an issue of career prospects rather than of disseminating the work, and has been for a long time. When researchers say that restricting their right to publish in an outlet of their choosing threatens ‘academic freedom’, they are referring to their ability to subsequently succeed in future job applications, promotions and grant applications. It is the academic reward system in which everyone is trapped.

Indeed the Plan S preamble refers to a “misdirected reward system which puts emphasis on the wrong indicators (e.g. journal impact factor)”. It commits to “fundamentally revise the incentive and reward system of science” and suggests the San Francisco Declaration on Research Assessment (DORA) as a starting point.

This is the real conversation we need to be having. It is not an easy one to address, but for those who have been arguing for the need to have a serious, international, sector wide conversation about this, Plan S offers a welcome shot in the arm.

Published 12 September 2018
Written by Dr Danny Kingsley

Scare campaigns, we have seen a few

In a sister post, I identified the latest scare offensive in the ongoing discussions around open access as: ‘restricting choice of publication’. In this, there is an implied threat from editorial boards and publishers that if the UK Scholarly Communication Licence (UKSCL) were to be in place, then these journals would refuse to publish articles from affected researchers.

In this post I want to look at other threats that have been or are lurking in the shadows in the open access debate. The first is tied fairly closely to the ‘restricting choice of publication’ threat.

The new scare – threats to ‘Academic Freedom’

The term ‘Academic Freedom’ comes up a fair bit in discussions about open access. During the Researcher to Reader conference*, one of my Advisory Board colleagues, Rick Anderson, tweeted this comment:

“Most startling thing said to me in conversation at the #R2RConf:
‘I wonder how much longer academic freedom will be tolerated in IHEs.’ (Specific context: authors being allowed to choose where they publish.)”

In this blog I’d like to pick up on the ‘Academic Freedom’ part of the comment (which is not Rick’s, he was quoting).

According to a summary in Times Higher Education, “Academic freedom means that both faculty members and students can engage in intellectual debate without fear of censorship or retaliation”.

This definition was based on the American Association of University Professors’ (AAUP) Statement on Academic Freedom which includes, quite specifically, “full freedom in research and in the publication of results”.

Personally I read that as meaning academics should be allowed to publish, not that they have full freedom in choosing where.

Rick has since contacted the AAUP to ask for clarification on this topic. Last Friday, he tweeted that the AAUP has declined to revisit the 1940 statement to clarify the ‘freedom in publication’ statement in light of evolution of scholarly communication since 1940.

The reason why the Academic Freedom/ ‘restricting choice of publication’ threat(s) is so concerning to the research community has changed over time. In the past it was essential to be able to publish in specific outlets because colleagues would only read certain publications. Those publications were effectively the academic ‘voice’. However today, with online publication and search engines this argument no longer holds.

What does matter, however, is that publication in certain journals is necessary because of the way people are valued and rewarded. The problem is not open access; the problem is the reward system to which we are beholden. And the commercial publishing industry is fully aware of this.

So let’s be clear. Academic Freedom is about freedom of expression rather than freedom of publication outlet and ties into Robert Merton’s 1942 norms of science which are:

  • communalism: all scientists should have common ownership of scientific goods (intellectual property), to promote collective collaboration; secrecy is the opposite of this norm
  • universalism: scientific validity is independent of the sociopolitical status/personal attributes of its participants
  • disinterestedness: scientific institutions act for the benefit of a common scientific enterprise, rather than for the personal gain of individuals within them
  • organized scepticism: scientific claims should be exposed to critical scrutiny before being accepted: both in methodology and institutional codes of conduct.

If a publisher is preventing a researcher from publishing in a journal based on their funding or institutional policy rather than the content of the work being submitted then this is entirely in contravention of all of Robert Merton’s norms of science. But the publisher is not, as it happens, threatening the Academic Freedom of that author.

While we are here, let’s have a quick look at some of the other threats to researchers invoked in the last few years.

Historic scare 1 – Embargoes are necessary for sustainability

In the past the publishing industry has tried to claim research on half-life usage of research articles as ‘evidence’ for the “green open access = cancellations” argument. This sounds plausible except for the lack of any causal link between green open access policies and library subscriptions. The argument here is that embargoes are necessary for the ‘sustainability’ (read profit) of commercial publishers.

We should note the British Academy’s own 2014 finding that “libraries for the most part thought that embargoes for author-accepted manuscripts had little effect on their acquisition policies” and that any real cancellation issue was “the rising cost of journals at a time of budgetary constraint for libraries. If that continues, journals will be cancelled anyway, whether posted manuscripts are available or not.”

My debunking of this claim dates back to 2015 although it did raise its head again loudly in 2017 during discussions around the UKSCL. It is not uncommon for a researcher to express concern about their chosen journal’s viability because of open access. The message has been successfully pushed through to the research community.

Historic scare 2 – The need for full copyright

Copyright is supposed to protect the content creator. The argument I hear repeatedly about why publishers need authors to sign their copyright over to publishers is so they can ‘protect the author’s rights’. But when people sign their copyright away to another entity, copyright becomes a purely economic tool for financial exploitation by that entity.

There is no doubt publishers protect their own copyright. Indeed owning it allows maximum freedom to make money from the content (and prevent anyone else from doing so). But strangely whenever I have asked for examples of publishers stepping in to protect an author’s rights as the result of a copyright transfer agreement, there has been no response.

However it is not uncommon for a researcher to tell you that this is one of the protections that publishers offer them. I defer to Lizzie Gadd here who has published thoughts around the distinctions between copyright culture and scholarly culture. She notes how many academics have been led by publishers to believe that the current copyright culture supports scholarly culture to a far greater extent than it actually does.

Historic scare 3 – Press embargoes

The HEFCE open access policy requires the collection and deposit of work within three months of acceptance (although the first two years of the policy pushed this timeline out to three months from publication). This means that work is deposited into repositories, and the metadata that exists – the title, the authors, the intended journal and the abstract – is made available before publication. The work itself (and we are talking about the Author’s Accepted Manuscript, not the final Version of Record) is placed under an indefinite embargo, with the actual embargo period set once the work is published. This process has its own problems, discussed elsewhere.

In 2016 there was a blow-up about the metadata for an article being in the public domain before publication. Our office received multiple concerned calls from researchers asking us to remove records from the repository until publication, for fear that having that metadata available contravened the embargo rules. They were concerned the journal would refuse to publish their paper. When we investigated, not only was this not publisher policy, but the publishers we contacted asked us to forward details of any such threats so they could follow up.

It demonstrates how spooked academics can be by their editors/journals/publishers.

Exhausting

This latest ‘restricting choice of publication’ threat is just another in a long line of implied threats that the scholarly communication community is having to manage. Each time a new one looms we need to identify the source, develop evidence and information to counter the threat, and try to work with our research community to reassure them.

Between this, and the huge amount of time we have to spend identifying dates of publication or managing publisher and funder policies or keeping track of the funds that are being spent in this space, we are exhausted.

But perhaps that’s the point?

Published 15 March 2018
Written by Dr Danny Kingsley
Creative Commons License

* Note: In the past two years I have written up a precis of the Researcher to Reader event with summaries, see: ‘It is all a bit of a mess’ Observations from Researcher to Reader conference and ‘Be nice to each other’ – the second Researcher to Reader conference. Time pressure means I may not be able to do that this year, but see the Twitter hashtag for the event.

‘Be nice to each other’ – the second Researcher to Reader conference

Aaaaaaaaaaargh! was Mark Carden’s summary of the second annual Researcher to Reader conference, along with a plea that the different players show respect to one another. My take home messages were slightly different:

  • Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.
  • Academic leaders and institutions should do their bit in combating the metrics focus.
  • Big Deals don’t save libraries money, what helps them is the ability to cancel journals.
  • The ‘green OA = subscription cancellations’ argument is only viable in a utopian, almost fully green world.
  • There are serious issues in the supply chain of getting books to readers.
  • And copyright arrangements in academia do not help scholarship or protect authors*.

The programme for the conference included a mix of presentations, debates and workshops. The Twitter hashtag is #r2rconf.

As is inevitable in the current climate, particularly at a conference where there were quite a few Americans, the shadow of Trump was cast over the proceedings. There was much mention of the political upheaval and the place research and science have in this.

[*please see Kent Anderson’s comment at the bottom of this blog]

In the publishing corner

Time for publishers to rise to the challenge

The conference opened with an impassioned speech by Mark Allin, the President and CEO of John Wiley & Sons, who started with the statement this was “not a time for retreat, but a time for outreach and collaboration and to be bold”.

The talk was not what was expected from a large commercial publisher. Allin asked: “How can publishers act as advocates for truth and knowledge in the current political climate?” He mentioned that Proquest has launched a displaced researchers programme in reaction to world events, saying, “it’s a start but we can play a bigger role”.

Allin asked what publishers can do to ensure research is being accessed. Referencing “The Content Trap” by Bharat Anand, Allin said: “We won’t as a media industry survive as a media content and putting it in a bottle and controlling its distribution. We will only succeed if we connect the users. So we need to re-engineer the workflows making them seamless, frictionless.” He added: “We should be making sure that … we are offering access to all those who want it.”

Allin raised the issue of access, noting that ResearchGate has more usage than any single publisher. He made the point that “customers don’t care if it is the version of record, and don’t care about our arcane copyright laws”. This is why people use SciHub: ease of access. He said publishers should not give up protecting copyright but must realise its limitations and provide easy access.

Researchers are the centre of gravity – we need to help them spend more time researching and less time publishing, he said. There is a lesson here, he noted: suppliers should use “the divine discontent of the customer as their north star”. He used the example of Amazon to suggest people working in scholarly communication need to use technology much better to connect up. “We need to experiment more, do more, fail more, be more interconnected” he said, where “publishing needs open source and open standards” which are required for transformational impact on scholarly publishing – “the Uber equivalent”.

His suggestion for addressing the challenges of these sharing platforms is to “try and make your experience better than downloading from a pirate site”, and that this would be a better response than taking the legal route and issuing takedown notices.  He asked: “Should we give up? No, but we need to recognise there are limits. We need to do more to enable access.”

Allin challenged the current situation, saying publishing may have gone online, but how much has the internet really changed scholarly communication practices? The page is still a unit of publishing, even in digital workflows. It shouldn’t be; we should have a ‘digital first’ workflow. The question isn’t ‘what should the workflow look like?’, but ‘why hasn’t it improved?’, he said, noting that innovation is always slowed by social norms, not technology. Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.

So what do publishers do?

Publishers “provide quality and stability”, according to Kent Anderson (no relation to Rick Anderson), speaking on the second day in his presentation about ‘how to cook up better results in communicating research’. Anderson is the CEO of Redlink, a company that provides publishers and libraries with analytic and usage information. He is also the founder of the blog The Scholarly Kitchen.

Anderson made the argument that “publishing is more than pushing a button”, by expanding on his blog on ‘96 things publishers do’. This talk differed from Allin’s because it focused on the contribution of publishers.

Anderson talked about the peer review process, noting that rejections help academics because usually they are about mismatch. He said that articles do better in the second journal they’re submitted to.

During a discussion about submission fees, Anderson noted that these “can cover the costs of peer review of rejected papers but authors hate them because they see peer review as free”. His comment that a $250 journal submission charge with one journal is justified by the fact that the target market (orthopaedic surgeons) ‘are rich’ received (rather unsurprisingly) some response from the audience via Twitter.

Anderson also made the accusation that open access publishers take lower quality articles when money gets tight. This did cause something of a backlash in the Twitter discussion, with a request for a citation for this statement and a request for examples of publishers, with the exception of scam publishers, lowering standards to bring in more APC income. [ADDENDUM: Kent Anderson below says that this was not an ‘accusation’ but an ‘observation’. The Twitter challenge for ‘citation please?’ holds.]

There were a couple of good points made by Anderson. He argued that one of the ways publishers add value is by training editors. This is supported by a small survey we undertook with the research community at Cambridge last year which revealed that 30% of the editors who responded felt they needed more training.

The library corner

The green threat

There is good reason to expect that green OA will make people and libraries cancel their subscriptions; at least, it will in the utopian future described by Rick Anderson (no relation to Kent Anderson), Associate Dean at the University of Utah, in his talk “The Forbidden Forecast, Thinking about open access and library subscriptions”.

Anderson started by asking why, if we’re in a library funding crisis, aren’t we seeing sustained levels of unsubscription? He then explained that Big Deals don’t save libraries money. They lower the cost per article, but this is a value measure, not a cost measure. What the Big Deal did was make cancellations more difficult. Most libraries have cancelled every journal that they can without Faculty ‘burning down the library’, to preserve the Big Deal. This explains the persistence of subscriptions over time. The library is forced to redirect money away from other resources (books) and into the serials budget. The reason we can get away with this is that books are not used much.

The wolf seems to be well and truly upon us. There have been lots of cancellations and reduction of library budgets in the USA (a claim supported by a long list of examples). The number of cancellations grows as the money being siphoned off book budgets runs out.

Anderson noted that the emergence of new gold OA journals doesn’t help libraries; it does nothing to relieve the journal emergency. They just add to the list of costs because each is a unique set of content. What does help libraries is the ability to cancel journals. Professor Syun Tutiya, Librarian Emeritus at Chiba University, noted in a separate session that if Japan were to flip from a fully subscription model to APCs it would be about the same cost, so that would solve the problem.

Anderson said that there is an argument that “there is no evidence that green OA cancels journals” (I should note that I am well and truly in this camp, see my argument). Anderson’s argument is that this is saying the future hasn’t happened yet. The implicit argument here is that because green OA has not caused cancellations so far, it won’t do so in the future.

Library money is taxpayers’ money – it is not always going to flow. There is much greater scrutiny of journal big deals as budgets shrink.

Anderson argued that green open access provides inconsistent and delayed access to copies which aren’t always the version of record, and this has protected subscriptions. He noted that Green OA is dependent on subscription journals, which is “ironic given that it also undermines them”. You can’t make something completely & freely available without undermining the commercial model for that thing, Anderson argued.

So, Anderson said, given green OA exists and has for years, and has not had any impact on subscriptions, what would need to happen for this to occur? Anderson then described two subscription scenarios. The low cancellation scenario (which is the current situation) is one where green open access is provided sporadically and unreliably. In this situation, access is delayed by a year or so, and the versions available for free are somewhat inferior.

The high cancellation scenario is where there is high uptake of green OA because there are funder requirements and the version is close to the final one. Anderson argued that the “OA advocates” prefer this scenario and they “have not thought through the process”. If the cost of finding which journals have OA versions is low enough and the free versions are good enough, he said, subscriptions will be cancelled. The black and white version of Anderson’s future is: “If green OA works then subscriptions fail, and the reverse is true”.

Not surprisingly I disagreed with Anderson’s argument, based on several points. To start, there would need to be a certain percentage of the work available before a subscription could be cancelled. Professor Syun Tutiya, Librarian Emeritus at Chiba University, noted in a different discussion that in Japan only 6.9% of material is available Green OA in repositories and argued that institutional repositories are good for lots of things but not OA. Certainly in the UK, with the strongest open access policies in the world, we are not capturing anything like the full output. And the UK is itself only 6% of the research output for the world, so we are certainly a very long way away from this scenario.

In addition, according to work undertaken by Michael Jubb in 2015, most of the green Open Access material is available in places other than institutional repositories, such as ResearchGate and SciHub. Do librarians really feel comfortable cancelling subscriptions on the basis of something being available in a proprietary or illegal format?

The researcher perspective

Stephen Curry, Professor of Structural Biology, Imperial College London, spoke about “Zen and the Art of Research Assessment”. He started by asking why people become researchers and gave several reasons: to understand the world, change the world, earn a living and be remembered. He then asked how they do it. The answer is to publish in high impact journals and bring in grant money. But this means it is easy to lose sight of the original motivations, which are easier to achieve if we are in an open world.

In discussing the report published in 2015, which looked into the assessment of research, “The Metric Tide”, Curry noted that metrics & league tables aren’t without value. They do help to rank football teams, for example. But university league tables are less useful because they aggregate many things so are too crude, even though they incorporate valuable information.

Are we as smart as we think we are, he asked, if we subject ourselves to such crude metrics of achievement? The limitations of research metrics have been talked about a lot but they need to be better known. Often they are too precise. For example, was Caltech really better than University of Oxford last year but worse this year?

But numbers can be seductive. Researchers want to focus on research without pressure from metrics; however, many Early Career Researchers and PhD students are increasingly fretting about the publication hierarchy. Curry asked “On your death bed will you be worrying about your H-Index?”

There is a greater pressure to publish rather than pressure to do good science. We should all take responsibility to change this culture. Assessing research based on outputs is creating perverse incentives. It’s the content of each paper that matters, not the name of the journal.

In terms of solutions, Curry suggested it would be better to put higher education institutions in 5% brackets rather than ranking them 1-n in the league tables. Curry called for academic leaders and institutions to do their bit in combating the metrics focus. He also called for much wider adoption of the Declaration On Research Assessment (known as DORA). Curry’s own institution, Imperial College London, has done so recently.

Curry argued that ‘indicators’ would be a more appropriate term than ‘metrics’ in research assessment because we’re looking at proxies. The term ‘metrics’ implies you know what you are measuring. Certainly metrics can inform but they cannot replace judgement. Users and providers must be transparent.

Another solution is preprints, which shift attention from container to content because readers use the abstract not the journal name to decide which papers to read. Note that this idea is starting to become more mainstream with the research by the NIH towards the end of last year, “Including Preprints and Interim Research Products in NIH Applications and Reports”.

Copyright discussion

I sat on a panel to discuss copyright with a funder – Mark Thorley, Head of Science Information, Natural Environment Research Council, a lawyer – Alexander Ross, Partner, Wiggin LLP and a publisher – Dr Robert Harington, Associate Executive Director, American Mathematical Society.

My argument** was that selling or giving the copyright to a third party with a purely commercial interest and that did not contribute to the creation of the work does not protect originators. That was the case in the Kookaburra song example. It is also the case in academic publishing. The copyright transfer form/publisher agreement that authors sign usually means that the authors retain their moral rights to be named as the authors of the work, but they sign away rights to make any money out of it.

I argued that publishers don’t need to hold the copyright to ensure commercial viability. They just need first exclusive publishing rights. We really need to sit down and look at how copyright is being used in the academic sphere – who does it protect? Not the originators of the work.

Judging by the mood in the room, the debate could have gone on for considerably longer. There is still a lot of meat on that bone. (**See the end of this blog for details of my argument).

The intermediary corner

The problem of getting books to readers

There are serious issues in the supply chain of getting books to readers, according to Dr Michael Jubb, Independent Consultant and Richard Fisher from Something Understood Scholarly Communication.

The problems are multi-pronged. For a start, discoverability of books is “disastrous” due to completely different metadata standards in the supply chain. ONIX is used for retail trade and MARC is standard for libraries. Neither has detailed information for authors, information about the contents of chapters, sections etc, or information about reviews and comments.

There are also a multitude of channels for getting books to libraries. There has been involvement in the past few years of several different kinds of intermediaries – metadata suppliers, sales agents, wholesalers, aggregators, distributors etc – who are holding digital versions of books that can be supplied through the different types of book platform. Libraries have some titles on multiple platforms but others are only available on one platform.

There are also huge challenges around discoverability and the e-commerce systems, which are “too bitty”. The most important change that has happened in books has been Amazon; however, publisher e-commerce “has a long way to go before it is anything like as good as Amazon”.

Fisher also reminded the group that there are far more books published each year than there are journals – it’s a more complex world. He noted that about 215 [NOTE: amended from original 250 in response to Richard Fisher’s comment below] different imprints were used by British historians in the last REF. Many of these publishers are very small with very small margins.

Jubb and Fisher both emphasised readers’ strong preference for print, which implies that much more work is needed on ebook user experience. There are ‘huge tensions’ between reader preference (print) and the drive for e-book acquisition models at libraries.

The situation is probably best summed up in the statement that “no-one in the industry has a good handle on what works best”.

Providing efficient access management

Current access control is not functional in the world we live in today. If you ask users to jump through hoops to get access off campus then your whole system defeats its purpose. That was the central argument of Tasha Mellins-Cohen, the Director of Product Development at HighWire Press, when she spoke about the need to improve access control.

Mellins-Cohen started with the comment “You have one identity but lots of identifiers”, and noted if you have multiple institutional affiliations this causes problems. She described the process needed for giving access to an article from a library in terms of authentication – which, as an aside, clearly shows why researchers often prefer to use Sci Hub.

She described an initiative called CASA (Campus Activated Subscriber-Access), which records devices that have access on campus through authenticated IP ranges and then allows access off campus on the same device without using a proxy. This is designed to use more modern authentication. There will be “more information coming out about CASA in the next few months”.

Mellins-Cohen noted that tagging something as ‘free’ in the metadata improves Google indexing – publishers need to do more of this at article level. This comment prompted a call-out to publishers to make the information about sharing more accessible to authors through How Can I Share It?

Mellins-Cohen expressed some concern that some of the ideas coming out of RA21 (Resource Access for the 21st Century), an STM project to explore alternatives to IP authentication, will raise barriers to access for researchers.

Summary

It is always interesting to have the mix of publishers, intermediaries, librarians and others in the scholarly communication supply chain together at a conference such as this. It is rare to have the conversations between different stakeholders across the divide. In his summary of the event, Mark Carden noted the tension in the scholarly communication world, saying that we do need a lively debate but also need to show respect for one another.

So while the keynote started promisingly, and said all the things we would like to hear from the publishing industry, there is still the reality that we are not there yet. And this underlines the whole problem. This interweb thingy didn’t happen last week. What has actually happened to update the publishing industry in the last 20 years? Very little it seems. However it is not all bad news. Things to watch out for in the near future include plans for micro-payments for individual access to articles, according to Mark Allin, and the highly promising Campus Activated Subscriber-Access system.

Danny Kingsley attended the Researcher to Reader conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 27 February 2017
Written by Dr Danny Kingsley
Creative Commons License

Copyright case study

In my presentation, I spoke about the children’s campfire song, “Kookaburra sits in the old gum tree”, which was written by Melbourne schoolteacher Marion Sinclair in 1932 and first aired in public two years later as part of a Girl Guides jamboree in Frankston. Sinclair had to be prompted to go to APRA (Australasian Performing Right Association) to register the song. That was in 1975; the song had already been around for 40 years, but she had never expressed any great proprietary interest in it.

In 1981 the Men at Work song “Down Under” made No. 1 in Australia. The song then topped the UK, Canada, Ireland, Denmark and New Zealand charts in 1982 and hit No.1 in the US in January 1983. It sold two million copies in the US alone.  When Australia won the America’s Cup in 1983 Down Under was played constantly. It seems extremely unlikely that Marion Sinclair did not hear this song. (At the conference, three people self-identified as never having heard the song when a sample of the song was played.)

Marion Sinclair died in 1988 and the song went to her estate. Norman Lurie, managing director of Larrikin Music Publishing, bought the publishing rights from her estate in 1990 for just $6100. He started tracking down all the chart music that had been printed all over the world, because Kookaburra had been used in books for people learning flute and recorder.

In 2007 TV show Spicks and Specks had a children’s music themed episode where the group were played “Down Under” and asked which Australian nursery rhyme the flute riff was based on. Eventually they picked Kookaburra, all apparently genuinely surprised when the link between the songs was pointed out. There is a comparison between the music pieces.

Two years later Larrikin Music filed a lawsuit, initially wanting 60% of Down Under’s profits. In February 2010, Men at Work appealed, and eventually lost. The judge ordered Men at Work’s recording company, EMI Songs Australia, and songwriters Colin Hay and Ron Strykert to pay 5% of royalties earned from the song since 2002 and from its future earnings.

In the end, Larrikin won around $100,000, although legal fees on both sides have been estimated to be upwards of $4.5 million, with royalties for the song frozen during the case.

Gregory Ham was the flautist in the band who played the riff. He did not write Down Under, and was devastated by the high profile court case and his role in proceedings. He reportedly fell back into alcohol abuse and was quoted as saying: “I’m terribly disappointed that’s the way I’m going to be remembered — for copying something.” Ham died of a heart attack in April 2012 in his Carlton North home, aged 58, with friends saying the lawsuit was haunting him.

This case, I argued, exemplifies everything that is wrong with copyright.

Further developing the library profession in 2016

In this blog post, Claire Sewell, the OSC’s Research Support Skills Coordinator reflects on a busy year for the professional development of Cambridge library staff.

Librarians are always learning and 2016 was a bumper year for training in the Office of Scholarly Communication (OSC). The OSC has taken an active role in professional development since its foundation but things have stepped up since the dedicated training role of Research Support Skills Coordinator was established at the end of 2015.

The OSC runs two parallel professional development  schemes for library staff:

Supporting Researchers in the 21st Century Programme

The Supporting Researchers Programme offers training in the area of scholarly communication to all library staff at Cambridge University and is designed to equip staff with the skills they will need to work in a modern academic library.

In 2016 there were a total of 30 events attracting an audience of nearly 500 library staff. Attendees were drawn from across faculty, college and the University Library with several repeat attendees. Topics covered included:

  • Altmetrics
  • Bibliometrics
  • Copyright
  • Metadata
  • Open Access
  • Research data management
  • Research integrity
  • Presentation skills

Attendees have been quick to praise the sessions offered with an average of 71% rating sessions as excellent. Feedback has also been positive:

“[I learnt] a lot about metrics and the confidence to go and find out more”.

“Very engaging. Like the speed, got through a lot without it getting too boring or slow!”

“Appreciated that we were walked through the process and implications of funding requirements”

A presentation skills workshop – Presentations: From Design to Delivery – was by far our most popular session of 2016. Although originally scheduled to run twice, three extra sessions had to be added to cope with demand. In total 71 library staff attended these sessions and consistently rated them as excellent. We hope to build on this success by offering further presentation skills training in 2017.

Research Support Ambassador Programme

This intensive programme ran from June – October 2016 and included sixteen participants from across colleges, departments and the University Library. This spread across the University is particularly gratifying as participation is voluntary. The Research Ambassadors embarked on a training programme made up of three strands:

  1. Targeted training sessions in areas covered by the remit of the Office of Scholarly Communication such as Open Access and Research Data Management
  2. The development of transferable skills such as leadership, presentation skills and working in teams
  3. Small group project work to create tangible training materials which can be shared across the wider library community

This programme has been adapted in response to feedback received after an initial pilot run in 2015. More structure was introduced through the regular training sessions which Ambassadors were required to attend. Extra optional sessions were also offered according to demand, mostly in relation to group projects. Lastly there was a narrower scope to the group project element to ensure that Ambassadors could complete the task within the time available.

The small group projects Ambassadors worked on aim to give back to the Cambridge library community by producing training materials that can be used by all under a Creative Commons licence. In 2016 Ambassadors worked on three projects:

  1. Digital Humanities webpages – webpages highlighting the work that Cambridge University Library is doing in this increasingly important area of scholarship.
  2. Metadata toolkit – these slides and associated activities can be used to teach the research community about the importance of metadata creation.
  3. Online videos – bite sized videos which showcase various different tools which will be of use to researchers in disseminating their research.

The Research Ambassadors are now able to work confidently in their own libraries to provide point-of-need help to the research community. At the same time they have improved their knowledge of the scholarly communication landscape and the range of ways in which they can support the research community.

Promotion

We’ve also been working hard to promote the training we offer in the OSC, both to Cambridge librarians and the wider world.

Webpages have been created for both the Supporting Researchers in the 21st Century and Research Support Ambassador programmes so that interested parties have something to refer to and all information is kept in an accessible place. We held two Research Support Ambassador Showcase sessions in April and October to allow Ambassadors to demonstrate their outcomes and reflect on their participation on both a personal and professional level. There have also been two blog posts about the initial run of the Ambassador programme from both an insider and observer perspective which helped to give new insight into the initiative.

We have more formal plans for promotion of the programme through conference proposals and journal article submissions. More details of these will be made available once we know the outcome!

Moving forward

We have some exciting plans for training in 2017. The OSC recently sent out a survey to help with planning our next round of training and the response has been overwhelming. Re-runs of some popular topics such as copyright and presentation skills were requested along with new sessions on search skills and researching in the workplace. It looks like 2017 is going to be an exciting year for training so please follow our progress via this blog and our training webpages.

Published 17 January 2017
Written by Claire Sewell 

Creative Commons License

Request a copy: process and implementation

This blog post looks at a recent feature implemented in our repository called ‘Request a copy’ and discusses the process and management of the service. There is a related blog post which discusses the uptake and reaction to the facility.

As part of our recent upgrade to the University’s institutional repository (now renamed ‘Apollo’), we implemented a new feature called ‘Request a copy’. ‘Request a copy’ operates on the principle of peer-to-peer sharing – if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway – peer-to-peer sharing.

The main advantage of the ‘Request a copy’ feature is to open up the University’s most current research to a wider audience. Many of our users do not necessarily come from an academic background, or may be based within another discipline, or an institution where journal subscriptions are more limited. The repository is often their first port of call to find new research as it ranks highly in Google search results. We hope that these users will benefit from ‘Request a copy’ by being able to access new outputs early, at researchers’ discretion. Additionally, this may provide an added benefit to researchers by introducing new contacts and potential collaborations.

How it works

Items in Apollo that are not yet accessible to the wider public are indicated by a padlock symbol that appears on the thumbnail image and filename link which users can usually click to download the file.

Reasons why the file may not yet be publicly available include:

  • Some publishers require that articles in repositories cannot be made available until they are published, or until a specified time after publication
  • We hold a number of digitised theses in the repository, and for some we have been unable to contact the author to secure permission to make their thesis available
  • Authors may choose to make their dataset available only once the related article is published

When a user clicks on a thumbnail or filename link containing a padlock, they are directed to the ‘Request a copy’ form. Here, they provide their name, email address and a message to the author. On clicking ‘Request copy’, an email is sent to the person who submitted the article, containing the user’s details. The recipient of this email then has the option to approve or deny the user’s request, to contact the user for more information, or (if they are not the author) to forward the request to the author.

How it really works

In practice, the process is slightly more complicated. For most of the content in the repository, the person who submitted an item will be a member of repository staff, rather than the item’s author. This means that for the most part, emails generated by the ‘Request a copy’ form were initially sent to members of the Office of Scholarly Communication team. In some cases, these requests were sent to people who have left the University, and we have had to query the system to retrieve these emails. As an interim measure, we have now directed all emails to support@repository.cam.ac.uk. These still need manual processing.

Theses

For theses where we have not received permission from the author to make them available, we forward requests to the University Library’s Digital Content Unit, who have traditionally provided digitised copies of theses at a charge of £65. We have found, however, that once information about this charge is communicated to the requester, very few (approximately 1%) actually complete the process of ordering a thesis copy.

We have been working with the Digital Content Unit on a trial where thesis copies were offered at £30, then £15. However, even at these cheaper prices, uptake remained low (it increased to 10%, but due to the small size of the sample, this only equated to two and three requests at each price point, and therefore may not be statistically significant). This indicates that the objection was to being charged at all, rather than to the particular amount. Work in this area remains ongoing to try and offer thesis copies as cheaply as possible to requesters, while allowing the Digital Content Unit to cover their costs.

Articles

If the request is for an article, we first need to check whether the article has actually been published and is already available Open Access. Although we endeavour to keep all our repository records up to date, unless we are informed that an article has been published, repository staff need to check each article for which publication is pending. This is a time-consuming manual process, and when we have a large backlog, sometimes it can take a while before an article is updated following publication.

If we find that the article has indeed been published and can be made Open Access, we amend the record, make the article available and email the requester to let them know they can now download the file directly from the repository.

On the other hand, if the article is still not published, or if it is under an embargo, we need to forward the request to the corresponding author(s). Sometimes their name(s) and email address(es) will be included within the article itself, and sometimes we have a record of who submitted the article via the Open Access upload form. However, if it is not clear from the article who the corresponding author is, or if their contact details are not included, and if the article was submitted by an administrator rather than one of the authors, we then need to search via the University’s Lookup service for the email addresses of any Cambridge authors, and search the internet for email addresses of any non-Cambridge authors, before we can forward on the request.
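The checks described in this section can be sketched as a small decision function. This is only an illustration of the manual workflow, not code that runs in the repository: the `ArticleRecord` fields, the function name and the returned step labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    # All field names are hypothetical, chosen only for illustration.
    published: bool
    under_embargo: bool
    corresponding_emails: List[str] = field(default_factory=list)
    submitter_email: Optional[str] = None
    submitter_is_author: bool = False


def triage_request(record: ArticleRecord) -> str:
    """Return the next manual step for a 'Request a copy' article request."""
    # Published and no longer embargoed: release the file and tell the requester.
    if record.published and not record.under_embargo:
        return "make_open_access_and_notify_requester"
    # Otherwise the request goes to the corresponding author(s), if known.
    if record.corresponding_emails:
        return "forward_to_corresponding_author"
    # Fall back to the submitter, but only when they are one of the authors.
    if record.submitter_email and record.submitter_is_author:
        return "forward_to_submitter"
    # No usable contact details: staff must search for addresses by hand.
    return "look_up_author_email"
```

The last branch is the expensive one in practice, since it stands for the Lookup and internet searches described above.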

As a result, it can take repository staff up to 30 minutes to process an individual request. This is quicker if the article has been requested previously and the author’s contact details are already stored, but can take longer when we need to search. Sometimes, there is also repeat correspondence if the author has any queries, which adds to the total time in processing each request.

Amending our processes

Since introducing ‘Request a copy’, we have started collecting the email addresses of corresponding authors when an article is submitted, and we have commissioned a repository development company to ensure that ‘Request a copy’ emails can be sent directly to those authors for whom we have an email address – a feature that we are hoping to implement in the next few weeks.

However, if the author moves institution, their university email address will no longer be valid, and any requests for their work will again need to come via repository staff. One way to solve this would be to ask for an external (non-university) email address for the corresponding author at the point where they upload the article to the repository. However, this would introduce an extra step to an already onerous process and may act as a further barrier to authors submitting articles in the first place.

Generally, ‘Request a copy’ is a great idea and provides many benefits to the research community and beyond. But the implementation of this service has been challenging. The amount of time taken by each request has meant that some staff members have been redeployed from their usual jobs to facilitate these requests, which also has an impact on the backlog of articles in the repository that need to be checked in case they have since been published. If an article is published but still in the backlog (and therefore not publicly available in the repository), unnecessary requests for it could result in a reputational issue for the Office of Scholarly Communication and the University.

We will continue to look at our processes over the coming academic year, to see how we can improve our current workflows, and identify and resolve any issues, as well as determining where best to focus any further development work. In the related blog post on ‘Request a copy’, I’ll be talking about usage statistics for the service so far, some more unexpected use cases we have encountered, and feedback from our users that will help us to shape the service into the future.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Request a copy: uptake and user experience

This post looks at the University of Cambridge repository ‘Request a copy’ service from the user’s perspective in terms of uptake so far, feedback we have received, and reasons why people might request a copy of a document in our repository. You may be interested in the related blog post on our ‘Request a copy’ service, which discusses the concept behind ‘Request a copy’, the process by which files are requested, and how this has been implemented at Cambridge.

Usage Statistics

The Request a Copy button has been much more successful than we anticipated, particularly because there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a copy’), we had received 1120 requests (approximately 280 requests per month), the vast majority of which were for articles (68%) and theses (28%). The remaining 4% of requests were for datasets or other types of resource. We are aware that this is a particularly quiet time in the UK academic year, and expect that the number of requests will increase now term has started again.

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button. However, these figures could be misleading as a number of authors have also advised us that they have entered into correspondence with the requester to ask them for further information about who they are and why they are interested in this research. Eventually, this correspondence may result in the author emailing a copy of the paper to the requester, but as this happens outside the repository, it does not appear in our fulfilment statistics. Therefore, we suspect the figure for accepted requests is in actual fact slightly higher.

Of the articles requested during this period, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal. The large number of requests made prior to publication indicates the value of having a policy where articles are submitted to the repository on acceptance rather than publication – there is clearly interest in accessing this research among the wider public, and if they are able to make use of it rather than waiting during the sometimes lengthy period between acceptance and publication, this can make the research process more efficient.

Author Survey

To find out why authors might not be fulfilling requests through the repository links, Dr Lauren Cadwallader, one of our Open Access Research Advisors, sent a survey on 6 July 2016 to the 113 authors who had received requests but had not clicked on the repository link or been in touch with repository staff to advise of an alternative course of action. This survey had a 13% response rate, with 15 participants, as well as eight email responses from users who provided feedback but did not complete the survey.

The relatively low response rate is indicative of either a lack of engagement with or awareness of the process – it is possible that the request emails and survey email were dismissed as spam, or that researchers were unable to respond due to an already heavy workload. One way of addressing this could be to include some information about ‘Request a copy’ in our existing training sessions, in particular to emphasise how quick the process can be in cases where the author is happy to approve the request without needing any further information from the user. We have also been developing the wording of the email sent to the author, to explain the purpose of the service more clearly, and to make it sound like a legitimate message that is less likely to be dismissed as spam.

Of the 15 people who participated in the survey, the majority were aware that they had received an email, which shows that lack of response is not always due to emails being lost in spam filters. When asked for the reason why they did not fulfil the request via the repository link, 35% of authors replied that they had emailed the requester directly, either to send the file, to request more information, or to explain why it was not possible for them to share the file at this time. This finding is quite positive, as it indicates that over a third of these requests are indeed being followed up. Although it would be helpful to us to be able to keep track of approvals through the system, at least this means that the service is fulfilling its purpose in providing a way for authors to interact with other interested researchers, and to share their work if appropriate. In fact, one of the aspects that participants liked best about the ‘Request a copy’ service was the ability to communicate directly with the requestor.

Two authors did not respond to the request because the article was available elsewhere on the internet, such as their personal / departmental website, or a preprint server (where the restrictions relating to repositories do not apply), although they did not communicate this to the requestor. In these cases, it is definitely positive that the authors are happy to share their work; however, it does show that there is often an assumption among researchers that people interested in reading their articles will be restricted to those already in their specific disciplinary communities.

Requests from people who are unaware of sites where the research might also be made available demonstrate that there is indeed an appetite for this research among those outside of academia, or from different subject areas. This is generally a really positive thing, as it enables the University’s research outputs to educate and inspire a new audience beyond the more traditional communities, and could potentially lead to new collaboration opportunities. To ensure that requestors are able to access the material, and that researchers are not bombarded with requests for documents that are already freely available, authors can provide links to any external websites that are hosting a preprint version of the article, and we will add them to the repository record.

Other responses indicated that we were not necessarily emailing the right person, as participants said that they had not approved the request because they were not the corresponding author, or because they thought a co-author had already responded. At the outset of the service, we felt that emailing as many authors as possible would increase the likelihood of receiving a response; however, the survey results show that it would be better to send requests to the corresponding author(s) only, at least in cases where it is clear who they are.

An issue we have encountered on a semi-regular basis since HEFCE’s Open Access policy came into force is that of making an article’s metadata available prior to its publication. Although HEFCE and funder policies state that an article’s repository record should be discoverable, even if the article itself must be placed under embargo based on publisher restrictions, there is concern among some authors that metadata release breaches the publisher’s press embargo. You can read about this issue in some detail here.

Receiving requests for an article via the ‘Request a copy’ service can be unsettling for authors as it demonstrates how easily the repository record can be accessed, and rather than respond to the request, they contact the Open Access team to ask for the metadata record to be withdrawn until the article is published. This demonstrates a need to communicate more clearly, both on our website and within the ‘Request a copy’ pages in the repository, what is required of authors as part of HEFCE and funder Open Access policies. We will also be more explicit in the ‘Request a copy’ emails sent to authors in stating that sharing their articles via this service will not be seen as a breach of the publisher’s embargo. In cases where the author does not wish to disseminate their article before it is published, they have the option to deny any requests they receive.

Facilitating requests

There have been several instances where press interest around an article at the point of publication has generated a large number of requests, each of which must be responded to individually by the author. This has resulted in several authors asking that we automatically approve every request rather than forwarding them on. Unfortunately this is not possible for us to do, due to the legal issues surrounding ‘Request a copy’.

It is perfectly acceptable for an author to send a copy of their article to an individual, but if a repository makes that article available to everyone who requests it before the embargo has been lifted, this would be a breach of copyright because it would be ‘systematic distribution’. While responding to multiple requests is likely to be seen as an annoyance by an already overstretched researcher, we hope that a large volume of requests will also be viewed in a positive light, as it demonstrates the interest people have in their work.

Use cases

An interesting example of a request we received was actually from one of the authors of the article, as they did not have access to a copy themselves. This raises some questions about communication between the researchers in this case, if the ‘Request a copy’ service was seen to be a better way of gaining access to the author’s own research, rather than contacting one of their co-authors.

A more surprising use case is that of a plaintiff who had lost a legal case. The plaintiff was requesting an as-yet unpublished article that had been written about the case, because the article appears to argue in favour of the plaintiff and could potentially inform a future appeal. This is a good example of how the ‘Request a copy’ service could be of direct benefit in the world outside academia.

Although the vast majority of requests have been for research outputs such as articles, theses and datasets, we also occasionally receive requests for images that belong to collections held in different parts of the University, where high-quality versions are stored in the repository under restricted access conditions. With these requests, it can be more difficult to find who the copyright-holder is, which sometimes requires detective work by the repository team. In one case, permission had to be sought from a photographer who only has a postal address, and therefore required more explanation about the repository more generally, as well as the specific request.

Looking to the future

We will use this research and any further feedback we receive to improve the experience of our ‘Request a copy’ service for both authors and requestors, including implementing the ideas suggested above. Usage statistics will continue to be monitored, and we may run a user survey again to determine how far the service has improved, as well as to identify any new issues.

In the meantime, if you have any comments or questions about our ‘Request a copy’ service, either as an author or a requester (or both), please send us an email to support@repository.cam.ac.uk .

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Is CC-BY really a problem or are we boxing shadows?

Comments from researchers and colleagues have indicated some disquiet about the Creative Commons (CC-BY) licence in some areas of the academic community. However, in conversation with some legal people and contemporaries at other institutions (some of these exchanges are replicated at the end of the blog) one of the observations was that, generally, academics are not necessarily cognizant of what the licences offer and indeed what protections are available under regular copyright.

To determine whether this was an education and advocacy problem or whether there are real issues, we held a roundtable discussion on 29 February at Cambridge University, attended by about 35 people who were a mixture of academics, administrators, publishers and legal practitioners. The discussion centred on some of the objections raised in the information circulated before the meeting (which is summarised at the end of this blog). For ease of description each objection is addressed in turn.

Background

Creative Commons provides a series of licences that creators can attach to their work to tell users what they can or cannot do with it. The licences range from no restrictions at all (CC-0) to the fairly restrictive CC-BY-NC-ND-SA*, under which the user must attribute the author, must not amend the work, cannot make any financial gain from it and must put the same licence on anything they produce using this work.

There are increasing requirements from funders such as the Wellcome Trust and RCUK in the UK that any work published open access must have a Creative Commons Attribution (CC-BY) licence attached to it. The rationale behind this is that research needs to be available for other researchers to both read and reuse, but also to text and data mine without fear of copyright breaches. Work that is available under a CC-BY licence can be easily incorporated into course reading lists without copyright complications.

* Note added 8 March – a comment has been sent through that CC-BY-NC-ND-SA is impossible to apply because the share-alike and no derivatives clauses are mutually exclusive and cannot be applied together. See this explanation.
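The point in the note above can be sketched as a simple check on combinations of licence elements. This is an illustrative sketch only: the function name is hypothetical, and it encodes just two rules, namely that every current Creative Commons licence other than CC0 includes attribution, and that share-alike cannot be combined with no-derivatives.

```python
from typing import Iterable

# Licence elements: BY (attribution), NC (non-commercial),
# ND (no derivatives), SA (share-alike).
KNOWN_ELEMENTS = {"BY", "NC", "ND", "SA"}


def is_valid_combination(elements: Iterable[str]) -> bool:
    """Check whether a set of licence elements can be combined."""
    chosen = {e.upper() for e in elements}
    # Reject anything that is not one of the four known elements.
    if not chosen <= KNOWN_ELEMENTS:
        return False
    # Every current CC licence other than CC0 includes attribution.
    if "BY" not in chosen:
        return False
    # Share-alike governs adaptations, which no-derivatives forbids.
    return not {"ND", "SA"} <= chosen
```

Under these rules `is_valid_combination(["BY", "NC", "ND"])` is accepted while `["BY", "NC", "ND", "SA"]` is rejected, which matches the correction in the note.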

Summary of the discussion

The general feeling in the discussions was that academics do want to share their work but they don’t want things to be used incorrectly. The outcome of the discussion was that, while there are some confusions in this area and we could do some work on advocacy and educational materials, there are also some specific cases where CC-BY has the potential to cause issues. In a small number of cases issues have actually occurred.

Is CC-BY a problem? For whom?

We should note here that CC-BY only affects a proportion of research published in the UK. While all research is potentially affected by the HEFCE requirement to make work available, the route preferred is through placing a copy in a repository. So this discussion affects only those researchers who have a specific grant from the Charities Open Access Fund (Wellcome Trust) or the RCUK. Humanities researchers tend not to hold grants, and for those that do, it is their articles, not their monographs that are affected by this requirement.

While there are some actual concrete examples of issues for researchers in the Arts and Humanities, many of the problems discussed here are what could happen. There was a comment from a scientific publisher that the sciences also had some concerns about CC-BY when it was first introduced, but none of the concerns have actually come to fruition. Another person noted there have been hundreds of thousands of pieces of content published under CC-BY licences, with very few known problem cases or harm. This is telling. The question was raised: Are we just repeating myths?

On the other hand, just because issues haven’t happened yet does not mean that it would not be a serious problem should they occur. One of the questions at the end of the discussion was: “Are the ethical norms of society strong enough to stop these concerns happening?” It would appear that to date they have been in the sciences.

Moral rights

CC-BY is an attribution licence. This means the moral right for the originator of the work to be identified is retained. However the moral right for the integrity of the research is not protected. The discussion centred around this.

If someone uses work under a CC-BY licence and makes alterations to it, they do need to indicate they have changed a work but not how they have altered it. The concern in the group was that the work could be altered so the meaning is entirely changed and it would still be attributed to the original author.

Authors can object to the derogatory treatment of their work. The recourse of being able to ask to have the originator’s name taken off the work was not seen as satisfactory because then the person who has adapted the work is potentially able to publish the work, which is based substantially on someone else’s work, as their own.

That said, one comment was that academic works are always open to interpretation, whether quoted or not and whether available under a CC-BY licence or not.

Translation

The area of translations does appear to have some concrete examples of problems caused by CC-BY for Humanities & Social Science authors. One of the issues is that it is very difficult to check a translation unless the original author can read the language into which their work has been translated.

Plagiarism

Of all of the areas of discussion, plagiarism raised the most opinions. The accusation that CC-BY somehow ‘encourages’ plagiarism is often levelled. Some argue that making work available under a Creative Commons licence protects authors against plagiarism rather than encouraging it. Works available in the public domain are far more easily identified as the original work than something published on paper and held on a library shelf, for example.

There was a debate about what actually constitutes plagiarism. One opinion was that ‘It’s plagiarism unless it’s in quotes’. However while the use of quote marks would protect the integrity of the work, there is nothing legally wrong with a derivative use of a work that is available under CC-BY – legally this is not plagiarism.

Nothing about the CC-BY licence overrides UK law about fair dealing. One of the lawyers present noted that academics don’t understand the details of copyright. Academics want full protection but also full sharing. In the world of the internet there’s a free-for-all – people copy-and-paste from wherever they want. No-one respects licences, so an academic work is not necessarily protected under current rules.

It was noted that plagiarism occurs all the time, even when articles are all rights reserved and under traditional copyright. And while Open Access publishing does make plagiarism easier (regardless of the licence), it doesn’t change the underlying principle that it’s unethical. Ethical behaviour in academia sits separately from copyright law.

Sensitive information

The area of sensitive information seems to have the strongest case for not using a CC-BY licence. Researchers working in areas that might contain sensitive information – such as medical or criminal areas – spend a great deal of time ensuring that their findings are presented sensitively and ensuring their distribution is appropriate. The concern with CC-BY licences is that these findings can be misconstrued, which would be damaging to the researcher and could go back to the participants and affect them. If presented in the wrong way, altered research outputs could affect not just their research but also participants.

There is an issue about the dialogue between the people that are being studied and if they have any moral rights about how the information is being used.

An example that was given came from anthropology: a community of Native Americans in northern California released sensitive data and stories from their cultural past, which they want to be accessed. However, because they have been exploited in the past, they wanted some form of restriction on how these things can be reused. This is an example where a CC-BY licence would not be appropriate.

An oral historian discussed the type of work they do with subjects talking about traumatic periods of their life. In these cases the researcher enters into a covenant with them about how their work can be used. This could not be dealt with ethically under a CC-BY licence. The issue is about subsequent control over reuse of research, with concern about it being co-opted and used in another context.

The question about ethical use of material was raised again, with someone noting that no matter what licence it is available under you can’t control what people do with your work if they disagree with you.

Items containing third party copyright

Being required to publish work under a CC-BY licence does cause problems for people whose work contains a large amount of third-party material. This is because the burden on the author to obtain permissions for all of the works would be both time consuming and expensive. Many researchers have raised questions about whether they can even do their work if they’re required to publish under CC-BY.

That said, if researchers are themselves using CC-BY works this issue is mitigated because they automatically have permission to use the material. This raises the question: does CC-BY make it more difficult or easier?

Commercialisation

There were some examples raised where a series of works that were freely available had been packaged up and sold. This raised the question: Who is being harmed in commercial exploitation of academic works?

Academics do not publish in journals for money, so the originator of a work that is subsequently sold on is not personally losing a revenue stream. There was a distinction between the academic and non-academic publishing environment. It was agreed that the people buying these works are being scammed. The concern is that people are being exploited by being made to pay for things that should be freely available.

The discussion moved to whether a Non Commercial licence would solve this problem. The issue here is the confusion over the definition of ‘commercial’ in this context. An institution that has a revenue stream from student fees could be seen to be commercial and therefore unable to include CC-BY-NC items on their reading lists.

It was noted that CC-BY-NC-ND is extremely restrictive about ways works can be used.

Academic freedom

The discussion several times touched on the broader issue of the government placing an increasing number of requirements on researchers. The questions raised were: “Does someone who is fronting up with the money have the rights to enforce a particular licence? What about the subjects of a study?”

There is supposed to be an arm’s-length relationship between funders and universities, but a concern is that funding bodies want to have more power to tell academics what to work on.

Next steps

In summary, the discussion indicated that CC-BY licences do not encourage plagiarism, or issues with commercialism within academia (although there is a broader ethical issue). However, in some cases CC-BY licences could pose problems for the moral integrity of the work and cause issues with translations. CC-BY licences do create challenges for works containing sensitive information and for works containing third party copyright.

There is an expectation amongst the academic community that people behave ethically and within cultural norms.

As agreed with the group we have published this blog post which summarises the discussions held this week. In discussions about the Open Access Policy Framework for the University it would be helpful to include a statement that there is concern about CC-BY licences for some disciplines and types of research.

Background information sent to participants prior to the discussion

Commentary on CC-BY in published reports

The issue of CC-BY licences was a recurrent theme in A review of the RCUK review of implementation of its OA policy (March 2015). Many arts, humanities and social science disciplines hold ‘principled and practical objections to the use of CC-BY licences’ (p18). This is partly because work under a CC-BY licence ‘could be both used commercially in ways of which the author does not approve and also might not be properly acknowledged as their work’ (pp19-20).

The Royal Historical Society evidence to the RCUK review noted that humanities scholars have particular objections to certain kinds of ‘derivative use’ that amount to the encouragement of plagiarism. Because the ‘attribution’ requirement in CC BY is very loose, it is possible for a reuser of a humanities article to alter it and reissue it under their own name, specifying only that it is an adaptation of the original, but without specifying how it has been adapted. In this way reusers may adopt the style, argument and ‘personality’ of the original work under their own name (and even copyright it). This represents a violation of the specific moral right of the author to the integrity of the work, and the only recourse offered to the author by CC BY is to have their name removed from the attribution (which makes the violation worse). This kind of re-use is as likely to degrade as to enhance the public benefit of the research.

The British Academy’s response to the Commons Select Committee (2013) noted that many articles in HSS subjects are the product of single-author scholarship, where there is more of a claim on ‘moral rights’ that are not adequately protected under an unrestricted CC-BY licence. There were also concerns about commercial reuse of work that contains third party copyright, involving complicated permissions. The response suggests that it should be possible to vary Creative Commons licences according to the usages and requirements of different subject areas – and that an ‘Attribution-NonCommercial-NoDerivs’ licence (CC-BY-NC-ND) may very often be more appropriate.

Notes on an April 2013 Royal Historical Society position changing workshop on CC-BY and Humanities (chaired by Peter Mandler) record that the editors of a number of history journals have suggested that the CC-BY licence facilitates and promotes commercial re-use and uses akin to plagiarism; that the licence therefore amounts to an infringement of authors’ moral and intellectual property rights; and that it is likely to damage the quality of education.

The HistoryUK Submission to the 2013 Business, Innovation and Skills Committee Enquiry on Open Access Publishing raised issues about the loss of protection of intellectual property, the dangers associated with allowing derivative works in sensitive areas of research, and the possible increased costs or embargoes that publishers may feel compensate for the transfer of a commercial asset to a third party.

Comments from researchers and administrators

In preparation for the round table, Danny Kingsley asked her community across the sector what kinds of objections different people in an administrative or library role had heard from researchers. These are summarised below.

English researcher at Cambridge – “I would prefer not to make my work, produced with the benefit of public funding, available in a form that would allow others to exploit it commercially, as the simple CC-BY licence does. My preference would be for the CC BY-NC-SA licence.”

Research Information Specialist – One question to ask here is whether traditional publishing models – such as signing over copyright itself – are really more beneficial to authors, and of course to weigh the risk of a negative CC experience against the benefits of positive ones.

Concerns raised in discussion with academics in the Humanities (reflected in two responses)

  1. A belief that CC BY encourages plagiarism
  2. That content licensed under CC BY is not monitored for copyright and other infringement to the same extent as more restrictive licences (a misguided belief that publishers actively monitor use and reuse of content, I think)
  3. I have also heard the more vague concern about ideas being manipulated or twisted in some way and then re-published under the author’s name
  4. That encouraging reuse, especially derivatives, means the author has no control over what people do with the information (and therefore are associated with something that they would rather not be)

Advice provided on Creative Commons and licensing

Published 3 March 2016
Written by Dr Danny Kingsley, with thanks to Dr Philip Boyes and Dr Joyce Heckman for their notes.

Creative Commons License

 

Archiving webpages – securing the digital discourse

We are having discussions around Cambridge about the research activity that occurs through social media. These digital conversations are the ephemera of the 21st century, the equivalent of the Darwin Manuscripts that the University has spent considerable energy preserving and digitising. However, we are not currently archiving or preserving this material.

As a starting point, we are sharing here some of the insights Dr Marta Teperek gained from attending the DPTP workshop on Web Archiving on 12 May 2015, led by Ed Pinsent and Peter Webster.

Digital dissemination

Increasingly, researchers are realising that online resources are important for disseminating their findings – the subject of our recent blog ‘What is ‘research impact’ in an interconnected world?’ It is common to use blogs and Twitter to share discoveries.

Some researchers even have dedicated websites to publish information about their research. In the era of Open Science, webpages are also used to share research data, especially by programmers, who often use them as powerful tools for providing rich metadata descriptions for their software. It is not uncommon to include a link to a webpage in publications as the source of additional information supporting a paper. In these cases, other researchers need to be able to cite the webpage as it was at the time of publication. This ensures the content is stable – be it information, a dataset, or a piece of software.

The question arises then about preventing ‘linkrot’ and preserving webpages – to ensure the content of a webpage is still going to be accessible (and unaltered) in several years’ time.

What does it mean to archive a webpage?

Archiving is preserving an exact copy of a webpage, as it is at a given moment in time. The most commonly used format for webpage archives is the .warc file. These files contain all the information about the page: its content, layout, structure, interactivity etc. They can be easily re-played to re-create the exact content of the archived webpage, as it was at the time of recording. These .warc files can be shared with colleagues or with the public by various means, for example, by preserving a copy in data repositories.
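To give a feel for what sits inside a .warc file, here is a minimal sketch in Python that assembles a single WARC 1.0 ‘resource’ record by hand. It is an illustration only: the function name and the example URL are hypothetical, and real archiving tools usually store the full HTTP request and response plus many additional records, and often compress the result.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri, payload, content_type="text/html",
                               captured_at=None):
    """Return one uncompressed WARC 1.0 'resource' record as bytes.

    Hypothetical helper for illustration; it covers only the mandatory
    header fields plus the target URI and content type.
    """
    captured_at = captured_at or datetime.now(timezone.utc)
    header_lines = [
        "WARC/1.0",                                       # version line
        "WARC-Type: resource",                            # kind of record
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",     # unique record id
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",                 # what was captured
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",                # size of the block
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


record = build_warc_resource_record(
    "http://example.org/",                 # assumed example URL
    b"<html></html>",
    captured_at=datetime(2015, 5, 12, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

The capture date and the target address travel with the content itself, which is what allows a replay tool to show the page ‘as it was’ at the time of recording.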

The right to archive

One of the most interesting topics emerging from almost every talk was who has the right to archive a webpage. The answer would seem simple – the webpage creator. However, webpages often contain information with reference to, or with input from various external resources. Most pages nowadays have feeds from Twitter, allow comments from external users, or have discussion fora. Does the website creator have the rights to archive all these?

In general, anyone can archive the page. Problems start if there are intentions to make the archive available to others – which is typically the driver for archiving the page in the first place. In theory, in order to disseminate the archived page, the archiver should ask all copyright owners of the content of that page for their consent. However, obtaining consent from all copyright owners might be impossible – imagine trying to approach authors of every single tweet on a given webpage.

The recommendation is that people should obtain consent for all elements of the webpage for which it is reasonably possible to get the consent. When making the archive available, there should also be a statement that the best effort was made to obtain consent from all copyright owners. It is good practice to ask any webpage contributors to sign a consent form for archiving and sharing of their contributed content.

Alternative approach to copyright

Some websites have decided to take an alternative approach to dealing with copyright. The Internet Archive simply archives everything, without worrying about copyright. Instead, they have a takedown policy if someone asks them to remove the shared archive. As a consequence of their approach, they are currently the biggest website archive in the world, which as of August 2014 used 50 petabytes of storage.

Anyone can archive their websites on the Internet Archive: create an account, enter the URL of the webpage to be archived and click a button to archive the page. The archive will then be created and shared.

The workshop inspired us at Cambridge to archive the data website, which is now available on the Internet Archive. Snapshots from each of the archiving events can be easily replayed by simply clicking on them.
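For those who prefer to script this, the sketch below builds the web addresses commonly used with the Internet Archive’s Wayback Machine: one to request a capture, one to replay a snapshot from a given moment, and one to query whether a snapshot exists. The function names and the example URL are our own illustrative choices, the URL patterns are given as we understand them rather than as a guaranteed interface, and the code only constructs addresses without contacting the service.

```python
from urllib.parse import urlencode

WAYBACK_BASE = "https://web.archive.org"


def save_page_url(page_url):
    """Address that asks the Wayback Machine to capture page_url."""
    return f"{WAYBACK_BASE}/save/{page_url}"


def snapshot_url(page_url, timestamp):
    """Address that replays page_url as captured at (or near) timestamp.

    timestamp is a string of digits in YYYYMMDDhhmmss form; shorter
    prefixes such as YYYYMMDD are resolved to the closest snapshot.
    """
    return f"{WAYBACK_BASE}/web/{timestamp}/{page_url}"


def availability_query_url(page_url, timestamp):
    """Address of the availability lookup, which returns JSON describing
    the closest archived snapshot, if there is one."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"https://archive.org/wayback/available?{query}"


page = "http://example.org/"  # assumed example URL
print(save_page_url(page))
print(snapshot_url(page, "20150512"))
print(availability_query_url(page, "20150512"))
```

Citing the snapshot address, which includes the timestamp, rather than the live address is one way of pointing readers to the page as it stood at the time of publication.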

Can a non-specialist archive the website?

But what if you would like to archive a website yourself – store and share it on your conditions, perhaps using a data repository? Various options for website preservation were discussed during the workshop.

As a non-specialist, the best option is the one which does not require any specialist knowledge, or specialist software installation. A startup company called WebRecorder have created a website which allows anyone to easily archive any page. There is no need to create an account. The user can simply copy the URL of the page to be archived and press ‘record’. This will generate a .warc file of the website. The disadvantage is that this needs to be done for every page of the website separately. WebRecorder allows free downloads of .warc files – the files can be downloaded and archived/shared however the user chooses.

If anybody then wants to re-run the website from a .warc file, there are plenty of free software options available to re-play the webpage. Again, an easy solution for non-specialists is to go to WebRecorder. WebRecorder allows one to upload a .warc file and will then easily replay the webpage with a single click on the ‘Replay’ button.
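For the curious, the following minimal sketch shows roughly what a replay tool has to do first: walk through a .warc file and pull out each record’s headers and content. It assumes an uncompressed file with well-formed records; the function name is hypothetical, and for real archives (which are often gzip-compressed) a dedicated WARC library or a replay service would be the more sensible choice.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream.

    Hypothetical helper for illustration: headers is a dict of header
    names to values, body is the raw record block as bytes.
    """
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if line.strip() == b"":
            continue                    # blank separator between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break                   # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes of content follow.
        body = stream.read(int(headers["Content-Length"]))
        yield headers, body


# A tiny in-memory example standing in for a real file on disk.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)
for headers, body in iter_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Target-URI"], len(body))
```

To use it on a downloaded file, one would open the file in binary mode and pass the file object in place of the in-memory sample.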

A bouquet for the DPTP workshop

This was an excellent and extremely efficient one-day workshop, due to its dynamic organisation. The workshop was broken down into six main parts, and each of these parts consisted of several very short (usually 10 mins long) presentations and case studies directly related to the subject (no time to drift away!). After every short talk there was time for questions. Furthermore, there were breaks between the main parts of the workshop to allow focused discussions on the subject. This dynamic organisation ensured that every question was addressed, and that all issues were thematically grouped – which in turn helped to deliver powerful take-home messages from each section.

Furthermore the speakers (who by the way had expert knowledge on the subject) did not recommend any particular solutions, but instead reviewed types of solutions available, discussing their major advantages and disadvantages. This provided the attendees with enough guidance for making informed decisions about solutions most appropriate to their particular situations.

What also greatly contributed to the success of the workshop was the diverse background of attendees: from librarians and other research data managers, to researchers, museum website curators, and European Union projects’ archivists. All these people had different approaches to, and different needs regarding, web archiving. Perhaps this is why the breakout sessions were so valuable and deeply insightful.

Published 3 October 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License