Tag Archives: Libraries

Searching Open Access: steps towards improving discovery of OA in a less than 100% OA world

At the heart of the University of Cambridge’s Open Access Policy is the commitment “to disseminating its research and scholarship as widely as possible to contribute to society”.

Behind this aim is the benefit to researchers worldwide, as the OA2020 vision has it, to “gain immediate, free and unrestricted access to all of the latest, peer-reviewed research”. It’s some irony indeed that the growth of the availability of research as open access does not automatically result, without further community investment, in a corresponding improvement in discoverability.

Key stakeholders met at the British Library to discuss the issue at the end of 2018 and produced an Open Access Discovery Roadmap to identify areas of work in this space and encourage collaboration in the scholarly communications community.[1] A major theme was the dependence on reliable article licence metadata, but the main message was finding open infrastructure and interoperability solutions for long-term sustainability, "ensuring that the content remains accessible for future generations".

New web pages on Open Access discovery

Recognizing where we are now, and responding to the present (and probably partial) awareness of the insufficiencies in the OA discovery landscape, Cambridge University Library has added pages to its e-resources website to highlight OA discovery tools and important websites indexing OA content. The motivations for highlighting the options for OA discovery on the new pages are described in this blog post. Our main aim is to bring search and discovery of OA to light as a live topic and prevent it "languishing in undiscoverable places rather than being in plain sight for everyone to find."[2]

Recently, data from Unpaywall for July 2019 has been used to forecast the growth in availability of articles published as OA by 2025. The prediction is based on current trends, but conservatively so – it does not even take full account of the impact of Plan S, for example. This forecast for 2025 predicts:

  • 44% of all journal articles will be available as OA
  • 70% of article views will be to OA articles.[3]

Unpaywall's estimate for the availability of OA right now is 31%. A third (growing soon to a half) is a significant proportion for anyone's money, and wanting to signal the shift we have used that statistic as our headline on the page summarizing the best-known and most commonly used Open Access browser plugins.

Screenshot containing the following text: 'Open Access Browser Plugins.A third to a half of articles have an OA version, but finding them can be a challenge. Save time with these easy-to-install OA discovery tools that search repositories, preprint servers, etc. for you'
Screenshot of Open Access browser plugins webpage

We want the Cambridge researcher to know about these plugins and to be using them, and aim to give minimal but salient information so that a selection of one, or several, can be made. Our recommendation is the Lean Library extension "Library Access", but we have also been in touch with Kopernio and QxMD to ensure that members of the University registering to use those plugins will pick up the connection to our proxy server for seamless off-campus access to subscription content where it exists, before the plugin offers an alternative OA version.

Once installed in the user's browser, the plugin will use the DOI and/or a combination of article metadata elements to search the plugin's database and multiple other data sources. On finding an OA article, a discreet, clickable pop-up icon will become live (change colour) and will deliver the link or the PDF direct to the user's desktop. Most plugins are compatible with most browsers, with Lean's Library Access adding compatibility with Safari last month.
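To make the lookup step concrete, here is a minimal sketch of how a plugin-style client might query the Unpaywall REST API by DOI and pick out the best OA link. The response shown is abbreviated and illustrative, the DOI and repository URL are invented placeholders, and the email address is the placeholder Unpaywall requires you to replace with your own.

```python
import json
import urllib.request

UNPAYWALL_API = "https://api.unpaywall.org/v2/{doi}?email={email}"

def best_oa_url(record):
    """Prefer the direct PDF link if present, else the landing page."""
    loc = record.get("best_oa_location") or {}
    return loc.get("url_for_pdf") or loc.get("url")

def lookup_oa(doi, email="you@example.org"):
    """Query Unpaywall for a DOI and return the best OA URL, or None."""
    url = UNPAYWALL_API.format(doi=doi, email=email)
    with urllib.request.urlopen(url) as resp:
        record = json.load(resp)
    return best_oa_url(record)

# Abbreviated, invented example of an Unpaywall v2 response for an OA article:
sample = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {
        "url": "https://repository.example.ac.uk/handle/1810/0000",
        "url_for_pdf": None,
        "version": "acceptedVersion",
        "license": "cc-by",
    },
}
print(best_oa_url(sample))  # the repository landing-page URL
```

In the browser extensions themselves this lookup happens behind the scenes; the sketch simply shows why reliable licence and version metadata in the response matters so much for what the user is ultimately served.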

Each plugin has a different history of development and certain features that distinguish it from the others, and we've attempted to bring these out on the page. For example, we note Unpaywall's trustworthiness in the library space thanks to its exclusion of ResearchGate and Academia.edu; its harvesting and display of licence metadata; and its reach in integrating search of its data via library discovery systems. We also mention features we think are relevant for potential users looking for a quick overview of what's out there, such as Kopernio's Dropbox file storage and integration with Web of Science, and QxMD's special applications for medical researchers and professionals.

In an adjacent page, Search Open Access, there is coverage of search engines focused on discovering OA content (Google Scholar; 1findr; Dimensions; CORE), a range of sites indexing OA content in different disciplines, both publisher- and community-based, and a selection of repositories and preprint servers, including OpenDOAR.

A screenshot containing the following text: 'Search Open Access. Our selection of the leading and trusted sources to find OA content'
Screenshot of Search Open Access webpage

We hope the site design, based on the very cool Judge Business School Toolbox pages, gets across the basics about the OA plugins available and encourages their take-up. The plugins will definitely bring the researcher OA alternative versions when subscription access puts an article behind a paywall and, regardless, will expose OA articles in search results that would otherwise be hard to find. The pages' positioning top-left on the e-resources site is deliberately intended to grab attention, at least for those reading left-to-right. It is interesting to see the approach other universities have taken, using the LibGuide format for example at Queen's University Belfast and at the University of Southampton.

Experiences with Lean Library’s Library Access plugin

Cambridge has had just over a year of experience implementing Lean Library's Library Access plugin, and it's been positive. The impetus for the institutional subscription to this product was as much to address the problem of searchers landing on publisher websites and struggling with Shibboleth federated sign-on. This problem is well documented ("spending hours of time to retrieve a minimal number of sources") and is most recently being addressed by the RA21 project.[4] Equally, though, we wanted to promote OA content in the discovery process, and Lean Library's latest development of its plugin to favour delivery of the OA alternative before defaulting to the subscription version is aligned with our values (considerations of versioning aside).

So we're aiming to bring Lean to Cambridge researchers' attention by recommending it as the plugin of choice for the period in which we're in transition to "immediate, free and unrestricted access" for all. Only Lean provides 24-hour-updated and context-sensitive linking to our EZproxy server for off-campus delivery of subscription content, while also promoting OA alternative versions via the deployment of the Unpaywall database. The feedback from the Office of Scholarly Communication is favourable and the statistics support the positivity we hear from our users (for the last year, 66,731 Google Scholar enhanced links; 49,556 article alternative views; and a rough estimate against our EZproxy logs suggesting around two fifths of off-campus users are accessing the proxy via Lean).

One area of concern is the ownership of Lean by SAGE Publications, in contrast, say, to the ownership of Unpaywall as a project of the open-source ImpactStory, and what this means for users' privacy. These concerns are shared by other libraries implementing Lean.[5] Our approach has been to make the extension's privacy policy as prominent as possible on our page dedicated to promoting Lean, and to engage with Lean in depth over users' concerns. We are satisfied with the answers to our questions from Lean and that our users' data is adequately protected. Even in a rapidly changing arena for OA discovery tools, the balance is not so fine when it comes to recommending installation of the Library Access plugin over a preference for the illegitimate and risk-prone Sci-Hub.

Libraries’ discovery services are geared for subscription content

Allowing for the influence of searchers' discipline on choice of discovery service, it's little surprise that the traditional library catalogue, even when upgraded to a web-scale discovery service, privileges subscription over OA content. Of course it does, because this is the content libraries pay for in the traditional subscription model and the discovery system is pretty much built around it. iDiscover is Cambridge's discovery space for institutional subscriptions and the print holdings of the University's libraries, and within iDiscover Open Access repository content has been enabled for search. Further, the pipeline for institutional repository content (Apollo) is established.

Nonetheless, Cambridge will be looking to take advantage of the forthcoming link resolver service for Unpaywall. This is due for release in November 2019 and will surface a link to search Unpaywall from iDiscover when subscription content is unavailable. This link should usually kick in when the search in iDiscover is expanded beyond subscription content; a form of this has already been enabled by at least one university by including the oadoi.org lookup in the Alma configuration.

The reefer ship Ivory Tirupati arriving in Brest with a heavy list. Picture by Hervé Cozanet, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

The righting moment at the angle of list is the point a ship must find to keep itself from capsizing, and library discovery system providers' integration with OA feels a bit like that: the OA indication was included in the May 2018 iDiscover release, and suppliers have been working with CORE on inclusion of CORE content since 2017. That righting moment may be just over the horizon as integration with Unpaywall arrives and the "competition" element dissipates. As the consultancy engaged by JISC to review the OA discovery tools commented: "As the OA discovery landscape is crowded, OA discovery products compete for space and efficacy against established public infrastructure, library discovery services and commercial services".[6]

A diffuse but developing landscape

Easy to install and effective to use, the OA discovery tools we are promoting are still widely thought of as providing, at best, a patch – a sticking-plaster – for the problem. A plethora of plugins is not necessarily what the researcher wants, or is attracted by, however necessary a plugin may be for saving time and exposing content in discovery. Possibly the really telling use case has yet to be tried: the one in which the plugin comes into its own in a big deal cancellation scenario.

Usage statistics for the Lean Library Access plugin probably reflect the fact that most of the article content required by the University is available via IP-authenticated subscription access, so the need for the plugin is almost entirely limited to the off-campus user. The Lean plugin's relatively modest totals are, though, consistent with reports of plugin adoption by institutions that have cancelled big deals. A poll of the Bibsam Consortium members revealed that 75% of researchers did not have any plugin installed; the percentage for the University of Vienna in particular was 71%; and the KTH Royal Institute of Technology's authors "rarely used" a plugin.[7]

Another conjecture is that there is an antipathy to any plugin that could be collecting browsing history data: however "dumb" and programmatically erased that data may be, the concern over privacy is such that the universal adoption libraries may hope for is unachievable. The likeliest explanation possibly lies around the tipping point from subscription to OA: despite the Apollo repository's usage being one of the highest in the country (1.1 million article downloads from July 2018 to July 2019), Cambridge's reading of Gold OA is c. 13% of total subscription content, including journal archives. A comparison with the percentage views by OA type in Unpaywall's recently published data (cited above) suggests this is on the low side in terms of worldwide trends, but it must be emphasized that this is a subset of OA reading and excludes green, hybrid, and bronze. Just consider, for instance, the 1.5 billion downloads from arXiv globally to date.[8] Similarly, the stats from Unpaywall are overwhelmingly persuasive of the plugin's success: as of February 2019 it delivered a million papers a day, around 10 papers a second.

The graph shows steady growth in the total number of open access items, from fewer than 475,000 in January 2016 to nearly 1,700,000. Likewise, the number of institutional repositories increased from 96 to 180 over the same period.
IRUS-UK growth of open access items since January 2016 (red bars indicate total items, orange bars the number of articles, and green bars the number of articles with DOIs; the blue line indicates the number of institutional repositories)

The inspirational statistician and “data artist” Edward Tufte wrote:

We thrive in information-thick worlds because of our marvellous and everyday capacities to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorise, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff, and separate the sheep from the goats.[9]

There's thriving, and there's too much effort already. Any self-respecting OA plugin user will want to winnow and make their own decisions on the plugin(s). In a less than 100% OA world, that combination of subscription and OA connection separated from physical location (on/off campus) is a critical advantage of the Lean Library offering, combined as it is with the Unpaywall database. Libraries will find much to critique in the institutional dashboards or analytics tools now built on top of some plugins (e.g. the distinction of physical location when accessing the alternative access version in Kopernio's usage statistics).

From the OA plugin user's perspective, the emerging cutting edge is currently with the CORE Discovery plugin, as reported at the Open Repositories 2019 conference in the "first large scale quantitative comparison" of Unpaywall, OA Button, CORE OA Discovery and Kopernio. This report reveals important truths for critical adopters of OA plugins, for instance showing less overlap than expected between the plugins' returned results for the test sample of DOIs, and the assertion that "we can improve hit rate by combining the outputs from multiple discovery tools".[10]
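The "combining the outputs" point is easy to make concrete with a toy calculation. The tool names below are real, but the DOI sets are invented placeholders purely to illustrate how, when tools' results overlap less than expected, union coverage beats any single tool's hit rate:

```python
# Invented sample: 10 placeholder DOIs, and the subset each tool
# (hypothetically) resolved to an OA copy.
sample_dois = {f"10.1234/art{i}" for i in range(10)}

hits = {
    "Unpaywall":      {"10.1234/art0", "10.1234/art1", "10.1234/art2", "10.1234/art3"},
    "OA Button":      {"10.1234/art2", "10.1234/art3", "10.1234/art4"},
    "CORE Discovery": {"10.1234/art1", "10.1234/art4", "10.1234/art5"},
}

def hit_rate(found, sample):
    """Fraction of the sample DOIs for which an OA copy was found."""
    return len(found & sample) / len(sample)

for tool, found in hits.items():
    print(f"{tool}: {hit_rate(found, sample_dois):.0%}")

# Union of all tools' results: better coverage than any single tool.
combined = set().union(*hits.values())
print(f"Combined: {hit_rate(combined, sample_dois):.0%}")
```

Here no single tool exceeds a 40% hit rate, but because their results only partially overlap the combined rate reaches 60% – the same effect, in miniature, that the study observed across its real test sample.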

It’s become popular for our present day Johnson to quote his namesake, so in that vogue we should expect the take-up of Lean Library and CORE Discovery to bring closer that “resistless Day” when researchers the world over get “immediate, free and unrestricted access to all of the latest, peer-reviewed research” and the “misty Doubt” over the OA discovery landscape will be lifted.[11]


[1] Flanagan, D. (2018). Open Access Discovery Workshop at the British Library, Living Knowledge blog 18 December 2018. DOI: https://dx.doi.org/10.22020/v652-2876

[2] Fahmy, S. (2019). Perspectives on the open access discovery landscape, JISC scholarly communications blog. https://scholarlycommunications.jiscinvolve.org/wp/2019/04/24/perspectives-on-the-open-access-discovery-landscape/

[3] Piwowar, H., Priem, J. & Orr, R. (2019). The future of OA: a large-scale analysis projecting Open Access publication and readership. bioRxiv preprint: https://www.biorxiv.org/content/10.1101/795310v1

[4] Hinchliffe, L. Janicke. (2018). What will you do when they come for your proxy server?, Scholarly Kitchen blog. https://scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21/

[5] Ferguson, C. (2019). Leaning into browser extensions, Serials Review, v. 45, issue 1-2, p. 48-53.

[6] Fahmy, S. (2019). Perspectives on the open access discovery landscape, JISC scholarly communications blog. https://scholarlycommunications.jiscinvolve.org/wp/2019/04/24/perspectives-on-the-open-access-discovery-landscape/

[7] See the presentations from the LIBER 2019 conference on zenodo here https://zenodo.org/record/3259809#.XaA0Qr57lhF and here https://zenodo.org/record/3260301#.XaAz6757lhF

[8] arXiv monthly download rates, https://arxiv.org/stats/monthly_downloads

[9] Tufte, E. (1990). Envisioning information. Cheshire, Connecticut: Graphics Press, p. 50.

[10] Knoth, P. (2019). Analysing the performance of open access discovery tools, OR 2019, Hamburg, Germany. https://www.slideshare.net/petrknoth/analysing-the-performance-of-open-access-papers-discovery-tools

[11] Johnson, S. (1930). London: a poem; and The vanity of human wishes (ed. T. S. Eliot, F. Etchells & H. Macdonald). London: Frederick Etchells & Hugh Macdonald, l. 146.


Published Monday 21 October 2019

Written by James Caudwell (Deputy Head of Periodicals & Electronic Subscriptions Manager, Cambridge University Library)

Multiplicity, the unofficial theme of Researcher to Reader 2019

For the past four years at the end of February, publishers, librarians, agents, researchers, technologists and consultants have gathered in London for two days of discussions around the concept of 'Researcher to Reader'. This blog is my take on what I found most inspiring, challenging and interesting at the 2019 event. There wasn't a theme this year per se, but something that did repeatedly arise, from where I was standing, was the diversity of our perspectives. That is a word that has taken on a specific meaning recently, so I am using 'multiplicity' instead:

  • The principles of Plan S call for multiple business models for open access publishing, according to Dr Marc Schiltz
  • There is now a great range in the approaches researchers take to the writing process, as described by Dr Christine Tulley
  • Professor Siva Umapathy described the disparity in standards of living in India, which has a profound effect on whether students can engage with research, regardless of talent
  • In order to ensure reproducibility of research, we need multiplicity in the research landscape, with a larger number of smaller research groups working on a wide array of questions, argued Professor James Evans
  • Cambridge University Press is trying to break away from the Book/Journal dichotomy, diversifying with a long-form publication called Cambridge Elements
  • SpringerNature and Elsevier are expanding their business models to encroach into data management and training (although the analogy starts to fall apart here – what this actually represents is a concentration of the market overall).

Anyway, that gives you an idea of the kinds of issues covered. The conference programme is available online and you can read the Twitter conversation from the event (#R2Rconf). Read on for more detail.

The 2019 meeting was, once again, a great programme. (I say that as a member of the Advisory Board, I admit, but it really was).

The Plan S-shaped elephant in the room

Both days began with a bang. The meeting opened with a keynote from Dr Marc Schiltz – President of Science Europe and Secretary General & Executive Head of the Luxembourg National Research Fund – talking about "Plan S and European Research".

Schiltz explained he felt the current publishing system is a barrier to ensuring the outcomes of research are freely available, noting that hiding results is the antithesis of the essence of science. There was a ‘duty of care’ for funders to invest public funds well to support research. He suggested that there has been little progress in increasing open access to publications since 2009. In terms of the mechanisms of Plan S, he emphasised there are many compliant routes to publication and Plan S “is not about gold OA as the only publication model, it is about principles”. He also noted that there are plans to align Plan S principles with those of OA2020.

As is mentioned in the Plan S principles, Schiltz ended by arguing for the need to revise the incentivisation system in scholarly communications through mechanisms such as DORA. This is the "next big project" for funders, he said.

Catriona MacCallum from Hindawi noted that DORA is the most vital component for Plan S to work and that we therefore need a proper roadmap. She asked if there was a timeline for how funders will make changes to their own systems for evaluating research and grant applications, as this is an area where societies and funders should work together. Schiltz responded that this process is about making concrete changes to practice, not just policy. There is no timeline, but there has been more attention on this than ever before. He noted that Dutch universities are meeting next year to redefine tenure/promotion standards, which will be interesting to follow. MacCallum observed it could take decades if there is no timeline upfront.

One of the early questions from the audience was from a publisher asking why mirror journals are not permitted under Plan S, given that they are not hybrid journals. Schiltz disagreed, saying that if the journals have the same editorial board then the arrangement is effectively hybrid, because readers will still need to subscribe to the other half, as they would for a hybrid journal. Needless to say, the publisher disagreed.

The question of why the Plan S architects didn't consult with learned societies before going public was not particularly well answered. Schiltz talked about the number of hybrid journals now being greater than the number of pure subscription journals, and the concern that hybrid becomes the dominant business model. He said we need an actual transition to gold OA, which is all very well but doesn't actually answer the question. He did note: "We do not want learned societies to become collateral damage of Plan S". He acknowledged that many learned societies use surpluses from their publishing businesses to fund good work. But he did ask: "Is the use of thinly spread library budget to subsidise learned societies' philanthropic activities appropriate, and to what degree? This is not sustainable".

So, how do researchers approach the writing process?

Professor Christine Tulley, Professor of English at the University of Findlay, Ohio, spoke about "How Faculty Write for Publication: Examining the academic lifecycle of faculty research using interview and survey data". Tulley is involved in training researchers in writing and publishing, among other roles. She has published a book, How Writing Faculty Write: Strategies for Process, Product and Productivity, based on her research with top researchers who research writing. She is also collaborating on a De Gruyter survey of researchers on writing (with whom she co-facilitated a workshop on this topic, discussed later in this blog).

Tulley's first observation was that academics think 'rhetorically'. Regardless of discipline, her findings in the US show that thinking about where you want to publish and the community you want to reach is more important to academics than coming up with an idea. Tulley noted that in the past the process was that academics wrote first and then decided where to publish. This is not the case now: instead, authors consider readership in the first instance, asking themselves what the best medium is to reach that audience. The focus can be on a narrow audience that an author wants to hit – it is not a matter of 'reach the world' but can be as few as five important people. This can limit end publication options.

She also observed that beyond the top two or three journals, rank matters less. Because of this, newer journals and open access publications can attract readers and submissions, particularly through early release, which she observed is more important than 'official publication'. This speaks to the recent increase in general interest in preprints.

In a statement that set the hearts of the librarians in the audience aflutter, Tulley spoke about librarians as “tip-off providers”, being especially useful for early online release of research before the indexing kicks in. She noted that academics view librarians as scholarly research ‘Partners’ rather than ‘Support’. We have also had this discussion within the UK library community.

Equity of access to education

It is always really interesting to hear perspectives from elsewhere – be that across the library/researcher/publisher divides, or across global ones. Two talks at the event were very interesting as they described the situation in India and Bangladesh, highlighting how some issues are shared worldwide and others are truly unique.

Prof Siva Umapathy, Director of the Indian Institute of Science Education and Research, Bhopal, spoke first, emphasising that he was giving his personal opinion, not that of the Indian government. He noted that taxpayers pay for higher education in India, as is the case for most of the global south – fees to students are much less common. This means education is seen as a social responsibility of government.

Umapathy noted that 40% of the population of India is currently under 35 years old. Infrastructure and opportunities vary significantly within India, let alone across the whole 'global south'. In some areas of India the standard of living is equivalent to London; in other areas there is no internet connection. This affects who can engage with research: some very bright students from small villages are at a disadvantage. Even the kind of information available to students in India about where to study and how to apply can be uneven, affecting ambitions regardless of how talented the student might be. He described the incredibly competitive process of gaining a place at a university, consisting of applications, exams and interviews.

In India, paying to publish a paper gives the impression that the work is not of as high a quality – after all, if you have good science you shouldn't have to pay for publication. I should note this attitude is not unique to India – witness an article published in The Times Literary Supplement the day after this talk that entirely confuses what open access monograph publishing is about ("Vain publishing – The restrictions of 'open access'").

Beyond impressions there are practical issues – bureaucrats don't understand why an academic would pay for open access publication, or why they wouldn't publish in the 'best' mainstream journals, and therefore funding in India does not allow for any payment for publishing. This is despite India being a big consumer of open access research. It has practical implications: if India were to join Plan S and mandate OA, it would likely halve the number of papers he is able to publish, because there is no government funding available to cover APCs.

He called for the training of editors and peer reviewers, stressed the importance of educating governments, funders and evaluators, and suggested that peer reviewers be given APC discounts to encourage them to review more for journals. This, of course, is an issue in the Global North too: when we ran some workshops on peer review late last year, they were doubly subscribed immediately.

Global reading, local publishing – Bangladesh

Dr Haseeb Irfanullah, a self-described 'research communications enthusiast', spoke about what Bangladesh can tell us about research communications. He began by noting how access to scientific publications has been improved by the Research4Life partnership and INASP. These innovations for increasing access to research literature in the global south over the past few years have been a 'revolution'. He also discussed how the Bangladesh Journals Online project has helped get Bangladeshi journals online, including his journal, the Bangladesh Journal of Plant Taxonomy. This helps journals obtain journal impact factors (JIFs).

However, Bangladeshi journal publishing is relatively isolated and 'self-sustaining': locally sourced content fulfils the need. Because promotion, increment and recognition needs are met by the current situation (universities don't require indexed journals for promotion), there is little incentive to change or improve the process. This seems to be an example of how a local journal culture can thrive when researchers are subject to different incentives, although perversely the downside is that they & their research are isolated from international research. A Twitter observation about the JIF was "damned if you do or damned if you don't".

He also noted that it is ‘very cheap to publish a journal as everyone is a volunteer’, prompting one person on Twitter to ask: “Is it just me or is this the #elephantintheroom we need to address globally?” Irfanullah has been involved in providing training for editors, workshops and dialogues on standards, mentorship to help researchers get their work published, as well as improving access to research in Bangladesh. He concluded that these challenges can be addressed; for example, through dialogue with policymakers and a national system for standards.

Big is not best when it comes to reproducibility

Professor James Evans, from the Department of Sociology at the University of Chicago (and a guest of Researcher to Reader in 2016), spoke on why centralised "big science" communities are more likely to generate non-replicable results, describing the differences between small and large teams. His talk was a whirlwind of slides (often containing a dizzying array of graphics) delivered at breath-taking speed.

The research Evans and his team undertake looks at large numbers of papers to determine patterns that identify replicability, and whether the increase in the size of research teams and the rise of meta-research have any impact. For those interested, published papers include "Centralized "big science" communities more likely generate non-replicable results" and "Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It".

Evans described some of the consequences when a single mistake is reused and appears in multiple subsequent papers, ‘contaminating’ them. He used an example of the HeLa cell* in relation to drug gene interactions. Misidentified cells resulted in ‘indirect contamination’ of the 32,755 articles based on them, plus the estimated half a million other papers which cited these cells. This can represent a huge cost where millions of dollars’ worth of research has been contaminated by a mistake.

The problem is that scientific communities use the same techniques and methods, which reduces the robustness of research. With increasingly overlapping research networks exposed to similar methodologies and prior knowledge, research claims are not being independently replicated. Claims that are highly centralised on star scientists, repeat collaborations & overlapping methods are far less robust and lead to huge distortion in the literature. The larger the team, the more likely its output will support and amplify rather than disrupt prior work. If there is an overlap, e.g. between authors or methodologies, there is more likely to be agreement.

Making an analogy between Slumdog Millionaire and Marvel movies, Evans noted that independent, decentralised, non-overlapping claims are far more likely to be robust, replicable & of more benefit to society. It is effectively a form of triangulation. Smaller, decentralised communities are more likely to conduct independent experiments to reproduce results, producing more robust findings. Small teams reach further into the past and look to more obscure and independent work. Bigger is not better – smaller teams are more productive, innovative & disruptive because they have more to gain & less to lose than larger teams.

Large overlapping teams increase agglomeration around the same topics. The research landscape is seeing a decrease in small teams, and therefore a decrease in independence. These groups receive less funding and are seen as ‘more risky’ because they are not part of the centralised network.

Evans described how the idea of disruption to a scientific narrative that incrementally builds on what came before is effectively Thomas Kuhn’s The Structure of Scientific Revolutions from the 1960s. But “disruption delays impact”: research teams tend to keep building on previous successes (which come with an existing audience) rather than risking disruption and the consequent need to find new audiences. Team size matters too; one of their findings has been that each additional person on a team reduces the likelihood of the research being disruptive. Disruption therefore requires different funding models, with a taste for risk.

Evans noted that you need small teams simultaneously climbing different hills to find the best solution, rather than everyone trying to climb the same hill. This analogy was picked up by Catriona MacCallum, who noted that publishers are actually all on the big hill, which means they are in the same boat and trying to achieve the same end goal (hence the mess we are now in). So how do publishers move across to the disruptive landscape with lots of higher hills?

*The HeLa cell is an immortal cell line used in scientific research. It is the oldest and most commonly used human cell line. It is called HeLa because it came from a woman called Henrietta Lacks.

Sci Hub – harm or good?

The second day opened with a debate about Sci-Hub on the question: “Is Sci-Hub doing more good than harm to scholarly communication?”.

The audience was asked to vote whether they ‘agreed’ or ‘disagreed’ with the statement. In this first vote 60% of the audience disagreed and 40% agreed. Note this could reflect the fact that publishers were the largest cohort at the conference, at 51% of attendees, or alternatively the slightly problematic wording of the question. More than one person observed on Twitter that they would have appreciated a ‘don’t know’ or ‘neither good nor bad’ option.

The debate itself was held between Dr Daniel Himmelstein, Postdoctoral Fellow at the University of Pennsylvania (in the affirmative: that Sci-Hub is doing good) and Justin Spence, Partner and Co-Founder at Publisher Solutions International (in the negative: that Sci-Hub is doing harm). I have it on good authority the debate will be written up separately, so won’t do so here. One observation I noted: the question did not define to whom or what the ‘harm’ was being done. The argument against appeared focused on harm to the market, while the argument for was discussing benefit to society.

The discussion was opened up to the room but the comment that elicited a clap from the audience was from Jennifer Smith at St George’s University in London who asked if Elsevier’s profits are defensible when there are people on fun runs raising money for charities who are not anticipating their fundraising cash is going to publisher shareholders rather than supporting research. The question she asked is: “who is stealing from whom?”.

At the end of the debate the audience was asked to vote again, at which point 55% disagreed and 45% agreed, meaning Himmelstein had won over 5% of the audience. This seems surprising given how rare it is to actually change anyone’s mind.

But is it a book or a journal?

Nisha Doshi spoke about Cambridge Elements, a publication format that straddles books and journals. It was interesting to hear about some of the challenges Cambridge University Press (CUP) has faced. Some were practical, such as which systems to use for production, which are very clearly delineated as either journal systems or book systems: CUP is using several book systems, plus ISBNs, but also ScholarOne for peer review for this project. Other issues have been philosophical: authors and many others continue to ask “is it a journal or a book?”. CUP has encouraged authors to embed audio and video in their Cambridge Elements, but is not seeing much take-up so far, which is interesting given the success of Open Book Publishers.

Doshi listed the lessons CUP has learned through the process of trying to get this new publication format off the ground. It was interesting to see how far Cambridge Elements has come. In October 2017, as part of our Open Access Week events, the OSC hosted CUP to talk about what was at that point described as their “hybrid books and journals initiative“.

What’s the time Mr Wolf?

In 2016, Sally Rumsey and I spoke to the library communities at our institutions (Oxford and Cambridge, respectively) with a presentation: “Watch out, it’s behind you: publishers’ tactics and the challenge they pose for librarians”. Our warnings have increasingly been borne out by publisher activity in the sector over the past three years. Two presentations at Researcher to Reader were along these lines.

In the first instance, Springer Nature presented on their Data Support Services, a commercial offering in direct competition with the services offered by Scholarly Communication departments in libraries. I should note here that Elsevier also charges for a similar service through their Mendeley Data platform for institutions.

Representing an even further encroachment, the second presentation, by Jean Shipman from Elsevier, was about a new initiative training librarians to train researchers in data management. The new Elsevier Research Data Management Librarian Academy (RDMLA) has an emphasis on peer-to-peer teaching. Elsevier developed a needs assessment for RDM training and assessed library competencies and library education curricula before developing the RDMLA curriculum. Example units include research data culture, marketing the programme to administrators, and an overview of tools, such as those for coding. Elsevier moving into the training/teaching space is not new: they have had the ‘Elsevier Publishing Campus’ and ‘Researcher Academy’ for some time. But those are aimed at the research community. This new initiative is formally stepping directly into the library space.

Empathy mapping as a workshop structure

One of the features of Researcher to Reader is the workshops, which are run in several sessions over the two-day period. In all there is not much more time available than a traditional 2.5–3 hour workshop prior to the main event, but this format allows more reflection time between sessions and focuses the thinking when you are all together.

I attended a workshop on “Supporting Early-Career Scholarship” asking: How can librarians, technologists and publishers better support early career scholars as they write and publish their work?

Ably facilitated by Bec Evans, Founder at Prolifiko with Dee Watchorn, Product Engagement Manager at De Gruyter and Christine Tulley, the workshop used a process called Empathy Mapping. Participants were given handouts with comments made by early career researchers during interviews about the writing process as part of a research programme by Prolifiko. This helped us map out the experience of ECRs from their perspective rather than guessing and imposing our own biases.

We were asked to come up with a problem – for my group it was “How can we help an ECR disseminate their first paper beyond the publication process?” – and then to find a solution. Our group identified that these people need to understand the narrative of their work, which they can then take through blogs, presentations, Twitter and other outlets. Our proposal was an online programme that allows only five minutes of recording time (in the way Screencastify allows only 10 minutes) for an understandable explanation of their research, which they can then upload for commentary by peers in a safe space before going public.

And so, to end

It is helpful to have different players together in a room. This is really the only way we can start to understand one another. As an indicator of where we are at, we cannot even agree on a common language for what we do – in a Twitter discussion about how SciHub is meeting an ‘ease of access’ need that has not been met by publishers or libraries, it became clear that while in the library space we talk about the scholarly publishing *ecosystem*, publishers consider libraries to be part of the scholarly publishing *industry*.

One tweet from a publisher was: “Good to hear Christine Tulley talk about why academics write and what it is important to them at #R2RConf . We don’t want to, but publishers too often think generically about authors as they do about content”. While slightly confronting (authors are not only their clients, but also provide the content for *free*, so should perhaps be treated with some respect), it does underline why it is so essential that we get researchers, librarians and publishers into the same room to understand one another better.

All the more reason to attend Researcher to Reader 2020!

Published 4 March 2019
Written by Dr Danny Kingsley
Creative Commons License

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.
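To make the definition concrete, the query–extract–analyse pattern can be sketched in a few lines of Python. This is a toy illustration only (a three-sentence corpus standing in for a real machine-readable collection of thousands of full-text articles), not any particular TDM tool:

```python
import re
from collections import Counter

# Toy corpus standing in for a large machine-readable collection.
corpus = [
    "Text mining extracts patterns from large text collections.",
    "Data mining and text mining can reveal hidden relationships.",
    "Mining licensed collections requires publisher agreement.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Extraction step: count how often each word appears across the corpus.
counts = Counter(tok for doc in corpus for tok in tokenize(doc))

# Analysis step: the most frequent terms hint at what the collection is about.
print(counts.most_common(3))  # → [('mining', 4), ('text', 3), ('collections', 2)]
```

At scale, the same pattern (bulk access, programmatic extraction, statistical analysis) is what licensing agreements and publisher platforms have to accommodate.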

In February 2017, a group of University of Cambridge staff met to discuss “Text and Data Mining Services: What can Cambridge libraries offer?”  It was agreed that a future library Text and Data Mining (TDM) support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers of data provided for mining and of TDM projects
  • Fostering agreements with publishers.

This blog reports on some of the activities, events and initiatives, involving libraries at the University of Cambridge, that have taken place or are in progress since this meeting (also summarised in these slides).  Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK 2017 conference to discuss Research Libraries and TDM.  Issues raised included licensing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, policy and procedural development for handling TDM-related requests (and scaling this up across an institution), and the risk of lock-out from publishers’ content, as well as the time it can take for a TDM contract to be finalised between an institution and publisher.  The group concluded that it is important to build expectations of behaviour into TDM-specific licensing agreements between institutions and publishers.  For example, if a publisher’s website detects suspicious activity, it would be better to investigate first rather than automatically block the originating institution from accessing content (although this may depend on the systems in place); and if lock-out does happen and the activity is legal, participants suggested that institutions should explore compensation where the loss of access time is significant.

July 2017: University of Cambridge Text and Data Mining Libguide

Developed by the eResources Team, this LibGuide explains Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also provides a list of online journals licensed for TDM at the University of Cambridge and a list of digital archives for text mining that can be supplied to University researchers on disc. Any questions researchers may have about a TDM project that are not answered by the LibGuide can be submitted to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day a whole-group discussion drew out issues around why more TDM is not happening in the UK and it was agreed that there was a need for more visibility on what TDM looks like (e.g. a need for some hands-on sessions) and increased stakeholder communication: i.e. between publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: e.g. training on TDM methods and analysis tools for researchers, managing content accessibility and its funding for library support, and content licensing and agreements for the publisher. We’ll take a more in-depth look at this pilot in an upcoming blog on TDM – watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers who are working with digital methods for humanities research and teaching. In January this year Dr Danny Kingsley visited the facility to discuss approaches to providing TDM services, to help with planning at Cambridge. The Yale DH Lab staff help with projects in a variety of ways, one example being helping researchers get to grips with digital tools and methods.  Researchers wanting to carry out TDM on particular collections can visit the lab to do so: off-line discs containing published material for mining can be used in situ. In 2018, the libraries at Cambridge began building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining.  The course helps a learner understand the key concepts around TDM and explores how Research Support staff can help with TDM, and there are some practical activities that allow even those without technical skills to try out some mining concepts for themselves.  By following these activities, you can find out a bit more about sentence segmentation, tokenization, stemming and other processing techniques.
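For readers curious what those processing steps look like in practice, here is a deliberately naive sketch in Python. The segmentation and stemming rules are simplified stand-ins (a real pipeline would use something like the Porter stemmer and a trained sentence splitter), but the pipeline shape – segment, tokenize, stem – is the same one the course introduces:

```python
import re

def sentence_segment(text):
    """Naive segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "Researchers mined the archives. Mining revealed new patterns!"
for sentence in sentence_segment(text):
    print([stem(tok) for tok in tokenize(sentence)])
```

Running this prints `['researcher', 'min', 'the', 'archiv']` then `['min', 'reveal', 'new', 'pattern']` – note how the crude rules conflate “mined” and “Mining” to the same stem, which is exactly the normalisation effect stemming aims for.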

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December: it provides TDM tools as a front end to digital archives from Gale Cengage.  You can find out more about this trial in this ejournals@cambridge blog.

In summary…

Following the initial meeting to discuss research support services for TDM, there have been real efforts to raise awareness of TDM and the possibilities it brings to the research process, and to explore the reasons behind its low usage in the research community at large.  This is an ongoing task, with the goal of increased researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen
Creative Commons License