Tag Archives: Libraries

Searching Open Access: steps towards improving discovery of OA in a less than 100% OA world

At the heart of the University of Cambridge’s Open Access Policy is the commitment “to disseminating its research and scholarship as widely as possible to contribute to society”.

Behind this aim is the benefit to researchers worldwide, as the OA2020 vision has it, to “gain immediate, free and unrestricted access to all of the latest, peer-reviewed research”. It’s some irony indeed that the growth of the availability of research as open access does not automatically result, without further community investment, in a corresponding improvement in discoverability.

Key stakeholders met at the British Library to discuss the issue at the end of 2018 and produced an Open Access Discovery Roadmap , to identify areas of work in this space and encourage collaboration in the scholarly communications community.[1] A major theme included the dependence on reliable article licence metadata, but the main message was finding the open infrastructure/interoperability solutions for long-term sustainability “ensuring that the content remains accessible for future generations”.

New web pages on Open Access discovery

Recognizing where we are now, and responding to the present, (probably) partial awareness of the insufficiencies in the OA discovery landscape, Cambridge University Library has added pages to its e-resources website to highlight OA discovery tools and important websites indexing OA content. The motivations for highlighting the options for OA discovery on the new pages is described in this blog post. Our main aim is to bring to light search and discovery of OA as a live topic and prevent it “languishing in undiscoverable places rather than being in plain sight for everyone to find.”[2]

Recently, data from Unpaywall for July 2019 has been used to forecast for growth in availability of articles published as OA by 2025, predicting on the basis of current trends, but conservatively – without even taking full account of the impact of Plan S, for example. This forecast for 2025 predicts

  • 44% of all journal articles will be available as OA
  • 70% of article views will be to OA articles.[3]

Unpaywall’s estimate for availability OA right now is 31%. A third (growing soon to a half) is a significant proportion for anyone’s money, and wanting to signal the shift we have used that statistic as our headline on the page summarizing the most well-known and commonly-used Open Access browser plugins.

Screenshot containing the following text: 'Open Access Browser Plugins.A third to a half of articles have an OA version, but finding them can be a challenge. Save time with these easy-to-install OA discovery tools that search repositories, preprint servers, etc. for you'
Screenshot of Open Access browser plugins webpage

We want the Cambridge researcher to know about these plugins and to be using them, and aim to give minimal but salient information for a selection of one, or several, to be made. Our recommendation is for the Lean Library extension “Library Access” but we have been in touch with Kopernio and QxMD and ensured that members of the University registering to use these plugins will also pick up the connection to our proxy server for seamless off campus access to subscription content where it exists, before the plugin offers an alternative OA version.

Once installed in the user’s browser, the plugin will use the DOI and/or a combination of article metadata elements to search the plugin’s database and multiple other data sources. A discreet, clickable pop-up icon will become live (change colour), on finding an OA article and will deliver the link or the PDF direct to the user’s desktop. Most plugins are compatible with most browsers, Lean’s Library Access adding compatibility with Safari last month.

Each plugin has a different history of development and certain features that distinguish it from others, and we’ve attempted to bring these out on the page. For example noting Unpaywall’s trustworthiness in the library space thanks to its exclusion of ResearchGate and Academia.edu; its harvesting and showing of licence metadata; and its reach with integrating search of its data via library discovery systems. Features we think are relevant for potential users looking for a quick overview of what’s out there are also mentioned, such as Kopernio’s Dropbox file storage benefits and integration with Web of Science and QxMD’s special applications for medical researchers and professionals.

In an adjacent page, Search Open Access, there is coverage of search engines focused on discovering OA content (Google Scholar; 1findr; Dimensions; CORE), a range of sites indexing OA content in different disciplines, both publisher- and community-based, and a selection of repositories and preprint servers, including OpenDOAR.

A screenshot containing the following text: 'Search Open Access. Our selection of the leading and trusted sources to find OA content'
Screenshot of Search Open Access webpage

We hope the site design, based on the very cool Judge Business School Toolbox pages, gets across the basics about the OA plugins available and encourages their take-up. The plugins will definitely bring to the researcher OA alternative versions when subscription access puts the article behind a paywall and, regardless, will expose OA articles in search results that will otherwise be hard to find. The pages’ positioning top-left on the e-resources site is deliberately intended to grab attention, at least for reading left-to-right. It is interesting to see the approach other Universities have taken, using the LibGuide format for example at Queen’s University Belfast and at the University of Southampton.

Experiences with Lean Library’s Library Access plugin

Cambridge has had just over a year of experience implementing Lean Library’s Library Access plugin, and it’s been positive. The impetus for the institutional subscription to this product was as much to take action on the problem for the searcher of landing on publisher websites and struggling with Shibboleth federated sign-on. This problem is well documented (“spending hours of time to retrieve a minimal number of sources”) and most recently is being addressed by the RA21 project.[4] Equally though we wanted to promote OA content in the discovery process, and Lean Library’s latest development of its plugin to favour the delivery of the OA alternative before the default of the subscription version, is aligned with our values (considerations of versioning aside).

So we’re aiming to bring Lean to Cambridge researchers’ attention by recommending it as the plugin of choice for the period we’re in the transition to “immediate, free and unrestricted access” for all. It is only Lean that is providing the 24-hour updated and context-sensitive linking to our EZproxy server for off campus delivery of subscription content plus promoting OA alternative versions via the deployment of the Unpaywall database. The feedback from the Office of Scholarly Communication is favourable and the statistics support the positivity that we hear from our users (for the last year 66,731 for Google Scholar enhanced links; 49,556 article alternative views; a rough estimate against our EZproxy logs showing a probable 2/5 of off campus users are accessing the proxy via Lean).

One area of concern is the ownership of Lean by SAGE Publications, in contrast to the ownership say of Unpaywall as a project of the open-source ImpactStory, and what this means for users’ privacy. The concerns are shared by other libraries implementing Lean.[5] Our approach has been to make the extension’s privacy policy as prominent as possible on our page dedicated to promoting Lean, and to engage with Lean in depth over users’ concerns. We are satisfied with the answers to our questions from Lean and that our users’ data is adequately protected. Even in a rapidly changing arena for OA discovery tools the balance is not so fine when it comes to recommending installation of the Library Access plugin over a preference for the illegitimate and risk-prone SciHub.

Libraries’ discovery services are geared for subscription content

Allowing for influence of searchers’ discipline on choice of discovery service, it’s little surprise that the traditional library catalogue, even when upgraded to a web scale discovery service, prejudices inclusion of subscription over OA content. Of course it does, because this is the content the libraries pay for in the traditional subscription model and the discovery system is pretty much built around that. iDiscover is Cambridge’s discovery space for institutional subscriptions and print holdings of the University’s libraries and within iDiscover Open Access repository content has been enabled for search. Further, the pipe for the institutional repository content (Apollo) is established.

Nonetheless Cambridge will be looking to take advantage of the forthcoming link resolver service for Unpaywall. This is due for release in November 2019 and will surface a link to search Unpaywall from iDiscover when subscription content is unavailable. This link should kick in usually when the search in iDiscover is expanded beyond subscription content, and a form of which has been enabled already by at least one university by including the oadoi.org lookup in the Alma configuration.

English: The reefer ship Ivory Tirupati arriving in Brest with heavy list.
Français : Le navire frigorifique Ivory Tirupati arrivant à Brest, avec une gîte importante
A listing ship.  Picture by Hervé Cozanet, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

The righting moment in the angle of list is that point a ship must find to keep it from capsizing, and Library discovery system providers’ integration with OA feels a bit like that – the OA indication was included in the May 2018 iDiscover release and suppliers have been working with CORE for inclusion of CORE content since 2017. That righting moment may be just over the horizon as integration with Unpaywall arrives, and the “competition” element dissipates, as the consultancy JISC used to review the OA discovery tools commented: “As the OA discovery landscape is crowded, OA discovery products compete for space and efficacy against established public infrastructure, library discovery services and commercial services”.[6]

A diffuse but developing landscape

Easy-to-install and effective to use, the OA discovery tools we are promoting are still widely thought of as at best providing a patch, a sticking-plaster, to the problem. A plethora of plugins is not necessarily what the researcher wants, or is attracted by, however necessary the plugin may be to saving time and exposing content in discovery. Possibly the really telling use case has yet to be tried wherein the plugin comes into its own in a big deal cancellation scenario.

Usage statistics for the Lean Library Access plugin are probably a reflection of the fact that the provision of most article content that is required by the University is available via IP access as subscription, and the need for the plugin is almost entirely limited to the off campus user. The Lean plugin’s relatively modest totals are though consistent with reports of plugin adoption by institutions that have cancelled big deals. The poll of the Bibsam Consortium members revealed 75% of researchers did not have any plug-in installed; the percentage for the University of Vienna in particular was 71%; the KTH Royal Institute of Technology authors “rarely used” a plugin.[7]

Another conjecture is that there is an antipathy to any plugin that could be collecting browsing history data and however “dumb” and programmatically-erased, the concern over privacy is such that the universal adoption libraries may hope for is unachievable. The likeliest explanation is possibly around the tipping-point from subscription to OA, and despite the Apollo repository’s usage being one of the highest in the country (1.1 million article downloads from July 2018 to July 2019), Cambridge’s reading of Gold OA is c. 13% of total subscription content, including journal archives. A comparison with the proportions of percentage views by OA types in Unpaywall’s recently published data (cited above) suggests this is on the low side in terms of worldwide trends, but it must be emphasized this is a subset of OA reading and excludes green, hybrid, and bronze. Just consider for instance the 1.5 billion downloads from arXiv globally to date.[8] Similarly, the stats from Unpaywall are overwhelmingly persuasive of the success of the plugin, as of February 2019 it delivered a million papers a day, 10 papers a second.

Graph is showing a steady growth in the total number of open access items from less than 475,000 in January 2016 to nearly 1,700,000. Likewise, the number of institutional repositories increased from 96 to 180 during the same period.
IRUS-UK growth of open access items since January 2016 (The red bars indicate total items, orange bars number of articles and green bars number of articles with DOIs. The blue line indicates the number of institutional repositories)

The inspirational statistician and “data artist” Edward Tufte wrote:

We thrive in information-thick worlds because of our marvellous and everyday capacities to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorise, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff, and separate the sheep from the goats.[9]

There’s thriving and there’s too much effort already. Any self-respecting OA plugin user will want to winnow, and make their own decisions on the plugin(s). In a less than 100% OA world, that combination of subscription and OA connection separated from physical location (on/off campus) is a critical advantage of the Lean Library offering, combined as it is with the Unpaywall database. Libraries will find much to critique in the institutional dashboards or analytics tools now built on top of some plugins (e.g. distinction of the physical location when accessing the alternative access version in the Kopernio usage for instance).

From the OA plugin user’s perspective, the emerging cutting edge is currently with the CORE Discovery plugin, as reported at the Open Repositories 2019 conference, in the “first large scale quantitative comparison” of Unpaywall, OA Button, CORE OA Discovery and Kopernio. This report reveals important truths for OA plugin critical adopters, for instance showing less than expected overlap in comparison of the plugins’ returned results from the test sample of DOIs, and the assertion “we can improve hit rate by combining the outputs from multiple discovery tools”.[10]

It’s become popular for our present day Johnson to quote his namesake, so in that vogue we should expect the take-up of Lean Library and CORE Discovery to bring closer that “resistless Day” when researchers the world over get “immediate, free and unrestricted access to all of the latest, peer-reviewed research” and the “misty Doubt” over the OA discovery landscape will be lifted.[11]


[1] Flanagan, D. (2018). Open Access Discovery Workshop at the British Library, Living Knowledge blog 18 December 2018. DOI: https://dx.doi.org/10.22020/v652-2876

[2] Fahmy, S. (2019). Perspectives on the open access discovery landscape, JISC scholarly communications blog. https://scholarlycommunications.jiscinvolve.org/wp/2019/04/24/perspectives-on-the-open-access-discovery-landscape/

[3] Piwowar, H., Priem, J. & Orr, R. (2019). The future of OA: a large-scale analysis projecting Open Access publication and readership. DOI: https://www.biorxiv.org/content/10.1101/795310v1

[4] Hinchliffe, L. Janicke. (2018). What will you do when they come for your proxy server?, Scholarly Kitchen blog. https://scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21/

[5] Ferguson, C. (2019). Leaning into browser extensions, Serials Review, v. 45, issue 1-2, p. 48-53.

[6] Fahamy, S. (2019). Perspectives on the open access discovery landscape. JISC Scholarly Communications blog. https://scholarlycommunications.jiscinvolve.org/wp/2019/04/24/perspectives-on-the-open-access-discovery-landscape/

[7] See the presentations from the LIBER 2019 conference on zenodo here https://zenodo.org/record/3259809#.XaA0Qr57lhF and here https://zenodo.org/record/3260301#.XaAz6757lhF

[8] arXiv monthly download rates, https://arxiv.org/stats/monthly_downloads

[9] Tufte, E. Envisioning information, Cheshire, Connecticut, Graphics Press, p. 50.

[10] Knoth, P. (2019). Analysing the performance of open access discovery tools, OR 2019, Hamburg, Germany. https://www.slideshare.net/petrknoth/analysing-the-performance-of-open-access-papers-discovery-tools

[11] Johnson, S., In Eliot, T. S., Etchells, F., Macdonald, H., Johnson, S., & Chiswick Press,. (1930). London: a poem: And The vanity of human wishes. London: Frederick Etchells & Hugh Macdonald. l. 146.


Published Monday 21 October 2019

Written by James Caudwell (Deputy Head of Periodicals & Electronic Subscriptions Manager, Cambridge University Library)

Multiplicity, the unofficial theme of Researcher to Reader 2019

For the past four years at the end of February, publishers, librarians, agents, researchers, technologists and consultants have gathered in London for two days of discussions around the concept of ‘Researcher to Reader’. This blog is my take on what I found the most inspiring, challenging and interesting at the 2019 event. There wasn’t a theme this year per se, but something that did repeatedly arise from where I was standing was the diversity of our perspectives. This is a word that has taken a specific meaning recently, so I am using ‘multiplicity’ instead :

  • The principles of Plan S are calling for multiple business models for open access publishing, according to Dr Mark Schiltz
  • There is now great range in the approaches researchers take to the writing process, as described by Dr Christine Tulley
  • Professor Siva Umpathy described the disparity of standards of living in India which has a profound effect on whether students can engage with research regardless of talent
  • In order to ensure reproducibility of research, we need multiplicity in the research landscape with larger number of smaller research groups working on a wide array of questions, argued Professor James Evans
  • Cambridge University Press is trying to break away from the Book/Journal dichotomy, diversifying with a long-form publication called Cambridge Elements
  • SpringerNature and Elsevier are expanding their business models to encroach into data management and training (although the analogy starts to fall apart here – what this actually represents is a concentration of the market overall).

Anyway, that gives you an idea of the kinds of issues covered. The conference programme is available online and you can read the Twitter conversation from the event (#R2Rconf). Read on for more detail.

The 2019 meeting was, once again, a great programme. (I say that as a member of the Advisory Board, I admit, but it really was).

The Plan S-shaped elephant in the room

Both days began with a bang. The meeting opened with a keynote from Dr Mark Schiltz – President at Science Europe and Secretary General & Executive Head at the Luxembourg National Research Fund – talking about “Plan S and European Research”.

Schiltz explained he felt the current publishing system is a barrier to ensuring the outcomes of research are freely available, noting that hiding results is the antithesis of the essence of science. There was a ‘duty of care’ for funders to invest public funds well to support research. He suggested that there has been little progress in increasing open access to publications since 2009. In terms of the mechanisms of Plan S, he emphasised there are many compliant routes to publication and Plan S “is not about gold OA as the only publication model, it is about principles”. He also noted that there are plans to align Plan S principles with those of OA2020.

As is mentioned in the Plan S principles. Schiltz ended by arguing for the need to revise the incentivisation system in scholarly communications through mechanisms such as DORA. This is the “next big project” for funders, he said.

Catriona McCallum from Hindawi noted DORA is the most vital component for Plan S to work and therefore we need a proper roadmap.  She asked if there was a timeline for how funders will make changes to their own systems for evaluating research and grant applications, as this is an area where societies and funders should work together. Schiltz responded that this process is about making concrete changes to practice, not just policy. There is no timeline but there has been more attention on this than ever before. He noted that Dutch universities are meeting next year to redefine tenure/promotion standards which will be interesting to follow. McCallum observed it could take decades if there is no timeline upfront.

One of the early questions from the audience was from a publisher asking why mirror journals were not permitted under Plan S because they are not hybrid journals. Schiltz disagreed, saying if the journals have the same editorial board then it is effectively hybrid because readers will still need to subscribe to the other half, as they would for hybrid. Needless to say, the publisher disagreed.

The question about why Plan S architects didn’t consult with learned societies before going public was not particularly well answered. Schiltz talked about the numbers of hybrid journals being greater than pure subscription journals now and there was concern that hybrid becomes dominant business model. He said we need an actual transition to gold OA, which is all very well but doesn’t actually answer the question. He did note that: “We do not want learned societies to become collateral damage of Plan S”. He acknowledged that many learned societies use surpluses from their publishing businesses to fund good work. But he did ask: “Is the use of thinly spread library budget to subsidise learned societies’ philanthropic activities appropriate, and to what degree? This is not sustainable”.

So, how do researchers approach the writing process?

Professor Christine Tulley, Professor of English at the University of Findlay, Ohio spoke about “How Faculty Write for Publication, Examining the academic lifecycle of faculty research using interview and survey data”. Tulley is involved with training researchers in writing and publishing among other roles. She has published a book called How Writing Faculty Write, Strategies for Process, Product and Productivity based on her research with top researchers who research about writing. She is also collaborating on De Gruyter survey of researchers on writing (with whom she co-facilitated a workshop on this topic, discussed later in this blog).

Tulley’s first observation is that academics think ‘rhetorically’. Regardless of discipline, her findings in the US show that thinking about where you want publish and the community you want to reach is more important to academics than coming up with an idea. Tulley noted that in the past, the process was that academics wrote first then decided where to publish. But this is not the case now, where instead authors consider readership in the first instance, asking themselves what is the best medium to reach that audience. This is a focus on what can be a narrow audience that an author wants to hit – it is not a matter of ‘reach the world’ but can be as few as five important people. This can limit end publication options.

She also observed that after the top two or three journals, then their rank matters less. Because of this, newer journals/ open access publications can attract readers and submissions, particularly through early release, which is more important that ‘official publication’ she observed. This does talk to the recent increase in general interest in preprints.

In a statement that set the hearts of the librarians in the audience aflutter, Tulley spoke about librarians as “tip-off providers”, being especially useful for early online release of research before the indexing kicks in. She noted that academics view librarians as scholarly research ‘Partners’ rather than ‘Support’. We have also had this discussion within the UK library community.

Equity of access to education

It is always really interesting to hear perspectives from elsewhere – be that across the library/researcher/publisher divides, or across global ones. Two talks at the event were very interesting as they described the situation in India and Bangladesh, highlighting how some issues are shared worldwide and others are truly unique.

Prof Siva Umpathy, Director of the Indian Institute of Science Education, Bhopal, spoke first, emphasising that he was giving his personal opinion, not that of the Indian government. He noted that taxpayers pay for higher education in India and this is the case for most of the global south – fees to students are much less common. This means education is seen as a social responsibility of government.

Umpathy noted that 40% of the population in India is currently under 35 years old. infrastructure and opportunities vary significantly within India let alone across the whole ‘global south’. In some areas of India, the standard of living is equivalent to London. In other areas there is no internet connection. This affects who can engage with research, some very bright students from small villages are at a disadvantage. Even the kind of information that might be available to students in India about where to study and how to apply can be uneven affecting ambitions regardless of how talented the student might be. He described the incredibly competitive process to gain a place in a university, consisting of applications, exams and interviews.

In India, when someone is paying to publish a paper it gives an impression that the work is not as high a quality, after all, if you have good science you shouldn’t have to pay for publication. I should note this is not unique to India – witness an article that was published in The Times Literary Supplement the day after this talk that entirely confuses what open access monograph publishing is about (“Vain publishing – The restrictions of ‘open access’”).

Beyond impressions there are practical issues – bureaucrats don’t understand why an academic would pay for open access publication, why they wouldn’t publish in the ‘best’ mainstream journals, therefore funding in India does not allow for any payment for publishing. This is despite India being a big consumer of open access research. This has practical implications. If India were to join Plan S and mandated OA, it will likely reduce the number of papers he is able to publish by half, because there’s no government funding available to cover APCs.

He called for the need to train and editors and peer reviewers and the importance of educating governments, funders and evaluators and suggested that peer-reviewers are given APC discounts to encourage them to review more for journals. This, of course is an issue in the Global North too. Indeed when we ran some workshops on Peer Review late last year. They were doubly subscribed immediately.

Global reading, local publishing – Bangladesh

Dr Haseeb Irfanullah, a self described ‘research communications enthusiast’ spoke about what Bangladesh can tell us about research communications. He began by noting how access to scientific publications has been improved by the Research 4 Life Partnership and INASP. These innovations for increasing access to research literature to global south over past few years have been a ‘revolution’. He also discussed how the Bangladesh Journals Online project has helped get Bangladeshi journals online, including his journal, Bangladesh Journal of Plant Taxonomy. This helps journals get journal impact factors (JIF).

However, Bangladesh journal publishing is relatively isolated, and is ‘self sustaining’. Locally sourced content fulfils the need. Because promotion, increments and recognition needs are met with the current situation (universities don’t require indexed journals for promotion), then this means there is little incentive to change or improve the process. This seems to be example of how a local journal culture can thrive when researchers are subject to different incentives, although perversely the downside is that they & their research are isolated from international research. A Twitter observation about the JIF was “damned if you do or damned if you don’t”.

He also noted that it is ‘very cheap to publish a journal as everyone is a volunteer’, prompting one person on Twitter to ask: “Is it just me or is this the #elephantintheroom we need to address globally?” Irfanullah has been involved in providing training for editors, workshops and dialogues on standards, mentorship to help researchers get their work published, as well as improving access to research in Bangladesh. He concluded that these challenges can be addressed; for example, through dialogue with policymakers and a national system for standards.

Big is not best when it comes to reproducibility

Professor James Evans, from the Department of Sociology at Chicago University (who was a guest of Researcher to Reader in 2016) spoke on why centralised “big science” communities are more likely to generate non-replicable results by describing the differences between small and large teams. His talk was a whirlwind of slides (often containing a dizzy array of graphics) at breath-taking speed.

The research Evans and his team undertake looks at large numbers of papers to determine patterns that identify replicability and whether the increase in the size of research teams and the rise of meta research has any impact. For those interested, published papers include “Centralized “big science” communities more likely generate non-replicable results” and “Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It”.

Evans described some of the consequences when a single mistake is reused and appears in multiple subsequent papers, ‘contaminating’ them. He used an example of the HeLa cell* in relation to drug gene interactions. Misidentified cells resulted in ‘indirect contamination’ of the 32,755 articles based on them, plus the estimated half a million other papers which cited these cells. This can represent a huge cost where millions of dollars’ worth of research has been contaminated by a mistake.

The problem is scientific communities use the same techniques and methods, which reduces the robust nature of research. Increasingly overlapping research networks with exposure to similar methodologies and prior knowledge – research claims are not being independently replicated. Claims that are highly centralised on star scientists, repeat collaborations & overlapping methods are far less robust and lead to huge distortion in the literature. the larger the team, the more likely their output will support and amplify rather than disrupt prior work. if there is an overlap, e.g. between authors or methodologies, there is more likely to be agreement.

Making the analogy of the difference between Slumdog Millionaire vs Marvel movies, Evans noted that independent, decentralised, non-overlapping claims are far more likely to be robust, replicable & of more benefit to society. It is effectively a form of triangulation. Smaller, decentralised communities are more likely to conduct independent experiments to reproduce results, producing more robust results. Small teams reach further into the past and looks to more obscure and independent work. Bigger is not better – smaller teams are more productive, innovative & disruptive because they have more to gain & less to lose than larger teams.

Large overlapping teams increase agglomeration around the same topics. The research landscape is seeing a decrease in small teams, and therefore a decrease in independence. These types of group receive less funding & are ‘more risky’ because they are not part of the centralised network.

Evans described a disruption to the scientific narrative building on what has incrementally happened before is effectively Thomas Kuhn’s The Structure of Scientific Revolutions from the 1960s. But “disruption delays impact” – there is a tendency of research teams to keep building on previous successes (which come with an existing audience) rather than risking disruption and consequent need for new audiences etc. In addition, the size of the team matters, one of their findings has been that each additional person on a team reduces the likelihood of research being disruptive. But disruption requires different funding models -with a taste for risk.

Evans noted that you need small teams simultaneously climbing different hills to find the best solution, rather than everyone trying to climb the same hill.  This analogy was picked up by Catriona MacCallum who noted that publishers are actually all on the big hill which means they are in the same boat and trying to achieve the same end goal (hence the mess we are now in). So how do publishers move across to the disruptive landscape with lots of higher hills?

*The HeLa cell is an immortal cell line used in scientific research. It is the oldest and most commonly used human cell line. It is called HeLa because it came from a woman called Henrietta Lacks.

Sci Hub – harm or good?

The second day opened with a debate about Sci Hub on the question of “Is Sci-Hub is doing more good than harm to scholarly communication?”.

The audience was asked to vote whether they ‘agreed’ or ‘disagreed’ with the statement. In this first vote 60% of the audience disagreed and 40% agreed. Note this could possibly reflect attendance at the conference of publishers as the largest cohort of 51% of the attendees, or alternatively be a reflection of the slightly problematic wording of the question. More than one person observed on Twitter that they would have appreciated a ‘don’t know’ or ‘neither good nor bad’ options.

The debate itself was held between Dr Daniel Himmelstein, Postdoctoral Fellow at the University of Pennsylvania (in the affirmative – that SciHub is doing good) and Justin Spence, Partner and Co-Founder at Publisher Solutions International (in the negative – that SciHub is doing harm). I have it on good authority the debate will be written up separately, so won’t do so here. One observation I noted was – the question did not define to whom or what the ‘harm’ was being done. The argument against appeared focused on harm to the market but the argument for was discussing benefit to society.

The discussion was opened up to the room but the comment that elicited a clap from the audience was from Jennifer Smith at St George’s University in London who asked if Elsevier’s profits are defensible when there are people on fun runs raising money for charities who are not anticipating their fundraising cash is going to publisher shareholders rather than supporting research. The question she asked is: “who is stealing from whom?”.

At the end of the debate the audience was asked to vote again at which point, 55% disagreed and 45% agreed meaning Himmelstein won over 5% of the audience. This seems surprising given that it seems very rare to actually change anyone’s mind.

But is it a book or a journal?

Nisha Doshi spoke about Cambridge Elements – a publication format that straddles the Book and Journal formats. It was interesting to hear about some of the challenges Cambridge University Press has faced. These ranged from practical in terms of which systems to use for production which seem to be very clearly delineated as either journal system or book systems. CUP is using several book systems, plus ISBNs, but also using ScholarOne for peer review for this project. Other issues have been philosophical. Authors and many others continue to ask “is it a journal or a book?”. CUP have encouraged authors to embed audio and video in their Cambridge Elements, but are not seeing much take-up so far which is interesting given the success of Open Book Publishers.

Doshi listed the lessons CUP has learned through the process of trying to get this new publication form off the ground. It was interesting to see how far Cambridge Elements has come. In October 2017 as part of our Open Access Week events, the OSC hosted CUP to talk about what was described at this point as their “hybrid books and journals initiative“.

What’s the time Mr Wolf?

In 2016, Sally Rumsey and I spoke to the library communities at our institutions (Oxford and Cambridge, respectively) with a presentation: “Watch out, it’s behind you: publishers’ tactics and the challenge they pose for librarians”. Our warnings have increasingly been supported with publisher activity in the sector over the past three years. Two presentations at Researcher to Reader were along these lines.

In the first instance, Springer Nature presented on their Data Support Services which are a commercial offering in direct competition to the services offered by Scholarly Communication departments in libraries. I should note here that Elsevier also charge for a similar service through their Mendeley Data platform for institutions.

Representing an even further encroachment, the second presentation by Jean Shipman from Elsevier was about a new initiative which is training librarians to train researchers about data management. The new Elsevier Research Data Management Librarian Academy (RDMLA) has an emphasis on peer to peer teaching. Elsevier developed a needs assessment for RDM training, assessed library competencies, and library education curriculum before developing the RDMLA curriculum for RDM training. Example units include research data culture, marketing the program to administrators, and an overview of tools such as for coding. Elsevier moving into the training/teaching space is not new, they have had the ‘Elsevier Publishing Campus’ and ‘Researcher Academy’ for some time. But those are aimed at the research community. This new initiative is formally stepping directly into the library space.

Empathy mapping as a workshop structure

One of the features of Researcher to Reader is the workshops which are run in several sessions over the two day period. In all there is not much more time available than a traditional 2.5 – 3 hour workshop prior to the main event, but this format means there is more reflection time between sessions and does focus the thinking when you are all together.

I attended a workshop on “Supporting Early-Career Scholarship” asking: How can librarians, technologists and publishers better support early career scholars as they write and publish their work?

Ably facilitated by Bec Evans, Founder at Prolifiko with Dee Watchorn, Product Engagement Manager at De Gruyter and Christine Tulley, the workshop used a process called Empathy Mapping. Participants were given handouts with comments made by early career researchers during interviews about the writing process as part of a research programme by Prolifiko. This helped us map out the experience of ECRs from their perspective rather than guessing and imposing our own biases.

We were asked to come up with a problem – for my group it was “How can we help an ECR disseminate their first paper beyond the publication process?” And we were then asked to find a solution. Our group identified that these people need to understand the narrative of their work that they can then take through blogs, presentations, Twitter and other outlets. Our proposal was to create an online programme that only allowed 5 minutes for recording (in the way Screencastify only allows 10 minutes) an understandable explanation of their research that they can then upload for commentary by peers in a safe space before going public.

And so, to end

It is helpful to have different players together in a room. This is really the only way we can start to understand one another. As an indicator of where we are at, we cannot even agree on a common language for what we do – in a Twitter discussion about how SciHub is meeting an ‘ease of access’ need that has not been met by publishers or libraries, it became clear that while in the library space we talk about the scholarly publishing *ecosystem*, publishers consider libraries to be part of the scholarly publishing *industry*.

One tweet from a publisher was: “Good to hear Christine Tulley talk about why academics write and what it is important to them at #R2RConf . We don’t want to, but publishers too often think generically about authors as they do about content”. While slightly confronting (authors are not only their clients, but also provide the content for *free*, so should perhaps be treated with some respect), it does underline why it is so essential that we get researchers, librarians and publishers into the same room to understand one other better.

All the more reason to attend Researcher to Reader 2020!

Published 4 March 2019
Written by Dr Danny Kingsley
Creative Commons License

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.

In February 2017, a group University of Cambridge staff met to discuss “Text and Data Mining Services: What can Cambridge libraries offer?”  It was agreed that a future library Text and Data Mining (TDM) support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers on data provided for mining and TDM projects
  • Fostering agreements with publishers.

This blog reports on some of the activities, events and initiatives, involving libraries at the University of Cambridge, that have taken place or are in progress since this meeting (also summarised in these slides).  Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK 2017 conference to discuss Research Libraries and TDM.  Issues raised included licencing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, policy and procedural development for handling TDM-related requests (and scaling this up across an institution) and the risk of lock-out from publishers’ content, as well as the time it can take for a TDM contract to be finalised between an institution and publisher.  The group concluded that it is important to build mechanisms into TDM-specific licencing agreements between institutions and publishers where certain behaviours are expected.  For example, if suspicious activity is detected by a publisher’s website, it would be better not to automatically block the originating institution from accessing content, but investigate this first (although this may depend on systems in place), or if lock-out happens and the activity is legal, participants suggested that institutions should explore compensation for the time that access is lost if significant.

July 2017: University of Cambridge Text and Data Mining Libguide

Developed by the eResources Team, this LibGuide explains about Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also provides a list of online journals under license for TDM at the University of Cambridge and a list of digital archives for text mining that can be supplied to the University researchers on a disc copy. Any questions our researchers may have about a TDM project, not answered through the LibGuide, can be submitted to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day a whole-group discussion drew out issues around why more TDM is not happening in the UK and it was agreed that there was a need for more visibility on what TDM looks like (e.g. a need for some hands-on sessions) and increased stakeholder communication: i.e. between publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: e.g. researchers providing training on TDM methods and analysis tools, library support managing content accessibility and funding for this, and content licencing and agreements for the publisher. We’ll take a more in-depth look at this pilot in an upcoming blog on TDM – watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers who are working with digital methods for humanities research and teaching. In January this year Dr Danny Kingsley visited the facility to discuss approaches to providing TDM services to help planning here. The Yale DH Lab staff help out with projects in a variety of ways, one example being to help researchers get to grips with digital tools and methods.  Researchers wanting to carry out TDM on particular collections can visit the lab to do their TDM: off-line discs containing published material for mining can be used in-situ. In 2018, the libraries at Cambridge have begun building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining.  The course helps a learner understand the key concepts around TDM, explores how Research Support staff can help with TDM and there are some practical activities that even allow those with non-technical skills try out some mining concepts for themselves.  By following these activities, you can find out a bit more about sentence segmentation, tokenization, stemming and other processing techniques.

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December: it provides TDM tools at a front end to digital archives from Gale Cengage.  You can find out more about this trial in this ejournals@cambridge blog.

In summary…

Following the initial meeting to discuss research support services for TDM, there have been efforts and achievements to raise awareness of TDM and the possibilities it can bring to the research process as well as to explore the issues around the low usage of TDM in the research community at large.  This is an on-going task, with the goal of increased researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen
Creative Commons License

Libraries’ role in teaching the research community – LILAC2017

Recently Claire Sewell, the OSC Research Support Skills Coordinator attended her first LILAC conference in Swansea. These are her observations from the event.

LILAC (Librarians’ Information Literacy Annual Conference) is one of the highlights of the information profession calendar which focuses on sharing knowledge and best practice in the field of information literacy. For those who don’t know information literacy is defined as:

Knowing when and why you need information, where to find it and how to evaluate, use and communicate it in an ethical manner (CILIP definition)

Showcasing OSC initiatives

Since it was my first time attending it was a privilege to be able to present three sessions on different aspects of the work done in the OSC. The first session I ran was an interactive workshop on teaching research data management using a modular approach. The advantage of this is that the team can have several modules ready to go using discipline specific examples and information, meaning that we are able to offer courses tailored to the exact needs of the audience. This works well as a teaching method and the response from our audience both in Cambridge and at LILAC was positive.

There was an equally enthusiastic response to my poster outlining the Supporting Researchers in the 21st Century programme. This open and inclusive programme aims to educate library staff in the area of scholarly communication and research support. One element of this programme was the subject of my finalLILAC contribution – a short talk on the Research Support Ambassador Programme which provides participants with a chance to develop a deeper understanding of the scholarly communication process.

As well as presenting and getting feedback on our initiatives the conference provided me with a chance to hear about best practice from a range of inspiring speakers. A few of my highlights are detailed below.

Getting the message out there -keynote highlights

Work openly, share ideas and get out of the library into the research community were the messages that came out of the three keynote talks from across the information world.

The first was delivered by Josie Fraser, a Social and Educational Technologist who has worked in a variety of sectors, who spoke on the topic of The Library is Open: Librarians and Information Professionals as Open Practitioners.  Given the aim of the OSC to promote open research and work in a transparent manner this was an inspiring message.

Josie highlighted the difference between the terms free and open, words which are often confused when it comes to educational resources.  If a resource is free it may well be available to use but this does not mean users are able to keep copies or change them, something which is fundamental for education.

Open implies that a resource is in the public domain and can be used and reused to build new knowledge. Josie finished her keynote by calling for librarians to embrace open practices with our teaching materials. Sharing our work with others helps to improve practice and saves us from reinventing the wheel. The criteria for open are: retain, reuse, revised, remix, redistribute.

In her keynote, Making an Impact Beyond the Library and Information Service, Barbara Allen talked about the importance of moving outside the library building and into the heart of the university as a way to get information literacy embedded within education rather than as an added extra. The more we think outside the library the more we can link up with other groups who operate outside the library, she argued. Don’t ask permission to join in the bigger agenda – just  join in or you might never get there.

Alan Carbery in his talk Authentic Information Literacy in an Era of Post Truth  discussed authentic assessment of information literacy. He described looking at anonymised student coursework to assess how students are applying what they have learnt through instruction. When real grades are at stake students will usually follow orders and do what is asked of them.

Students are often taught about the difference between scholarly and popular publications which ignores the fact that they can be both. Alan said we need to stop polarising opinions, including the student concept of credibility, when they are taught that some sources are good and some are bad. This concept is becoming linked to how well-known the source is – ‘if you know about it it must be good’. But this is not always the case.

Alan asked: How can we get out of the filter bubble – social media allows you to select your own news sources but what gets left out? Is there another opinion you should be exposed to? He gave the example of the US elections where polls and articles on some news feeds claimed Clinton was the frontrunner right up until the day of the election. We need to move to question-centric teaching and teach students to ask more questions of the information they receive.

Alan suggested we need to embed information literacy instruction in daily life – make it relevant for attendees. There are also lessons to be learnt here which can apply to other areas of teaching. We need to become information literacy instructors as opposed to library-centric information literacy instructors.

Key points from other sessions

There is a CILIP course coming soon on ‘Copyright education for librarians’. This will be thinking about the needs of the audience and relate to real life situations. New professional librarians surveyed said that copyright was not covered in enough depth during their courses however many saw it as an opportunity for future professional development. The majority of UK universities have a copyright specialist of some description, but copyright is often seen as a problem to be avoided by librarians.

There is a movement in teaching to more interactive sessions rather than just talking and working on their own. Several sessions highlighted the increased pressure on and expectations of students in academia. Also highlighted were the benefits of reflective teaching practice.

There are many misconceptions about open science and open research amongst the research community. There is too much terminology and it is hard to balance the pressure to publish with the pressure to good research. Librarians have a role in helping to educate here. Many early career researchers are positive about data sharing but unsure as to how to go about it, and one possibility is making course a formal part of PhD education.

Claire Sewell attended the LILAC conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 27  April 2017
Written by Claire Sewell 

Creative Commons License

Service Level Agreements for TDM

Librarians expect publishers to support our researchers’ rights to Text and Data Mining and not cut access off for a library if they see ‘suspicious’ activity before they establish whether it is legitimate or not. These were the conclusions of a group who met at a workshop to discuss provision of Text and Data Mining services in March. The final conclusions were:

Expectations libraries have of publishers over TDM

The workshop concluded with very different expectations to what was originally proposed. The messages to publishers that were agreed were:

  1. Don’t cut us off over TDM activity! Have a conversation with us first if you notice abnormal behaviour*
  2. If you do cut us off and it turns out to be legitimate then we expect compensation for the time we were cut off
  3. Mechanisms for TDM where certain behaviours are expected need to be built into separate licensing agreements for TDM

*And if you want to cut us off – please demonstrate there are all these illegal TDM activities happening in the UK

Workshop on TDM

The workshop “Developing a research library position statement on Text and Data Mining in the UK” was part of the recent RLUK2017 conference.  My colleagues, Dr Debbie Hansen from the Office of Scholarly Communication and Anna Vernon from Jisc, and I wanted to open up the discussion about Text and Data Mining (TDM) with our library community. We have made the slides available and they contain a summary of all the discussions held during the event. This short blog post is an analysis of that discussion.

We started the workshop with a quick analysis of who was in the room using a live survey tool called Mentimeter. Eleven participants came from research institutions – six large, four small and one  from an ‘other research institution’. There were two publishers, and four people who identified as ‘other’ – which were intermediaries. Of the 19 attendees, 14 worked in a library. There was only one person who said they had extensive experience in TDM, four people said they were TDM practitioners but the largest group were the 14 who classified themselves as having ‘heard of TDM but have had no practical experience’.

The workshop then covered what TDM is, what the legal situation is and what publishers are currently saying about TDM . We then opened up the discussion.

Experiences of TDM for participants

In the initial discussion about experiences of the participants, a few issues were raised if libraries were to offer TDM services. Indeed there was a question whether this should form part of library service delivery at all. The issue is partly that this is new legislation, so currently publisher and institutions are reactive, not strategic in relation to TDM. We agreed:

  • There is a need for clearer understanding of the licensing situation with information
  • We also need to create a mechanism of where to go for advice, both within the institution and the publisher
  • We need to develop procedures of what to do with requests – which is a policy issue 
  • Researcher behaviour is a factor – academics are not concerned by copyright.

Offering TDM is a change of role of the library – traditionally libraries have existed to preserve access to items. The group agreed we would like to be enabling this activity rather than saying “no you can’t”. There are library implications for offering support for TDM, not least that librarians are not always aware of TDM taking place within their institution. This makes it difficult to be the central point for the activity. In addition, TDM could threaten access through being cut off, so this is causing internal disquiet.

TDM activity underway in Europe & UK

We then presented to the workshop some of the activities in TDM that are happening internationally, such as the FutureTDM project. There was also a short run down on the new copyright exception for research organisations carrying out research in public interest being proposed to the European Commission allowing researchers to carry out TDM of copyright protected content if they have lawful access (e.g. subscription) without prior authorisation.

ContentMine is a not for profit organisation that supplies open source TDM software to access and analyse documents. They are currently partnering with Wikimedia Foundation with a grant to develop WikiFactMine which is a project aiming to make scientific data available to editors of Wikidata and Wikipedia.

The ChemDataExtractor is a tool built by the Molecular Engineering Group at the University of Cambridge. It is an open source software package that extracts chemical information from scientific documentation (e.g. text, tables). The extracted data can be used for onward analysis. There is some information in a paper  in the Journal of Chemical Information and Modelling: ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature“.

The Manchester Institute of Biotechnology hosts the National Centre for Text Mining (NaCTeM), which works with research partners to provide text mining tools and services in the biomedical field.

The British Library had a call for applications for a PhD student placement to undertake thesis text mining on 150,000 theses held in EThOS to extract new metadata such as names of supervisors.  Applications closed 20 February 2017, but according to an EThOS newsletter from March,  they had received no applications for the placement. The suggestion is that “perhaps that few students have content mining skills sufficiently well developed to undertake such a challenging placement”.

The problem with supporting TDM in libraries

We proposed to the workshop group that libraries are worried about getting cut off from their subscription by publishers due to large downloads of papers through TDM activity. This is because publishers’ systems are pre-programmed to react to suspicious activity. If TDM invokes automated investigation, then this may cause an access block.

However universities need to maintain support mechanism to ensure continuity of access. For this to occur we require workflows for swift resolution, fast communication and a team of communicators. This also requires education of researchers of potential issues.

We asked the group to discuss this issue – noting reasons why their organisation is not actively supporting TDM and if they are the main challenges they face.

Discussion about supporting TDM in libraries

The reasons put forward for not supporting TDM included practical issues such as the challenges of handling physical media and the risk of lockout.

The point was made that there was a lack of demand for the service. This is possibly because the researchers are not coming to the Library for help. There may be a lack of awareness in the IT areas that the Library can help and they may not even pass on the queries.  This points to the need for internal discussion with institutions.

It was noted that there was an assumption in the discussion that the Library is at the centre of this type of activity, however and we are not joined up as organisations. The question is who is responsible for this activity? There is often no institutional view on TDM because the issues are not raised at academic level. Policy is required.

Even if researchers do come to the library, there are questions about how we can provide a service. Initially we would be responding to individual queries, but how do we scale it up?

The challenges raised included the need for libraries to ensure everyone understands the needs at the the content owner level. The library, as the coordinator of this work would need to ensure the TDM is not for commercial use, and need to ensure people know their responsibilities. This means the library is potentially being intrusive on the researcher process.

Service Level Agreement proposal

The proposal we put forward to the group was that we draft a statement for a Service Level Agreement for publishers to assure us that if the library is cut off, but the activity is legal, we will be reinstated within and agreed period of time. We asked the group to discuss the issues if we were to do this.

Expectation of publishers

The discussion has raised several issues libraries had experienced with publishers over TDM. One participants said the contract with a particular publisher to allow their researchers to do TDM took two years to finalise.

There was a recognition that for genuine TDM to be identified might require some sort of registry of TDM activity which might not be an administrative task all libraries want to take on. The alternative suggestion was a third party IP registry, which could avoid some of the manual work. Given that LOCKSS crawls publisher software without getting trapped, this could work in the same way with a bank of IP addresses that is secured for this purpose.

Some solutions that publishers could help with include publishers delivering material in different ways – not on a hard drive. The suggestion was that this could be part of a platform and the material was produced in a format that allowed TDM (at no extra cost).

Expectation of libraries

There was some distaste amongst the group for libraries to take on the responsibility for maintaining  a TDM activity register. However libraries could create a safe space for TDM like virtual private networks.

Licenses are the responsibility of libraries, so we are involved whether we wish to be or not. Large scale computational reading is completely different from current library provision. There are concerns that licensing via the library could be unsuitable for some institutions. This raises issues of delivery and legal responsibilities. One solution for TDM could be to record IP address ranges in licence agreements. We need to consider:

  • How do we manage the licenses we are currently signed up to?
  • How do we manage licensing into the future so we separate different uses? Should we have a separate TDM ‘bolt on’ agreement.

The Service Level Agreement (SLA) solution

The group noted that, particularly given the amount publisher licenses cost libraries, being cut off for a week or two weeks with no redress is unusual at best in a commercial environment. At minimum publishers should contact the library to give the library a grace period to investigate rather than being cut off automatically.

The basis for the conversation over the SLA includes the fact that the law is on the subscriber’s side if everyone is doing it legally. It would help to have an understanding of the extent of infringing activity going on with University networks (considering that people can ‘mask’ themselves). This would be useful for thinking of thresholds.

Next steps

We need to open up the conversation to a wider group of librarians. We are hoping that we might be able to work with RLUK and funding councils to come to an agreed set of requirements that we can have endorsed by the community and which we can then take to to publishers.

Debbie Hansen and Danny Kingsley attended the RLUK conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 30 March 2017
Written by Dr Danny Kingsley
Creative Commons License

Where did they come from? Educational background of people in scholarly communication

Scholarly communication roles are becoming more commonplace in academic libraries around the world but who is actually filling these roles? The Office of Scholarly Communication in Cambridge recently conducted a survey to find out a bit more about who makes up the scholarly communication workforce and this blog post is the first in a series sharing the results.

The survey was advertised in October 2016 via several mailing lists targeting an audience of library staff who worked in scholarly communication. For the purposes of the survey we defined this as:

The process by which academics, scholars and researchers share and publish their research findings with the wider academic community and beyond. This includes, but is not limited to, areas such as open access and open data, copyright, institutional repositories and research data management.

In total 540 people responded to the calls for participation with 519 going on to complete the survey, indicating that the topic had relevance for many in the sector.

Working patterns

Results show that 65% of current roles in scholarly communication have been established in respondent’s organisations for less than five years with fewer than 15% having been established for more than ten years. Given that scholarly communication is still growing as a discipline this is perhaps not a surprising result.

It should also be noted that the survey makes no distinction between those who are working in a dedicated scholarly communication role and those who may have had additional responsibilities added to a pre-existing position. These roles tend to sit within larger organisations which employ over 200 people although whether the organisation was defined as the library or wider institution was open to interpretation by respondents.

Responses showed an even spread of experience in the library and information science (LIS) sector with 22% having less than five years’ experience and 27% having more than twenty.  Since completing their education just over half of respondents have remained within LIS but given the current fluctuations in the job market it is not surprising to learn that just under half of people have worked outside the sector within the same period.

Respondents were also asked to list the ways in which they actively contributed to the scholarly publication process. The majority (72%) did so by authoring scholarly works or contributing to the peer review process (44%). Although not specified as a category a number of respondents highlighted their work in publishing material, indicating a change in the scholarly process rather than a continuation to the status quo.

LIS qualifications

Most of those (71%) who responded to the survey either have or are currently working towards a postgraduate qualification in LIS, an anticipated result given the target population of the survey. The length of time respondents had held their qualification was evenly spread in line with the amount of time spent working in the sector with 48% having achieved their qualification less than ten years ago whilst 49% having held their qualification for over a decade. Just over half of this group felt that their LIS qualification did not equip them with knowledge of the scholarly communication process (56%).

Around a fifth of respondents (21%) hold a library and information science qualification at a level other than postgraduate, with the majority of being at bachelor level. Of these there was a fairly even divide between those who have held this qualification for five to ten years (31%) and those who qualified more than twenty years ago (28%). Only 17% of this group felt that their studies equipped them with appropriate knowledge of scholarly communication.

Qualifications outside LIS

A small number of respondents do not hold qualifications in LIS but hold or are working towards postgraduate qualifications in other subjects. Most of this group hold/are working on a PhD (69%) in a range of subjects from anatomy to mechanical engineering.

This group overwhelmingly felt that what they learnt during their studies had practical applications in their work in scholarly communication (74%). This was a larger percentage than those who had studied LIS at either undergraduate or postgraduate level. These results echo experiences at Cambridge where a large proportion of the team is made up of people from a variety of academic backgrounds. In many ways this has proven to be an asset as they have direct experience of the issues faced by current researchers and are able to offer insight into how best to meet their needs.

So what does this tell us?

The scholarly communication workforce is expanding as academic libraries respond to the changing environment and shift their focus to research support. Many of these roles have been created in the past five years in particular within larger organisations better positioned to devote resources to increasing their scholarly communication presence.

Although results from this survey indicate that the majority of staff come from a library background a diverse range of levels and subjects are represented. As noted above this can provide unique insights into researcher needs but it also raises the question of what trained library professionals can bring to this area. Given that the majority of those educated in LIS felt that their qualification did not adequately equip them for their role this is a potentially worrying trend which needs to be explored further.

We will be continuing to analyse the results of the survey over the next few months to address both this and other questions. Hopefully this will provide insight into where scholarly communications librarians are now and what they can do to ensure success into the future.

Published 9 March 2017
Written by Claire Sewell
Creative Commons License

‘Be nice to each other’ – the second Researcher to Reader conference

Aaaaaaaaaaargh! was Mark Carden’s summary of the second annual Researcher to Reader conference, along with a plea that the different players show respect to one another. My take home messages were slightly different:

  • Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.
  • Academic leaders and institutions should do their bit in combating the metrics focus.
  • Big Deals don’t save libraries money, what helps them is the ability to cancel journals.
  • The green OA = subscription cancellations is only viable in a utopian, almost fully green world.
  • There are serious issues in the supply chain of getting books to readers.
  • And copyright arrangements in academia do not help scholarship or protect authors*.

The programme for the conference included a mix of presentations, debates and workshops. The Twitter hashtag is #r2rconf.

As is inevitable in the current climate, particularly at a conference where there were quite a few Americans, the shadow of Trump was cast over the proceedings. There was much mention of the political upheaval and the place research and science has in this.

[*please see Kent Anderson’s comment at the bottom of this blog]

In the publishing corner

Time for publishers to raise to the challenge

The conference opened with an impassioned speech by Mark Allin, the President and CEO of John Wiley & Sons, who started with the statement this was “not a time for retreat, but a time for outreach and collaboration and to be bold”.

The talk was not what was expected from a large commercial publisher. Allin asked: “How can publishers act as advocates for truth and knowledge in the current political climate?” He mentioned that Proquest has launched a displaced researchers programme in reaction to world events, saying, “it’s a start but we can play a bigger role”.

Allin asked what publishers can do to ensure research is being accessed. Referencing “The content trap” by Bharat Anand, Allin said “We won’t as a media industry survive as a media content and putting it in a bottle and controlling its distribution. We will only succeed if we connect the users. So we need to re-engineer the workflows making them seamless, frictionless. “We should be making sure that … we are offering access to all those who want it.”

Allin raised the issue of access, noting that ResearchGate has more usage than any single publisher. He made the point that “customers don’t care if it is the version of record, and don’t care about our arcane copyright laws”. This is why people use SciHub, it is ease of access. He said publishers should not give up protecting copyright but must realise its limitations and provide easy access.

Researchers are the centre of gravity – we need to help them spend more time researching and less time publishing, he says. There is a lesson here, he noted, suppliers should use “the divine discontent of the customer as their north star”. He used the example of Amazon to suggest people working in scholarly communication need to use technology much better to connect up. “We need to experiment more, do more, fail more, be more interconnected” he said, where “publishing needs open source and open standards” which are required for transformational impact on scholarly publishing – “the Uber equivalent”.

His suggestion for addressing the challenges of these sharing platforms is to “try and make your experience better than downloading from a pirate site”, and that this would be a better response than taking the legal route and issuing takedown notices.  He asked: “Should we give up? No, but we need to recognise there are limits. We need to do more to enable access.”

Allin called the situation, saying publishing may have gone online but how much has the internet really changed scholarly communication practices? The page is still a unit of publishing, even in digital workflows. It shouldn’t be, we should have a ‘digital first’ workflow. The question isn’t ‘what should the workflow look like?’, but ‘why hasn’t it improved?’, he said, noting that innovation is always slowed by social norms not technology. Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.

So what do publishers do?

Publishers “provide quality and stability”, according to Kent Anderson, speaking on the second day (no relation to Rick Anderson) in his presentation about ‘how to cook up better results in communicating research’. Anderson is the CEO of Redlink, a company that provides publishers and libraries with analytic and usage information. He is also the founder of the blog The Scholarly Kitchen.

Anderson made the argument that “publishing is more than pushing a button”, by expanding on his blog on ‘96 things publishers do’. This talk differed from Allin’s because it focused on the contribution of publishers.

Anderson talked about the peer review process, noting that rejections help academics because usually they are about mismatch. He said that articles do better in the second journal they’re submitted to.

During a discussion about submission fees, Anderson noted that these “can cover the costs of peer review of rejected papers but authors hate them because they see peer review as free”. His comment that a $250 journal submission charge with one journal is justified by the fact that the target market (orthopaedic surgeons) ‘are rich’ received (rather unsurprisingly) some response from the audience via Twitter.

Anderson also made the accusation that open access publishers take lower quality articles when money gets tight. This did cause something of a backlash on the Twitter discussion with a request for a citation for this statement, a request for examples of publishers lowering standards to bring in more APC income with the exception of scam publishers. [ADDENDUM: Kent Anderson below says that this was not an ‘accusation’ but an ‘observation’. The Twitter challenge for ‘citation please?’ holds.]

There were a couple of good points made by Anderson. He argued that one of the value adds that publishers do is training editors. This is supported by a small survey we undertook with the research community at Cambridge last year which revealed that 30% of the editors who responded felt they needed more training.

The library corner

The green threat

There is good reason to expect that green OA will make people and libraries cancel their subscriptions, at least it will in the utopian future described by Rick Anderson (no relation to Kent Anderson), Associate Dean of University of Utah in his talk “The Forbidden Forecast, Thinking about open access and library subscriptions”.

Anderson started by asking why, if we’re in a library funding crisis, aren’t we seeing sustained levels of unsubscription? He then explained that Big Deals don’t save libraries money. They lower the cost per article, but this is a value measure, not a cost measure. What the Big Deal did was make cancellations more difficult. Most libraries have cancelled every journal that they can without Faculty ‘burning down the library’, to preserve the Big Deal. This explains the persistence of subscriptions over time. The library is forced to redirect money away from other resources (books) and into serials budget. The reason we can get away with this is because books are not used much.

The wolf seems to be well and truly upon us. There have been lots of cancellations and reduction of library budgets in the USA (a claim supported by a long list of examples). The number of cancellations grows as the money being siphoned off book budgets runs out.

Anderson noted that the emergence of new gold OA journals doesn’t help libraries, this does nothing to relieve the journal emergency. They just add to the list of costs because it is a unique set of content. What does help libraries is the ability to cancel journals. Professor Syun Tutiya, Librarian Emeritus at Chiba University in a separate session noted that if Japan were to flip from a fully subscription model to APCs it would be about the same cost, so that would solve the problem.

Anderson said that there is an argument that “there is no evidence that green OA cancels journals” (I should note that I am well and truly in this camp, see my argument). Anderson’s argument that this is saying the future hasn’t happened yet. The implicit argument here is that because green OA has not caused cancellations so far means it won’t do it into the future.

Library money is taxpayers’ money – it is not always going to flow. There is much greater scrutiny of journal big deals as budgets shrink.

Anderson argued that green open access provides inconsistent and delayed access to copies which aren’t always the version of record, and this has protected subscriptions. He noted that Green OA is dependent on subscription journals, which is “ironic given that it also undermines them”. You can’t make something completely & freely available without undermining the commercial model for that thing, Anderson argued.

So, Anderson said, given green OA exists and has for years, and has not had any impact on subscriptions, what would need to happen for this to occur? Anderson then described two subscription scenarios. The low cancellation scenario (which is the current situation) where green open access is provided sporadically and unreliably. In this situation, access is delayed by a year or so, and the versions available for free are somewhat inferior.

The high cancellation scenario is where there is high uptake of green OA because there are funder requirements and the version is close to the final one. Anderson argued that the “OA advocates” prefer this scenario and they “have not thought through the process”. If the cost is low enough of finding which journals have OA versions and the free versions are good enough, he said, subscriptions will be cancelled. The black and white version of Anderson’s future is: “If green OA works then subscriptions fail, and the reverse is true”.

Not surprisingly I disagreed with Anderson’s argument, based on several points. To start, there would need to have a certain percentage of the work available before a subscription could be cancelled. Professor Syun Tutiya, Librarian Emeritus at Chiba University noted in a different discussion that in Japan only 6.9% of material is available Green OA in repositories and argued that institutional repositories are good for lots of things but not OA. Certainly in the UK, with the strongest open access policies in the world, we are not capturing anything like the full output. And the UK is itself only 6% of the research output for the world, so we are certainly a very long way away from this scenario.

In addition, according to work undertaken by Michael Jubb in 2015 – most of the green Open Access material is available in places other than institutional repositories, such as ResearchGate and SciHub. Do librarians really feel comfortable cancelling subscriptions on the basis of something being available in a proprietary or illegal format?

The researcher perspective

Stephen Curry, Professor of Structural Biology, Imperial College London, spoke about “Zen and the Art of Research Assessment”. He started by asking why people become researchers and gave several reasons: to understand the world, change the world, earn a living and be remembered. He then asked how they do it. The answer is to publish in high impact journals and bring in grant money. But this means it is easy to lose sight of the original motivations, which are easier to achieve if we are in an open world.

In discussing the report published in 2015, which looked into the assessment of research, “The Metric Tide“, Curry noted that metrics & league tables aren’t without value. They do help to rank football teams, for example. But university league tables are less useful because they aggregate many things so are too crude, even though they incorporate valuable information.

Are we as smart as we think we are, he asked, if we subject ourselves to such crude metrics of achievement? The limitations of research metrics have been talked about a lot but they need to be better known. Often they are too precise. For example was Caltech really better than University of Oxford last year but worse this year?

But numbers can be seductive. Researchers want to focus on research without pressure from metrics, however many Early Career Researchers and PhD students are increasingly fretting about publications hierarchy. Curry asked “On your death bed will you be worrying about your H-Index?”

There is a greater pressure to publish rather than pressure to do good science. We should all take responsibility to change this culture. Assessing research based on outputs is creating perverse incentives. It’s the content of each paper that matters, not the name of the journal.

In terms of solutions, Curry suggested it would be better to put higher education institutions in 5% brackets rather than ranking them 1-n in the league tables. Curry calls for academic leaders and institutions to do their bit in combating the metrics focus. He also called for much wider adoption of the Declaration On Research Assessment (known as DORA). Curry’s own institution, Imperial College London, has done so recently.

Curry argued that ‘indicators’ would be a more appropriate term than ‘metrics’ in research assessment because we’re looking at proxies. The term metrics imply you know what you are measuring. Certainly metrics can inform but they cannot replace judgement. Users and providers must be transparent.

Another solution is preprints, which shift attention from container to content because readers use the abstract not the journal name to decide which papers to read. Note that this idea is starting to become more mainstream with the research by the NIH towards the end of last year “Including Preprints and Interim Research Products in NIH Applications and Reports

Copyright discussion

I sat on a panel to discuss copyright with a funder – Mark Thorley, Head of Science Information, Natural Environment Research Council , a lawyer – Alexander Ross, Partner, Wiggin LLP and a publisher – Dr Robert Harington,  Associate Executive Director, American Mathematical Society.

My argument** was that selling or giving the copyright to a third party with a purely commercial interest and that did not contribute to the creation of the work does not protect originators. That was the case in the Kookaburra song example. It is also the case in academic publishing. The copyright transfer form/publisher agreement that authors sign usually mean that the authors retain their moral rights to be named as the authors of the work, but they sign away rights to make any money out of them.

I argued that publishers don’t need to hold the copyright to ensure commercial viability. They just need first exclusive publishing rights. We really need to sit down and look at how copyright is being used in the academic sphere – who does it protect? Not the originators of the work.

Judging by the mood in the room, the debate could have gone on for considerably longer. There is still a lot of meat on that bone. (**See the end of this blog for details of my argument).

The intermediary corner

The problem of getting books to readers

There are serious issues in the supply chain of getting books to readers, according to Dr Michael Jubb, Independent Consultant and Richard Fisher from Something Understood Scholarly Communication.

The problems are multi-pronged. For a start, discoverability of books is “disastrous” due to completely different metadata standards in the supply chain. ONIX is used for retail trade and MARC is standard for libraries, Neither has detailed information for authors, information about the contents of chapters, sections etc, or information about reviews and comments.

There are also a multitude of channels for getting books to libraries. There has been involvement in the past few years of several different kinds of intermediaries – metadata suppliers, sales agents, wholesalers, aggregators, distributors etc – who are holding digital versions of books that can be supplied through the different type of book platforms. Libraries have some titles on multiple platforms but others only available on one platform.

There are also huge challenges around discoverability and the e-commerce systems, which is “too bitty”. The most important change that has happened in books has been Amazon, however publisher e-commerce “has a long way to go before it is anything like as good as Amazon”.

Fisher also reminded the group that there are far more books published each year than there are journals – it’s a more complex world. He noted that about 215 [NOTE: amended from original 250 in response to Richard Fisher’s comment below] different imprints were used by British historians in the last REF. Many of these publishers are very small with very small margins.

Jubb and Fisher both emphasised readers’ strong preference for print, which implies that much more work needed on ebook user experience. There are ‘huge tensions’ between reader preference (print) and the drive for e-book acquisition models at libraries.

The situation is probably best summed up in the statement that “no-one in the industry has a good handle on what works best”.

Providing efficient access management

Current access control is not functional in the world we live in today. If you ask users to jump through hoops to get access off campus then your whole system defeats its purpose. That was the central argument of Tasha Mellins-Cohen, the Director of Product Development, HighWire Press when she spoke about the need to improve access control.

Mellins-Cohen started with the comment “You have one identity but lots of identifiers”, and noted if you have multiple institutional affiliations this causes problems. She described the process needed for giving access to an article from a library in terms of authentication – which, as an aside, clearly shows why researchers often prefer to use Sci Hub.

She described an initiative called CASA – Campus Activated Subscriber-Access which records devices that have access on campus through authenticated IP ranges and then allows access off campus on the same device without using a proxy. This is designed to use more modern authentication. There will be “more information coming out about CASA in the next few months”.

Mellins-Cohen noted that tagging something as ‘free’ in the metadata improves Google indexing – publishers need to do more of this at article level. This comment was responded with a call out to publishers to make the information about sharing more accessible to authors through How Can I Share It?

Mellins-Cohen expressed some concern that some of the ideas coming out of RA21 Resource Access in 21st Century, an STM project to explore alternatives to IP authentication, will raise barriers to access for researchers.

Summary

It is always interesting to have the mix of publishers, intermediaries, librarians and others in the scholarly communication supply chain together at a conference such as this. It is rare to have the conversations between different stakeholders across the divide. In his summary of the event, Mark Carden noted the tension in the scholarly communication world, saying that we do need a lively debate but also need to show respect for one another.

So while the keynote started promisingly, and said all the things we would like to hear from the publishing industry, there is still the reality that we are not there yet.  And this underlines the whole problem. This interweb thingy didn’t happen last week. What has actually happened  to update the publishing industry in the last 20 years? Very little it seems. However it is not all bad news. Things to watch out for in the near future include plans for micro-payments for individual access to articles, according to Mark Allin, and the highly promising Campus Activated Subscriber-Access system.

Danny Kingsley attended the Researcher to Reader conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 27 February 2017
Written by Dr Danny Kingsley
Creative Commons License

Copyright case study

In my presentation, I spoke about the children’s campfire song, “Kookaburra sits in the old gum tree” which was written by Melbourne schoolteacher Marion Sinclair in 1932 and first aired in public two years later as part of a Girl Guides jamboree in Frankston. Sinclair had to get prompted to go to APRA (Australasian Performing Right Association) to register the song. That was in 1975, the song had already been around for 40 years but she never expressed any great interest in any propriety to the song.

In 1981 the Men at Work song “Down Under” made No. 1 in Australia. The song then topped the UK, Canada, Ireland, Denmark and New Zealand charts in 1982 and hit No.1 in the US in January 1983. It sold two million copies in the US alone.  When Australia won the America’s Cup in 1983 Down Under was played constantly. It seems extremely unlikely that Marion Sinclair did not hear this song. (At the conference, three people self-identified as never having heard the song when a sample of the song was played.)

Marion Sinclair died in 1988, the song went to her estate and Norman Lurie, managing director of Larrikin Music Publishing, bought the publishing rights from her estate in 1990 for just $6100. He started tracking down all the chart music that had been printed all over the world, because Kookaburra had been used in books for people learning flute and recorder.

In 2007 TV show Spicks and Specks had a children’s music themed episode where the group were played “Down Under” and asked which Australian nursery rhyme the flute riff was based on. Eventually they picked Kookaburra, all apparently genuinely surprised when the link between the songs was pointed out. There is a comparison between the music pieces.

Two years later Larrikin Music filed a lawsuit, initially wanting 60% of Down Under’s profits. In February 2010, Men at Work appealed, and eventually lost. The judge ordered Men at Work’s recording company, EMI Songs Australia, and songwriters Colin Hay and Ron Strykert to pay 5% of royalties earned from the song since 2002 and from its future earnings.

In the end, Larrikin won around $100,000, although legal fees on both sides have been estimated to be upwards $4.5 million, with royalties for the song frozen during the case.

Gregory Ham was the flautist in the band who played the riff. He did not write Down Under, and was devastated by the high profile court case and his role in proceedings. He reportedly fell back into alcohol abuse and was quoted as saying: “I’m terribly disappointed that’s the way I’m going to be remembered — for copying something.” Ham died of a heart attack in April 2012 in his Carlton North home, aged 58, with friends saying the lawsuit was haunting him.

This case, I argued, exemplifies everything that is wrong with copyright.

Consider yourself disrupted – notes from RLUK2016

The 2016 Research Libraries UK conference was held at the British Library from 9-11 March on the theme of disruptive innovation. This blog pulls out some of the highlights personally gained from the conference:

  • If librarians are to be considered important – we as a community need to be strong in our grasp of understanding scholarly communication issues
  • We need to know the facts about our subscriptions to, usage of and contributions to scholarly publishing
  • We need high level support in institutions to back libraries in advocacy and negotiation with publishers
  • Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem
  • Society needs more open research to ensure reproducibility and robust research
  • The library of the future will have to be exponentially more customisable than the current offering
  • The information seeking behaviour of researchers is iterative and messy and does not match library search services
  • Libraries need to ‘create change to triumph’ – to be inventors rather than imitators
  • Management of open access issues need to be shared across institutions with positive outcomes when research offices and libraries collaborate.

I should note this is not a comprehensive overview of the conference, and I have blogged separately about my own contribution ‘The value of embracing unknown unknowns’. Some talks were looking at the broader picture, others specifically at library practice.

Stand your ground – tips for successful publisher negotiations

The opening keynote presentation was by Professor Gerard Meijer, President of Radboud University who conducted the recent Dutch negotiations with Elsevier.

The Dutch position has been articulated by Sander Dekker, the State Secretary  of Education who said while the way forward was gold Open Access, the government would not provide any extra money. Meijer noted this was sensible because every extra cent going into the system goes into the pocket of publishers – something that has been amply demonstrated in the UK.

All universities in the Netherlands are in top 200 universities in the world. This means all research is good quality – so even if it is only 2% of the world output, the Netherlands has some clout.

Meijer gave some salient advice about these types of negotiations. He said this work needs to be undertaken at the highest level at the universities. There are several reasons for this. He noted that 1.5 to 2 percent of university budget goes to subscriptions – and this is growing as budgets are being cut – so senior leadership in institutions should take an active position.

In addition if you are not willing to completely opt out of licencing their material then you can’t negotiate, and if you are going to opt out you will need the support of the researchers. To that end communication is crucial – during their negotiations, they would send a regular newsletter to researchers letting them know how things were going.

Meijer also stressed the importance of knowing the facts, and the need to communicate and inform the researchers about these facts and the numbers. He noted that most researchers don’t know how much subscriptions cost. They do know however about article processing charges – creating a misconception that Open Access is more expensive.

Institutions in the Netherlands spent €9.2 billion million on Elsevier publications in 2009, which rose to €11billion million* in 2014. Meijer noted that he was ‘not allowed’ to tell us this information due to confidentiality clauses. He drolly observed “It will be an interesting court case to be sued for telling the taxpayers how their money is being spent”. He also noted that because Elsevier is a public company their finances are available, and while their revenue goes up, their costs stay the same.

Apparently Wiley and Springer are willing to go into agreements. However Elsevier are saying that a global business model doesn’t match with a local business requirement. The Netherlands  has not yet signed the contract with Elsevier as they are working out the detail.

Broadly the deal is for three years, from 2016 to 2018. The plan is to grow the Open Access output from nothing to 10% in 2016, 20% in 2017, 30% in 2018 and want to do that without having to pay APCs. To achieve this they have to identify journals that we make Open Access , by defining domains where all journals in these domains we make open access.

Meijer concluded this was a big struggle – he would have liked to have seen more – but what we have is good for science. Dutch research will be open in fields where most Open Access is happening and researchers are paying APCs. Researchers can look at the long list of journals that are OA and then publish there.

*CORRECTION: Apologies for my mistyping.  Thanks to    @WvSchaik for pointing out this error on Twitter. The slide is captured in this tweet.

The future of the research library

Nancy Fried Foster from Ithaka S+R and Kornelia Tancheva from Cornell University Library spoke about research practices and the disruption of the research library. They started by noting that researchers work differently now, using different tools. The objective of their ‘A day in the life of a serious researcher’ work was exploring research practices to inform the vision of library of the future and identify improvements we could make now.

They developed a very fine-grained method of seeing what people do which focuses on what people really do in the workplace. This used a participatory design approach. Participants (who were mainly post graduates) were asked to map or log their movements in one single day where at least some of their time was engaged in research. The team then sat with the person the following day to ask them to narrate their day – and talk about seeking, finding and using information. There was no distinction between academic and non-academic activity.

The team looked at the things that people were doing and the things that the library could and will be. The analysis took a lot of time, organising into several big categories:

  • Seeking information
  • Academic activities
  • Library resources
  • Space, self management and
  • Circum-academic activities – activities allied to the researchers academic line but not central.

They also coded for ‘obstacles’ and ‘brainwork’.

The participants described their information seeking as fluid and constant – ‘you can just assume I am kind of checking my email all the time’. They also distinguished between search and research. One quote was ‘I know the library science is very systematic and organised and human behaviour is not like that’.

Information seeking is an iterative process, it is constant and not systematic. The search process is highly idiosyncratic – our subjects have developed ways of searching for information that worked for them. It doesn’t matter if it is efficient or not. They are self conscious that it is messy. ‘I feel like the librarians must be like “this is the worst thing I have ever heard”’.

Information evaluation is multi-tiered – eg: ‘If an article is talking about people I have heard of it is worth reading’. Researchers often use a mash up of systems that will work for that project. For example email is used as an information management tool.

Connectivity is important to researchers, it means you can work anywhere and switch rapidly between tasks. It has a big impact on collaboration – working with others was continuously mentioned in the context of writing. However sometimes researchers need to eliminate technology to focus.

Libraries have traditionally focused too much on search and not enough on brain work – this is a potential role for libraries. References to the library occurred throughout the process. Libraries are often thought of as a place for refuge – especially for the much needed brain work. The need for self management – enable them to manage their time prioritise the demands on their attention. Strategies depended on a complicated relationship with technology.

One of the major themes emerging from the work is search is idiosyncratic and not important, research has no closure, experts rule and research is collaboration. The implications for the future library are that the future library is a hub, not just focusing on a discovery system but connecting people with knowledge and technologies.

If we were building a library from scratch today what would it look like? There will need to be a huge amount of customisation to adjust tools to suit researchers personal preferences. The library of the future will have to be exponentially more customisable than the current offering. Libraries will have to make available their resources on customisable platforms. We need to shift from non-interoperable tools to customisation.

So if the future were here today we would think of future library – an academic hub (improving current library services) and an application store. We should take on even more of a social media aspect. Think of a virtual ‘app store’ – on an open source platform that provides the option for people to suggest short cuts – employ developers to develop these modules quickly. Take a leadership role in ensuring vendor platforms can be integrated. All library resources will speak easily to the systems our users are using. We need to provide individualised services rather than one size fits all.

Scientific Ecosystems and Research Reproducibility

The scientific reward structure determines the behaviour of researchers and that this has spawned the reproducibility crisis according to Marcus Munafo from the University of Bristol.

Marcus started by talking about the P value where the statistically significant value is 95% – that is, the chance of the hypothesis being wrong is less than five in 100. Generally, studies need to cross this threshold to get published, so there is evidence to show that original studies often suggest a large effect – however when attempted, these effects are not able to be replicated.

Scientists are supposed to be impartial observers, but in reality they need to get grants, and publish papers to get promoted to more ‘glamorous institutions’ (Marcus’ words). Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Marcus noted it is common to overstate your data or error check your data if your first analysis doesn’t tell you what you are looking for. This ‘flexible analysis’ is quite commonplace, if we look at literature as a whole. Often there is not enough detail in the paper to allow the reproducibility of the work. There are nearly as many unique analysis pipelines as there were studies in the sample – so this flexibility in the joint analysis tool gets leveraged to get the result you want.

There is also evidence that journal impact factors are a very poor indicator of quality, indeed it is a stronger indicator of retraction than quality. The idea is that the whole science will self correct. But science won’t sort itself out in a reasonable timeframe. If you look at the literature you see that replication is the exception rather than the norm.

One study showed among 83 articles recommending effective interventions, 40 had not been replicated, and of those that had been replicated many showed the works had stronger findings in the first paper than in the replication, and some were contradicted in the replication.

Your personal investment in the field shapes your position – unconscious biases that affects all of us. If you come in as an early career scientist you get an impression that the field is more robust than it is in reality. There is hidden literature that is not citable – only by looking at this you have a balanced sense of how robust the literature is. There are many studies that make a claim in the abstract that is not supported by more impartial reading. Others are ‘optimistic’ in the abstract. The articles that describe bad news receive far fewer citations than would be expected. People don’t want to cite bad news. So is science self correcting?

We can introduce measures to help science self correct. In 2000 the requirement to register the outcome of clinical trials began. Once they had to pre-specify what the outcome would be then most of the findings were null. That is why it is a scientific ecosystem – the way we are incentivised has become distorted over the years.

Researchers are incentivised to produce a small number of papers that are eye catching.  It is understandable why you would want to focus on quality over quantity. We can give more weight to confirmatory studies and try to move away from the focus on publishing in certain types of studies. We shouldn’t be putting all our effort into high risk, high return.

What do we do about this? There can be top down measures, but individual groups can work in ways to improve the ways we work, such as adopting the open science way of working. This is not trivial – for example we can’t make data available without the consent of participants. Possible solutions include pre-registering all the plans, set up studies so the data can be made open, ensure publications are gold OA. These measures serve as a quality control method because everything gets checked because people know it is going to be made available. We come down hard on academics who make conscious mistakes – but we should be encouraging people to identify their own errors.

We need to introduce quality control methods implicitly into our daily practice. Open data is a very good step in that direction. There is evidence that researchers who know their data is going to be made open are more thorough in their checking of it. Maybe it is time for an update in the way we do science – we have statistical software that can run hundreds of analysis, and we can do text and data mining of lots of papers. We need to build in new processes and systems that refine science and think about new ways of rewarding science.

Marcus noted that these are not new problems, quoting from Reflections on the Decline of Science in England written by Babbage in 1830.

Marcus referred to many different studies and articles in his talk, some of which I have linked out to here:

Creating change to triumph: A view from Australia

The idea of creating change to triumph was the message of Jill Benn, the Librarian at the University of Western Australia. She discussed Cambietics, the science of managing change. This was a theory developed in 1985 by Barrett, with three stages:

  • Coping with change to survive
  • Capitalising on change
  • Creating change to triumph.

This last is the true challenge – to be an inventor rather than an imitator. Jill gave the Australian context. The country is 32 times bigger than UK, but has a third of the population, with 40 universities around the country. She noted that one of the reasons libraries in Australia have collaborated is the isolation.

Research from Australia counts for 4% of the world’s research output, it is the third largest export after energy, and out-performs tourism. The political landscape really affects higher education. There has been a series of five prime ministers in five years.

Australia has invested heavily in research infrastructure – mostly telescopes and boats. The Australian National Data Service was created and this has built the Research Data Australia interface – an amazing system full of data. The libraries have worked with researchers to populate the repository. There has been a large amount of capacity building. ANDS worked with libraries to build the capacities – the ’23 things’ training programme. You self register – on 1 March, 840 people had signed up for the programme.

The most recent element of the government’s agenda has been innovation. Prime Minister Turnbull has said he wanted to end the ‘publish or perish’ culture of research to increase the impact on community. There is a national innovation and science agenda and the government would not longer take into account publications for research. It is likely the next ERA (Australia’s equivalent of the REF) will involve impact in the community. The latest call is “innovation is the new black”.

There is financial pressure on the University sector – which pays in US dollars which is a problem. The emphasis on efficiency means the libraries have to show value and impact to the research sector.

Many well-developed services exist in university libraries to support research. Australian institutional repositories now have over 650K full text items, which are downloaded over 1 million times annually, there are data librarians and scholarly communication librarians. Some of the ways in which libraries have been asked to deliver capacity is CAUL and its Research Advisory Committee – to engage in the government’s agenda. There are three pillars – capacity building, engagement and advocacy, to promote the work of our libraries to bodies like Universities Australia.

Jill also mentioned the Australasian Open Access Strategy Group which has had a green rather than a gold approach. Australians are interested in open access. It is not yet clear what our role will be of institutional repositories into the future. In an environment where the government wants us to share our research.

How can we benchmark the Australian context? It is difficult. Look at our associations and about what data we might be able to share. Quote from Ross Wilkinson – yes there are individuals but the collective way Australia has managed data we are better able to engage internationally. Despite the investment into repositories in Australia – the UK outperforms Australia.

Australian libraries see themselves as genuine partners for research and we have a healthy self confidence (!). Libraries must demonstrate value and impact and provide leadership. Australian libraries have created change to triumph.

Open access mega-journals and the future of scholarly communication

This talk was given by Professor Stephen Pinfield from Sheffield University. He talked about the Open Access Mega Journal project he is working on with potentially disruptive open access journals (the Twitter handle is @oamj_project).

He began where it all began – with PLOS ONE, which is now the biggest journal in the world. Stephen noted that mega journals are full of controversy, listing comments ranging from them being the future of academic publishing, a disruptive innovation to the best possible future system.

However critics see them variously as a dumping ground, career suicide for early career researchers publishing in them and a cynical money making venture. However, Pinfield noted that despite considerable searching acknowledging what ‘people say’ is different from being able to provide attributed negative statements about mega-journals.

The open access and wide scope nature of mega-journals reverses the trend over past few years where journals have been further specialising, They are identifiable by their approach to quality control, with an emphasis on scientific soundness only rather than subjective assessments of novelty and also by their post publication metrics.

Pinfield noted that there are economies of scale for mega journals – this means that we have single set of processes and technologies. This enables a tiered scholarly publishing system. Mega-journals potentially allow highly selective journals to go open access (who often argue that they reject so much they couldn’t afford to go OA). Pinfield hypothesised that a business model could be where a layer of highly selective titles sits above a layer of moderately selective mega journals. The moderately selective journals provide the financial subsidy but the highly selective ones provide the reputational subsidy. PLOS is a good example of this symbiotic relationship.

The emphasis on ‘soundness’ in the quality control process reduces the subjectivity of judgements of novelty and importance and potentially shifts the role and the power of the gatekeepers. Traditionally the editors and editorial board members have been the arbiters of what is novel.

However this opens up some questions. If it is only a ‘soundness’ judgement then the question is whether power is shifted for good or ill? Also does the idea of ‘soundness’ translate to the Humanities? There is also the problem of an overreliance on metrics. Are the citation values of journals driven by the credibility or the visibility of the journals?

Pinfield emphasised the need for librarians to be informed and credible about their understanding of these topics. If librarians are to be considered important – we as a community need to be strong in our grasp of understanding these issues. There is an ongoing need to keep up to date and remain credible.

Working together to encourage researcher engagement and support

There were several talks about how institutions have been engaging researchers, and many of them emphasised the need to federate the workload across the institution. Chris Aware from the University of Hull discussed some work he was doing with Valerie McCutcheon on the current interaction between library and other parts of the institution in supporting OA, understand how OA is and could be embedded.

The survey revealed a desire for the management of Open Access to be more spread across the institution into the future. Libraries should be more involved in the management of the research information system and managing the REF. However Library involvement in getting Open Access into grant applications is lower – this is a research role, but it is worth asking how much this underpins subsequent activity.

As an aside Chris noted a way of demonstrating the value of something is to call it an ‘office’ – this is something the Americans do. (Indeed it is something Cambridge has done with the Office of Scholarly Communication).

Chris noted that if researchers don’t think about open access as part of the scholarly communications workflow then they won’t do it. Libraries play a key role in advocating and managing OA – so how can they work with other institutional stakeholders in supporting research?

Valerie later spoke about blurring and blending the borders between the Library and the Research Office. She noted that when she was working for Research and Enterprise (RSEO) she thought library people were nice, but she was not sure what the people do there. When she transferred to working in the Library, the perception back the other way was the same.

But the Research Office and the Library need to cooperate on shared strategic priorities. They are both looking out for changes in policy landscape they need to share information and collaborate on policy development and dissemination. They need better data quality in the research process to find solutions to create agile systems to support researchers.

At Glasgow the Library & RSEO were a good match because they had similar end uses and the same data. So this began a close collaboration between the two offices which worked together on the REF, used Enlighten. They also linked their systems (Enlighten and Research Systems) in 2010 where users can browse in the repository by the funder name. Glasgow has had a publications policy rather than an open access policy since 2008.

Valerie also noted that it was crucial to have high-level support and showed a video of Glasgow’s PVC-R singing the praises of the work the Library was doing.

The Glasgow Open Access model has been ‘Act on acceptance’ since 2013 – a simple message with minimal bureaucracy. A centralised service with ‘no fancy meetings’. Valerie also noted that when they put events on they don’t say it is a Library event, the sessions subject based not department based.

Torsten Reimer and Ruth Harrison discussed the support offered at Imperial College, where Torsten said he was originally employed for developing the College’s OA mandate but then the RCUK and the HEFCE policy came into place and changed everything. At Imperial, scholarly communications is seen as an overall concern for the College rather than specifically a Library issue.

Torsten noted the Library already had a good relationship with the departments. The Research Office is seen by researchers as a distraction from their research, but the Library is seen as helping research. However because the two areas have been able to approach everything with one single aim, this has allowed open access and scholarly support to happen across the institution and allowed the library to expand.

Imperial have one workflow and one system for open access which is all managed through Symplectic (there had been separate systems before). They have a simple workflow and form to fill in, then have a ticketing type customer workflow system plugged into Symplectic to pull information out at the back end. This system has replaced four workflows, lots of spreadsheets and much cut and pasting.

Sally Rumsey talked about how Oxford have successfully managed to engage their research community with their recently launched ‘Act on Acceptance’ communication programme.

Summary

This is a rundown of a few of the presentations that spoke to me. There were also excellent speed presentations, Lord David Willetts, the former Minister for Universities and Science spoke, we split up into workshops and there was a panel of library organisations around the world who discussed working together.

The personal outcomes from the conference include:

  • An invitation to give a talk at Cornell University
  • An invitation to collaborate with some people at CILIP about ensuring scholarly communication is included in some of the training offered
  • Discussion about forming some kind of learned society for Scholarly Communication
  • Discussion about setting up a couple of webinars – ‘how to start up an office of scholarly communication’ and ‘successful library training programmes’
  • Also lots of ideas about what to do next – the issue of language and the challenges we are facing in scholarly communication because of language deserves some investigation.

I look forward to next year.

Published 14 March 2016
Written by Dr Danny Kingsley
Creative Commons License

 

‘It is all a bit of a mess’ – observations from Researcher to Reader conference

“It is all a bit of a mess. It used to be simple. Now it is complicated.” This was the conclusion of Mark Carden, the coordinator of the Researcher to Reader conference after two days of discussion, debate and workshops about scholarly publication..

The conference bills itself as: ‘The premier forum for discussion of the international scholarly content supply chain – bringing knowledge from the Researcher to the Reader.’ It was unusual because it mixed ‘tribes’ who usually go to separate conferences. Publishers made up 47% of the group, Libraries were next with 17%, Technology 14%, Distributors were 9% and there were a small number of academics and others.

In addition to talks and panel discussions there were workshop groups that used the format of smaller groups that met three times and were asked to come up with proposals. In order to keep this blog to a manageable length it does not include the discussions from the workshops.

The talks were filmed and will be available. There was also a very active Twitter discussion at #R2RConf.  This blog is my attempt to summarise the points that emerged from the conference.

Suggestions, ideas and salient points that came up

  • Journals are dead – the publishing future is the platform
  • Journals are not dead – but we don’t need issues any more as they are entirely redundant in an online environment
  • Publishing in a journal benefits the author not the reader
  • Dissemination is no longer the value added offered by publishers. Anyone can have a blog. The value-add is branding
  • The drivers for choosing research areas are what has been recently published, not what is needed by society
  • All research is generated from what was published the year before – and we can prove it
  • Why don’t we disaggregate the APC model and charge for sections of the service separately?
  • You need to provide good service to the free users if you want to build a premium product
  • The most valuable commodity as an editor is your reviewer time
  • Peer review is inconsistent and systematically biased.
  • The greater the novelty of the work the greater likelihood it is to have a negative review
  • Poor academic writing is rewarded

Life After the Death of Science Journals – How the article is the future of scholarly communication

Vitek Tracz, the Chairman of the Science Navigation Group which produces the F1000Research series of publishing platforms was the keynote speaker. He argued that we are coming to the end of journals. One of the issues with journals is that the essence of journals is selection. The referee system is secret – the editors won’t usually tell the author who the referee is because the referee is working for the editor not the author. The main task of peer review is to accept or reject the work – there may be some idea to improve the paper. But that decision is not taken by the referees, but by the editor who has the Impact Factor to consider.

This system allows for information to be published that should not be published – eventually all publications will find somewhere to publish. Even in high level journals many papers cannot be replicated. A survey by PubMed found there was no correlation between impact factor and likelihood of an abstract being looked at on PubMed.

Readers can now get papers they want by themselves and create their own collections that interest them. But authors need journals because IF is so deeply embedded. Placement in a prestigious journal doesn’t increase readership, but it does increase likelihood of getting tenure. So authors need journals, readers don’t.

Vitek noted F1000Research “are not publishers – because we do not own any titles and don’t want to”. Instead they offer tools and services. It is not publishing in the traditional sense because there is no decision to publish or not publish something – that process is completely driven by authors. He predicted this will be the future of science publishing will shift from journals to services (there will be more tools & publishing directly on funder platforms).

In response to a question about impact factor and author motivation change, Vitek said “the only way of stopping impact factors as a thing is to bring the end of journals”. This aligns with the conclusions in a paper I co-authored some years ago. ‘The publishing imperative: the pervasive influence of publication metrics’

Author Behaviours

Vicky Williams, the CEO of research communications company Research Media discussed “Maximising the visibility and impact of research” and talked abut the need to translate complex ideas in research into understandable language.

She noted that the public does want to engage with research. A large percentage of public want to know about research while it is happening. However they see communication about research is poor. There is low trust in science journalism.

Vicki noted the different funding drivers – now funding is very heavily distributed. Research institutions have to look at alternative funding options. Now we have students as consumers – they are mobile and create demand. Traditional content formats are being challenged.

As a result institutions are needing to compete for talent. They need to build relationships with industry – and promotion is a way of achieving that. Most universities have a strong emphasis on outreach and engagement.

This means we need a different language, different tone and a different medium. However academic outputs are written for other academics. Most research is impenetrable for other audiences. This has long been a bugbear of mine (see ‘Express yourself scientists, speaking plainly isn’t beneath you’).

Vicki outlined some steps to showcase research – having a communications plan, network with colleagues, create a lay summary, use visual aids, engage. She argued that this acts as a research CV.

Rick Anderson, the Associate Dean of the University of Utah talked about the Deeply Weird Ecosystem of publishing. Rick noted that publication is deeply weird, with many different players – authors (send papers out), publishers (send out publications), readers (demand subscriptions), libraries (subscribe or cancel). All players send signals out into the school communications ecosystem, when we send signals out we get partial and distorted signals back.

An example is that publishers set prices without knowing the value of the content. The content they control is unique – there are no substitutable products.

He also noted there is a growing provenance of funding with strings. Now funders are imposing conditions on how you want to publish it not just the narrative of the research but the underlying data. In addition the institution you work for might have rules about how to publish in particular ways.

Rick urged authors answer the question ‘what is my main reason for publishing’ – not for writing. In reality it is primarily to have high impact publishing. By choosing to publish in a particular journal an author is casting a vote for their future. ‘Who has power over my future – do they care about where I publish? I should take notice of that’. He said that ‘If publish with Elsevier I turn control over to them, publishing in PLOS turns control over to the world’.

Rick mentioned some journal selection tools. JANE is a system (oriented to biological sciences) where authors can plug in abstract to a search box and it analyses the language and comes up with suggested list of journals. The Committee on Publication Ethics (COPE) member list provides a ‘white list’ of publishers. Journal Guide helps researchers select an appropriate journal for publication.

A tweet noted that “Librarians and researchers are overwhelmed by the range of tools available – we need a curator to help pick out the best”.

Peer review

Alice Ellingham who is Director of Editorial Office Ltd which runs online journal editorial services for publishers and societies discussed ‘Why peer review can never be free (even if your paper is perfect)’. Alice discussed the different processes associated with securing and chasing peer review.

She said the unseen cost of peer review is communication, when they are providing assistance to all participants. She estimated that per submission it takes about 45-50 minutes per paper to manage the peer review. 

Editorial Office tasks include looking for scope of a paper, the submission policy, checking ethics, checking declarations like competing interests and funding requests. Then they organise the review, assist the editors to make a decision, do the copy editing and technical editing.

Alice used an animal analogy – the cheetah representing the speed of peer review that authors would like to see, but a tortoise represented what they experience. This was very interesting given the Nature news piece that was published on 10 February “Does it take too long to publish research?

Will Frass is a Research Executive at Taylor & Francis and discussed the findings of a T&F study “Peer review in 2015 – A global view”. This is a substantial report and I won’t be able to do his talk justice here, there is some information about the report here, and a news report about it here.

One of the comments that struck me was that researchers in the sciences are generally more comfortable with single blind review than in the humanities. Will noted that because there are small niches in STM, double blind often becomes single blind anyway as they all know each other.

A question from the floor was that reviewers spend eight hours on a paper and their time is more important than publishers’. The question was asking what publishers can do to support peer review? While this was not really answered on the floor* it did cause a bit of a flurry on Twitter with a discussion about whether the time spent is indeed five hours or eight hours – quoting different studies.

*As a general observation, given that half of the participants at the conference were publishers, they were very underrepresented in the comment and discussion. This included the numerous times when a query or challenge was put out to the publishers in the room. As someone who works collaboratively and openly, this was somewhat frustrating.

The Sociology of Research

Professor James Evans, who is a sociologist looking at the science of science at the University of Chicago spoke about How research scientists actually behave as individuals and in groups.

His work focuses on the idea of using data from the publication process that tell rich stories into the process of science. James spoke about some recent research results relating to the reading and writing of science including peer reviews and the publication of science, research and rewarding science.

James compared the effect of writing styles to see what is effective in terms of reward (citations). He pitted ‘clarity’ – using few words and sentences, the present tense, and maintaining the message on point against ‘promotion’ – where the author claims novelty, uses superlatives and active words.

The research found writing with clarity is associated with fewer citations and writing in promotional style is associated with greater citations. So redundancy and length of clauses and mixed metaphors end up enhancing a paper’s search ability. This harks back to the conversation about poor academic writing the day before – bad writing is rewarded.

Scientists write to influence reviewers and editors in the process. Scientists strategically understand the class of people who will review their work and know they will be flattered when they see their own research. They use strategic citation practices.

James noted that even though peer review is the gold standard for evaluating the scientific record. In terms of determining the importance or significance of scientific works his research shows peer review is inconsistent and systematically biased. The greater the reviewer distance results in more positive reviews. This is possibly because if a person is reviewing work close to their speciality, they can see all the criticism. The greater the novelty of the work the greater likelihood it is to have a negative review. It is possible to ‘game’ this by driving the peer review panels. James expressed his dislike of the institution of suggesting reviewers. These provide more positive, influential and worse reviews (according to the editors).

Scientists understand the novelty bias so they downplay the new elements to the old elements. James discussed Thomas Kuhn’s concept of the ‘essential tension’ between the classes of ‘career considerations’ – which result in job security, publication, tenure (following the crowd) and ‘fame’ – which results in Nature papers, and hopefully a Nobel Prize.

This is a challenge because the optimal question for science becomes a problem for the optimal question for a scientific career. We are sacrificing pursuing a diffuse range of research areas for hubs of research areas because of the career issue.

The centre of the research cycle is publication rather than the ‘problems in the world’ that need addressing. Publications bear the seeds of discovery and represent how science as a system thinks. Data from the publication process can be used to tune, critique and reimagine that process.

James demonstrated his research that clearly shows that research today is driven by last year’s publications. Literally. The work takes a given paper and extracts the authors, the diseases, the chemicals etc and then uses a ‘random walk’ program. The result ends up predicting 95% of the combinations of authors and diseases and chemicals in the following year.

However scientists think they are getting their ideas, the actual origin is traceable in the literature. This means that research directions are not driven by global or local health needs for example.

Panel: Show me the Money

I sat on this panel discussion about ‘The financial implications of open access for researchers, intermediaries and readers’ which made it challenging to take notes (!) but two things that struck me in the discussions were:

Rick Andersen suggested that when people talk about ‘percentages’ in terms of research budgets they don’t want you to think about the absolute number, noting that 1% of Wellcome Trust research budget is $7 million and 1% of the NIH research budget is $350 million.

Toby Green, the Head of Publishing for the OECD put out a challenge to the publishers in the audience. He noted that airlines have split up the cost of travel into different components (you pay for food or luggage etc, or can choose not to), and suggested that publishers split APCs to pay for different aspects of the service they offer and allow people to choose different elements. The OECD has moved to a Freemium model where that the payment comes from a small number of premium users – that funds the free side.

As – rather depressingly – is common in these kinds of discussions, the general feeling was that open access is all about compliance and is too expensive. While I am on the record as saying that the way the UK is approaching open access is not financially sustainable, I do tire of the ‘open access is code for compliance’ conversation. This is one of the unexpected consequences of the current UK open access policy landscape. I was forced to yet again remind the group that open access is not about compliance, it is about providing public access to publicly funded research so people who are not in well resourced institutions can also see this research.

Research in Institutions

Graham Stone, the Information Resources Manager, University of Huddersfield talked about work he has done on the life cycle of open access for publishers, researchers and libraries. His slides are available.

Graham discussed how to get open access to work to our advantage, saying we need to get it embedded. OAWAL is trying to get librarians who have had nothing to do with OA into OA.

Graham talked the group through the UK Open Access Life Cycle which maps the research lifecycle for librarians and repository managers, research managers, fo authors (who think magic happens) and publishers.

My talk was titled ‘Getting an Octopus into a String Bag’. This discussed the complexity of communicating with the research community across a higher education institution. The slides are available.

The talk discussed the complex policy landscape, the tribal nature of the academic community, the complexity of the structure in Cambridge and then looked at some of the ways we are trying to reach out to our community.

While there was nothing really new from my perspective – it is well known in research management circles that communicating with the research community – as an independent and autonomous group – is challenging. This is of course further complicated by the structure of Cambridge. But in preliminary discussions about the conference, Mark Carden, the conference organiser, assured me that this would be news to the large number of publishers and others who are not in a higher education institution in the audience.

Summary: What does everybody want?

Mark Carden summarised the conference by talking about the different things different stakeholder in the publishing game want.

Researchers/Authors – mostly they want to be left alone to get on with their research. They want to get promoted and get tenure. They don’t want to follow rules.

Readers – want content to be free or cheap (or really expensive as long as something else is paying). Authors (who are readers) do care about the journals being cancelled if it is one they are published in. They want a nice clear easy interface because they are accessing research on different publisher’s webpages. They don’t think about ‘you get what you pay for.’

Institutions – don’t want to be in trouble with the regulators, want to look good in league tables, don’t want to get into arguments with faculty, don’t want to spend any money on this stuff.

Libraries – Hark back to the good old days. They wanted manageable journal subscriptions, wanted free stuff, expensive subscriptions that justified ERM. Now libraries are reaching out for new roles and asking should we be publishers, or taking over the Office of Research, or a repository or managing APCs?

Politicians – want free public access to publicly funded research. They love free stuff to give away (especially other people’s free stuff).

Funders – want to be confusing, want to be bossy or directive. They want to mandate the output medium and mandate copyright rules. They want possibly to become publishers. Mark noted there are some state controlled issues here.

Publishers – “want to give huge piles of cash to their shareholders and want to be evil” (a joke). Want to keep their business model – there is a conservatism in there. They like to be able to pay their staff. Publishers would like to realise their brand value, attract paying subscribers, and go on doing most of the things they do. They want to avoid Freemium. Publishers could be a platform or a mega journal. They should focus on articles and forget about issues and embrace continuous publishing. They need to manage versioning.

Reviewers – apparently want to do less copy editing, but this is a lot of what they do. Reviewers are conflicted. They want openness and anonymity, slick processes and flexibility, fast turnaround and lax timetables. Mark noted that while reviewers want credit or points or money or something, you would need to pay peer reviewers a lot for it to be worthwhile.

Conference organisers – want the debate to continue. They need publishers and suppliers to stay in business.

Published 18 February 2016
Written by Dr Danny Kingsley
Creative Commons License

Libraries of the future – insights from a talk by Lorcan Dempsey

There is no argument even from traditionalists that the library role is changing. But there is a great deal of confusion and sometimes fear about what that means, and what the future might look like.

On 3 June, Lorcan Demsey*  came to speak to staff at Cambridge University Library about how the role and purpose of libraries are changing. The slides from his talk are available on Slideshare.

The one sentence headline from the talk was that research libraries are moving from licensing published content to managing workflow and research outputs – which means the print collection needs to be managed down to free up resources for the new roles. The subhead is – if we don’t do this, the publishers are waiting in the wings to take over.

Modern libraries in research environments

The Library role is distilling from owned materials to facilitating access to many things. Changing focus to ‘discovery’ in the collection means there must be a loss of some of the items.

Lorcan noted that his sense is that there is still a low uptake of this concept. As someone who has been working in scholarly communications for over a decade. I agree.

The collection as a means to an end rather than an end it itself – in some ways this is obvious but in others it is a huge psychological shift. 

  • In a print world, researchers and learners organised their workflow around the library. The library had a limited interaction with the full process.
  • In a digital world the Library needs to organise itself around the workflows of research and learners. Workflows generate and consume information resources.

In libraries there is a separation of the discovery and the collection – library users are on the global level. The library will make some available and own some of those.

Change of focus

The research endeavour has moved from a focus on outcomes to begin to think about a range of activities around the process and the aftermath.

The traditional role of a library – outside of special collections and manuscripts – deals with outcomes like the books and journals. In this model students and researchers interact with the books and journals and then turn it into classically published works that come back into the library.

But we live now in an online world and the Library is interacting with the content in many different ways. There is interest in the process of research –methods, evidence and research data. There is also interest in the discussion around research through pre-prints, working papers and a variety of prepublication activity. This involves revision, derivative works and reuse. Copyright is important in these cases to let people know how things will be used.

This means that ‘collections’ from a library perspective now include the process, methods, discussions and outputs as well as books and journals.

From curation to creation

Mediated access to licensed material is becoming more streamlined, and other items becoming more available. Libraries are supporting creation, not just consumption.

Libraries need to be seen as a source for collaboration. There needs to be a partnership between the Library and the Faculty. The library is a partner in terms of the creation activity. The mediating role will continue.

Managing this transition is often done opportunistically – when people retire they are replaced with new skill sets.

The repository was seen by the Libraries initially as something in relation to artefacts but now it is seen as part of the workflow of the research lifecycle. There is less attention on what is coming in and more focus on sharing material back out.

Show me the money

There is a question around how much of this activity will be supported by the institution? And how is the resource shifting occurring in the libraries?

Lorcan said that while libraries talk about a growing interest in special collections and archives, there is no evidence from a budget perspective that this is being supported.

Publishers are trying to muscle in

Managing online identities

There is considerable interest amongst researchers in having a carefully tended online presence. This is time consuming, and would appear to be important to the researchers. This process is becoming intimately tied to publication – it is where people are announcing their publication.

Lorcan mentioned a study in Nature which was a survey of 3,000 scientists and engineers. They found 6% used Google Scholar, but more than half were using ResearchGate more regularly than LinkedIn. Not surprisingly this behaviour can be broken down by discipline. The social sciences tend to use Google Scholar, and academic.edu has low use by engineering and sciences. There are many solutions to the workflow to help researchers. Many of these will go away, but some are quite heavily used.

Thinking historically, a catalogue covers the material a library owns. The library has a discovery layer and a license. However this activity will have to shift to support creation. We have repositories, Google Scholar, ResearchGate etc. The incentive to use the repository is very low compared with these services.

A gap in the market?

Workflow is the new content – managing identity is where a lot of the focus is. Publishers are trying to position themselves as the service provider in this space.

Many libraries do not see their role as managing evolving scholarly records – the research and learning material. The curation of identity for researcher profiles is a big interest. This is often currently managed by research offices.

However this is a space into which publishers are moving. Several big publishers are now trying to be part of the full cycle for researchers.

For example Elsevier has two products – Pure is a content management system for research reporting and Mendeley is an academic social network. It is no coincidence that the word ‘solution’ is in the url thread. Similarly, Macmillian (publishers of Nature) recently bought Digital Science the company that created the equivalent products Symplectic and figshare. Digital Science was not included in the Macmillan Springer merger, possibly because they still need substantial investment. Lorcan noted that people see them as ‘plucky start-ups’ but they are owned by big publishers. There has been a big take-up of these services.

Lorcan showed a quote from Annette Thomas, CEO of Macmillian Publishers about ‘A publisher’s new job description’.

Her view is that publishers are here to make the scientific research process more effective by helping them keep up to date, find colleagues, plan experiments, and then share their results. After they have published, the process continues with gaining a reputation, obtaining funds, finding collaborators, and even finding a new job. What can we as publishers do to address some of the scientists’ pain points? 

As Lorcan observed – you can take out the word ‘publisher’ and replace with the word ‘library’.

Managing down collections

Libraries are increasingly wanting to organise their space around the student experience not around collections. Lorcan used a grid to illustrate the changing focus. The two distinctions were:

  • Whether items are in many collections or are rare or unique.
  • Whether items require stewardship. Items that are high stewardship items are looked after and resources are spent on them. Items that are low stewardship don’t get looked after in Libraries.

At one extreme is licensed materials which are high stewardship/many collections. The opposite corner is research materials which are only available in a few collections and are low stewardship.

Lorcan said he thinks in the future there will be a focus on distinctive collections. There needs to be a lot more money to do this. So licensed purchased material will be more streamlined. Management attention 15 years ago was on highly managed, licensed items, but now the focus has shifted to items of low stewardship.

Inside out library

Market materials: licensed/purchased stuff. Library as broker and telling users that these things are available in a special way.

Distinctive collections: Library is provider and want to maximise discoverability. Want other people to know about faculty expertise, and research data. Putting into own discovery layer doesn’t help there. Think about metadata and which aggregators are important. Been slow to realise that discoverability is vital.

These have very different dynamics. We want to share material held within the library with the rest of the world. The licensed stuff is external and libraries bring it in to share internally. This is inside out.

Traditionally libraries deal with published, purchased material (including special collections). However there is a shift away from acquisitions to demand. This means that libraries need to redirect their resources towards research support. One way of doing this is to manage down the print collection.

There was an explosion of publishing after the Second World War. In the same way that baby boomers are all retiring at the same time, we are now faced with the challenge of managing these collections down.

Challenges for identity

The managing down of print collections coincides with the push to repurpose space in libraries. There are many discussions with architects – managing down print means there must be refurbishment.

One of the issues emerging for libraries is: Without the books, does the campus see this as the Library? Is the space needed for the Library – could they be replaced by learning commons or the student union?

We can see the identity discussion about libraries emerging now. If we are managing down collections, what is the space for? What are the new services we offer? Lorcan mentioned media stories where librarians are being attacked by historians who see this as managerial, technocratic activity.

Lorcan described some of the shared collection activities happening in the USA.

Conclusion

We used to think of the Library as a collection. Now we need to think of the Library in terms of the user and their workflows.

We must move to more facilitated access to items, also move to the management and disclosure of curated materials. The print and digital scholarly record needs curation and co-ordination at a conscious national level.

The job is about restructuring the means but we need to make decisions about moving resources or bets on the future. Libraries must shift from an organisation where the end was known to one where we must take some risk.

Published 6 June 2015
Written by Dr Danny Kingsley
Creative Commons License

* Lorcan coordinates strategic planning and oversees Research, Membership and Community Relations at OCLC. He has worked for library and educational organizations in Ireland, the UK and the US. His influence on national policy and library directions is widely recognized. He is on Twitter – @LorcanID