Category Archives: Open Research at Cambridge Conference

How to get the most out of modern peer-review

On 30th March the Office of Scholarly Communication hosted an event, 'How to get the most out of modern peer-review', bringing together researchers, publishers and library staff to discuss how peer review is changing. Dr Laurent Gatto was both a presenter and a participant, and with permission we have re-posted his blog about the event here.

Publisher presentations

There were presentations from eLife (Dr Wei Mun Chan) and F1000Research (Dr Sabina Alam, @Sab_Ra) in the Innovations in peer-review session. PeerJ was also mentioned several times, for example for publishing its peer reviews.

In general, I think the presenters did a good job of demonstrating how modern peer review can benefit authors and research in general: eLife with its consultative peer review, where editors and reviewers discuss their views and opinions before a decision is made, and F1000Research with its open post-publication peer review system. My personal experiences with PeerJ (as a reviewer) and F1000Research (as a reviewer and author) have been excellent. All these journals are great venues for a modern open scholar.

Dr Jen Wright (@JennWrights) from Cambridge University Press presented a nice and detailed overview of how peer review works. It was well structured, following an FAQ model. She also very entertainingly illustrated her talk with references to PHD Comics, Lego Grad Student and Shit Academics Say.

.@JennWrights uses @legogradstudent to illustrate her peer review FAQ.

Open peer review

The highlight of the day was Corina's (@LoganCorina) brilliant talk, Open peer review – what is it and what does it achieve?. She made a strong case in favour of open peer review and reviewing ethics. Her lab code of conduct is well worth reading: it covers reviewing ethics, publishing ethics, her commitment to conducting rigorous science, and lab interpersonal interactions.

It was nice to hear how her efforts in ethical publishing and reviewing have proved very positive for her academic career, which contrasts with the fear that some early career researchers sometimes express that practising open science and ethical publishing could hinder their careers.

The role of peer-reviewers in promoting open science

I was also very happy to have the opportunity to give a talk about the role of peer review in promoting open science. My slides are available here. I plan to write it up and expand on it in a blog post.

In brief, my main message was that, if we want to promote rigorous science, we have an obligation to make sure that the data, software and methods are adequately shared and described, and that checking this as a peer reviewer is not too difficult or time consuming.

Currently, as far as data is concerned, my ideal review system would be a 2-stage process, where you:

  1. Submit your data and metadata to a repository, where they get checked (by specialists, data scientists, data curators) for quality, annotation and metadata.
  2. Submit your research with a link to the peer-reviewed data.

My talk earned me a lot of feedback and encouragement, both offline and online.

The effect on my Twitter activity today – the 12–2pm bar is 1689 impressions 🙂

Publons

I had heard about Publons before, but never took the time to learn more about it. Tom Culley did a great job of presenting it as a means of getting formal recognition for your peer review work. I will definitely give it a go in the near future.

Show me the data

I went to Dr Varsha Khodiyar's (@varsha_khodiyar) workshop Show me the data: tips and tricks with peer-reviewing research data. Varsha is the data curation editor at Scientific Data. I am not necessarily a big fan of data journals (see here for some background), but it is clear that she is doing great work making sure that the data she checks and curates (in addition to the peer review) is available under an open licence and of good quality.

When it comes to data/software submissions, I believe that many shortcomings are often the result of a lack of adequate skills or experience in good practice for sharing and documenting, rather than of poor quality of the output. The review process should ideally serve as a way to support and educate researchers, and the Bioconductor and rOpenSci projects are great examples of how a package review process can genuinely help authors to improve their output, rather than delivering a binary accept/reject outcome.

A closed 2-stage peer review, as is typically in place in journals, is a horrible system for that. An open review, with more interaction between reviewers and authors, would be a more efficient approach.

More about the event

To hear more about the event, have a look at the #oscpeereview hashtag on Twitter. The event was live streamed and will be made available on YouTube in the coming days – I will add a link later.

All in all, I think it was a great event. Kudos to the Office of Scholarly Communication for their efforts and continuous dedication. As emphasised by many participants, events like this constitute a unique and important channel for highlighting innovations in digital and open science that are redesigning scholarship. They are also a unique venue where open researchers can express and discuss challenges and opportunities with the wider academic community.

Published 4 April 2017
Written by Dr Laurent Gatto
Creative Commons License

“Become part of the research process” – observations from RLUK2017

When is a librarian not a librarian? Rather than a bad joke, this was one of the interesting underlying discussions arising from the 2017 RLUK conference held earlier in March. The conference Twitter hashtag was #rluk17 and the videos are now available. The answer, it appears, is when we start talking about partnerships with, rather than support of, our research community.

As always with my write-ups of conferences, these are simply the parts that have resonated with me, and the impression I walked away with. This write up will be very different from anyone else’s from the conference, such as this blog from Lesley Pitman, and the RLUK conference report.

I have also written a sister blog describing the workshop I co-presented on the topic of Text and Data Mining.

Libraries’ role in research

The role of libraries and the people who work in them was the theme of one session – with arguments that libraries should be central to the research process.

Masud Khokhar, the Head of Digital Innovation and Research Services at Lancaster University, gave a talk on the Role of research libraries in a technological future. He said we need to get out of the culture of researchers only coming to the library with research outputs/outcomes. Language matters, he said. Lancaster University has made a deliberate decision not to use the word ‘support’, because “we have bigger aims than that”. Partnership is the future for libraries rather than just collaboration. We need to be creative co-developers working with the research community if we are to be a research library.

We need to generate a culture of experimentation: "Be creative, experiment fast, succeed or fail fast and learn from both". It is a good challenge for us librarians to be more creative and less passive. We should embed the library in research questions and processes.

The issue of how we present information to our clients came up, with Khokhar saying consistency when searching should no longer be important – we should depend on the context of the searcher. “Content might be king, but context is the kingdom”, he said.

He also showed evidence of how data visualisation can lead to greater downloads of data, and it may be even more important to data use than good metadata. Indeed, Lancaster University Library has allowed 10TB of server space for analytics of library data alone, because this is a growing and important area to drive decision making.

This perspective was also put forward by Patrick McCann from the University of St Andrews Library. He talked about the new role of Research Software Engineer, which works with the research community to develop research solutions and research outputs. St Andrews has a senior librarian for digital humanities and research computing. He noted: "we are part of the research process".

A comment was made during the conference that many speakers had identified themselves as ‘not a librarian’. There was a call for us to open the idea of what a librarian is. Masud Khokhar suggested he would consider himself to be an ‘honorary’ librarian.

But the ‘librarian or not’ debate is an interesting question. William Nixon from the University of Glasgow noted that their Research Data Management team are not librarians, saying "it is a skill set in itself". Khokhar argued that we need to develop digital leaders for libraries. Are these people already in libraries whom we train up, or are they people with these skill sets whom we bring in and introduce to library culture?

Libraries’ role in the Open Science agenda

Libraries are the central pivot point for the move to open research across the world – that was the message from presentations about activities in Europe and Canada. This fits with the narrative that libraries should be driving the agenda rather than reacting to it.

Susan Reilly, the outgoing Executive Director of LIBER talked about re-imagining the library space in the context of open science as she presented the LIBER 2020 vision.

Open Science (a term used in Europe for 'open research') is on the European agenda: every single member state has signed up to developing the necessary skills and to the development of the open science cloud, backed by an 80 million Euro investment. Given that LIBER is a group of libraries with a common mission to enable world-class research, the question is whether LIBER should make its whole strategy about open science.

Reilly noted that libraries have been 'bold' on open science for years but have been held back by faculty and publishers. She argued we must be resilient on this agenda. Libraries need to be taking a leadership role in all research. "Libraries need to get into the researchers' lifecycle", she argued. They should provide tools throughout the research lifecycle to ensure 'open science'. To achieve this, we need digital skills, which underpin a more open and transparent research lifecycle.

The end goal, said Reilly, is world-class research, but open science facilitates that through facilitating collaboration and ensuring the sustainability of research. The 2020 vision is: “Libraries powering sustainable knowledge in the digital age”.

The proposal is that by 2022, open access will be the predominant form of publishing and research data will be Findable, Accessible, Interoperable and Reusable (FAIR). Reilly noted that it is research data management "where we get the most pushback" – an experience reflected in many other institutions.

Libraries can provide platforms for innovative scholarly communications. They can facilitate open access to research publications, with services ranging from paying APCs to becoming a publisher. Libraries also offer research data management, innovative metrics and innovative peer review.

This is an opportunity for libraries to disrupt the scholarly communications system. In order to achieve this goal, we need research skills that underpin a more open and transparent research lifecycle – and so we need to equip researchers to do this.

Reilly noted that when LIBER went out to stakeholders – “they bought into the vision”. To achieve these goals, Reilly said it is important for libraries to have a strong relationship with institutional leadership. There needs to be transparency around the cost of publications.

We need to work on diversifying librarians' skills and research skills. This is a matter of 'compete or fail', or Elsevier could take over what libraries do. We need to get into the research workflow.

LIBER’s outcomes from their consultation with stakeholders were:

  • Importance of libraries having a strong relationship with institutional leadership
  • Transparency around the cost of publications
  • Working on diversifying librarians' skills AND researchers' skills
  • Be clear about what the role of libraries is/should be
  • Compete or fail
  • Get into the research workflow
  • Opportunity for libraries to disrupt scholarly communications system

It was interesting (for me) to note how similar these are to the Strategic Goals of the Office of Scholarly Communication.

The Open Scholarship theme was continued in a presentation by representatives of RLUK's sister organisation, the Canadian Association of Research Libraries (CARL). This is a leadership organisation thinking of ways to enhance members' capacity and leadership in this environment. Martha Whitehead, the President of CARL, and Susan Haigh, the Executive Director, presented the Canadian Roadmap for advancing Scholarly Communication.

There are issues with open access, they noted. Repositories need to improve in two major areas: we need to improve their functionality, and we need to support and encourage the development of value-added services such as peer review and tools.

There have been challenges in discussions with publishers about maximising openness which have become ‘somewhat fraught’. Libraries are working with Canadian journals to develop, assess and adopt sustainable open access funding models. The idea is that the model will be non-profit (where the money goes back in).  While it is not clear if the discussions will coalesce around anything new and bold, there is value in bringing together the communities.

The Canadians presented an initiative related to Research Data Management (RDM) called Portage. This is designed to help with RDM in the country. It has a director, and because it is an organisation with a facility, the library voice is well respected around the table. Experts are contributing their expertise to this. There is also a Federated Research Data Repository – a joint software development project with Compute Canada, and the Scholars Portal Dataverse offers data deposit and sharing at no charge to researchers.

New challenges for libraries

Torsten Reimer spoke about the new focus of the British Library on ‘everything accessible’. He discussed the implications for libraries as we move towards a more open access future. We need to change focus, he argued, with new skills and areas, and we should be working together with the research community.

As more material is available openly then what is the role of a national library? Reimer asked. Perhaps libraries need to provide infrastructure, we should focus on preservation & adding value. Given the majority of academics use software in the context of their projects, should libraries be supporting, integrating and preserving it?

The ‘just in case’ model is no longer feasible for libraries. The British Library is looking at partnerships in content creation, research and infrastructure. Examples include plans to expose the EThOS API to allow for machine consumption of data about theses. They are also looking to replace the current "hand knitted" preservation system with a more robust, scalable and shareable solution.
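As a sketch of what such machine consumption might look like, here is a minimal Python example. The endpoint, parameters and field names are all hypothetical placeholders – the talk only mentioned plans for the API, not its actual shape:

```python
import requests

# Hypothetical endpoint and fields: the real EThOS API's shape was not
# covered in the talk, so this only illustrates the idea of machine
# consumption of thesis metadata.
BASE_URL = "https://ethos.example/api/v1/theses"

response = requests.get(BASE_URL, params={"institution": "Cambridge", "year": 2016})
response.raise_for_status()

for thesis in response.json()["results"]:
    # e.g. print each thesis title with its (newly extracted) supervisor name
    print(thesis["title"], "-", thesis.get("supervisor", "unknown"))
```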

Collaborate or die?

The opening keynote was by John MacColl, the University Librarian & Director of Library Services, at St Andrews University (and outgoing president of RLUK). MacColl spoke about the ‘research commons’.

He referred to the ‘tragedy of the commons’, an argument put forward in 2003 that individuals cancelling Big Deal subscriptions had led to a 129% increase in the cost of accessing the literature. Publishers are creating ‘artificial scarcity’ of the literature, which means they can charge as they please. This is a ransacking of the commons.

It is not just cost: these Big Deals have meant that most collections are becoming the same, and we are losing access to other resources. MacColl also noted the lost need for bibliographers. But his call was that research libraries face a challenge in re-appropriating responsibility for the preservation of key scholarly objects held on the servers of publishers and other vendors worldwide.

So, argued MacColl, we need to work collectively to ‘find means of getting around being held ransom by publishers’. We need a ‘post-collective Big Deal world’. This is Plan B, where we take back control, find post-cancellation access, and arrange document delivery and green open access.

But this is not something we can do individually. MacColl asked: "When we are doing things in our own institutions, who are we letting down by not thinking of the wider community?" We need some sort of formal governance to make that happen. The challenge is that Higher Education is a very conservative world. People will not take a step unless convinced it is a sensible step to take.

We need to focus on the global – where libraries collaborate on shared bibliographic data and create a ‘collective collection’. Plan B needs to be national.

So much more

This blog has glossed over many very interesting presentations and talks. I do, however, wish to mention the last session of the event, which broadened the discussion outside of the library to the issue of ‘inclusion’ in the Higher Education sector. Libraries, as a neutral ‘safe’ place on campus, of course have a big role to play in this. As has been the case in every meeting I have attended since November last year, the double threats of Brexit and Trump have never been far from the discussion, and never more so than in the context of inclusion.

Darren Lund, a ‘middle aged white guy from Canada’ spoke very entertainingly about his work on diversity, making the point that if you have privilege you should use it to make positive change.

The final talk was a sobering walk through some research into the racial diversity of universities with plenty of data proving that universities are not as liberal as they are perceived to be by us. Statistics such as 92% of professors in the UK are white, and the fact there are only three Vice Chancellors from the black and minority ethnic community in the UK, supported Professor Kalwant Bhopal’s argument that we need to actively address the issue of inclusion.

Summary

This blog began with a fairly provocative statement – that people do not identify themselves as librarians when we start talking about partnerships with, rather than support of, our research community. This is an interesting question. Many librarians feel that their role is to support, not lead. Yet others argue that unless we do take a leading role we will become redundant.

So what is the solution? Do we widen the definition of a library? Do we widen the definition of a librarian? Or are we happy with the ‘honorary librarian’ solution? These are some of the questions that need further teasing out. One thing is sure, the landscape is changing rapidly and we need to change with it.

Debbie Hansen and Danny Kingsley attended the conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 30 March 2017
Written by Dr Danny Kingsley
Creative Commons License

Service Level Agreements for TDM

Librarians expect publishers to support our researchers' rights to Text and Data Mining, and not to cut off a library's access over 'suspicious' activity before establishing whether it is legitimate. These were the conclusions of a group who met at a workshop in March to discuss the provision of Text and Data Mining services. The final conclusions are set out below.

Expectations libraries have of publishers over TDM

The workshop concluded with very different expectations from what was originally proposed. The messages to publishers that were agreed were:

  1. Don’t cut us off over TDM activity! Have a conversation with us first if you notice abnormal behaviour*
  2. If you do cut us off and it turns out to be legitimate then we expect compensation for the time we were cut off
  3. Mechanisms for TDM where certain behaviours are expected need to be built into separate licensing agreements for TDM

*And if you want to cut us off – please first demonstrate that all this illegal TDM activity is actually happening in the UK

Workshop on TDM

The workshop “Developing a research library position statement on Text and Data Mining in the UK” was part of the recent RLUK2017 conference.  My colleagues, Dr Debbie Hansen from the Office of Scholarly Communication and Anna Vernon from Jisc, and I wanted to open up the discussion about Text and Data Mining (TDM) with our library community. We have made the slides available and they contain a summary of all the discussions held during the event. This short blog post is an analysis of that discussion.

We started the workshop with a quick analysis of who was in the room using a live survey tool called Mentimeter. Eleven participants came from research institutions – six large, four small and one from an 'other research institution'. There were two publishers, and four people who identified as 'other' – these were intermediaries. Of the 19 attendees, 14 worked in a library. Only one person said they had extensive experience in TDM, four said they were TDM practitioners, but the largest group were the 14 who classified themselves as having 'heard of TDM but have had no practical experience'.

The workshop then covered what TDM is, what the legal situation is and what publishers are currently saying about TDM. We then opened up the discussion.

Experiences of TDM for participants

In the initial discussion about participants' experiences, a few issues were raised about what it would mean for libraries to offer TDM services. Indeed, there was a question whether this should form part of library service delivery at all. The issue is partly that this is new legislation, so currently publishers and institutions are reactive, not strategic, in relation to TDM. We agreed:

  • There is a need for a clearer understanding of the licensing situation around information
  • We also need to create a mechanism for where to go for advice, both within the institution and at the publisher
  • We need to develop procedures for what to do with requests – which is a policy issue
  • Researcher behaviour is a factor – academics are not concerned by copyright.

Offering TDM is a change in the role of the library – traditionally libraries have existed to preserve access to items. The group agreed we would like to be enabling this activity rather than saying "no you can't". There are library implications in offering support for TDM, not least that librarians are not always aware of TDM taking place within their institution. This makes it difficult to be the central point for the activity. In addition, TDM activity could threaten access if it triggers a cut-off, and this is causing internal disquiet.

TDM activity underway in Europe & UK

We then presented to the workshop some of the TDM activities happening internationally, such as the FutureTDM project. There was also a short rundown of the new copyright exception proposed to the European Commission for research organisations carrying out research in the public interest, which would allow researchers to carry out TDM of copyright-protected content to which they have lawful access (e.g. via subscription), without prior authorisation.

ContentMine is a not-for-profit organisation that supplies open source TDM software to access and analyse documents. They are currently partnering with the Wikimedia Foundation, with a grant to develop WikiFactMine, a project aiming to make scientific data available to editors of Wikidata and Wikipedia.

ChemDataExtractor is a tool built by the Molecular Engineering Group at the University of Cambridge. It is an open source software package that extracts chemical information from scientific documentation (e.g. text, tables). The extracted data can be used for onward analysis. There is some information in a paper in the Journal of Chemical Information and Modeling: "ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature".
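For a flavour of how the tool is driven, here is a minimal sketch based on the package's documented Python interface; 'paper.html' is a placeholder for any locally held article file:

```python
from chemdataextractor import Document

# Parse a locally held article (HTML, XML and PDF inputs are supported)
with open('paper.html', 'rb') as f:
    doc = Document.from_file(f)

# Each record is a dictionary of extracted chemical information,
# e.g. compound names, labels and measured properties
for record in doc.records.serialize():
    print(record)
```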

The Manchester Institute of Biotechnology hosts the National Centre for Text Mining (NaCTeM), which works with research partners to provide text mining tools and services in the biomedical field.

The British Library had a call for applications for a PhD student placement to undertake text mining on the 150,000 theses held in EThOS, to extract new metadata such as names of supervisors. Applications closed 20 February 2017, but according to an EThOS newsletter from March, they had received no applications for the placement. The suggestion is that perhaps few students have content mining skills sufficiently well developed to undertake such a challenging placement.

The problem with supporting TDM in libraries

We proposed to the workshop group that libraries are worried about being cut off from their subscriptions by publishers because of the large downloads of papers that TDM activity involves. This is because publishers' systems are pre-programmed to react to suspicious activity. If TDM triggers an automated investigation, this may cause an access block.

However, universities need to maintain support mechanisms to ensure continuity of access. For this to occur we require workflows for swift resolution, fast communication and a team of communicators. This also requires educating researchers about the potential issues.
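Part of that education can be very concrete: showing researchers how to throttle their own scripts so that legitimate TDM stays under a publisher's automated thresholds. A minimal sketch, assuming a hypothetical list of article URLs to which the institution has lawful access:

```python
import time
import requests

# Hypothetical article URLs the institution has lawful (subscription) access to
urls = [f"https://publisher.example/article/{i}.xml" for i in range(100)]

DELAY_SECONDS = 5  # pause between requests to stay under rate-limit thresholds

session = requests.Session()
# Identify the traffic honestly, so the publisher can tell it apart from abuse
session.headers["User-Agent"] = "InstitutionTDM/0.1 (mailto:librarian@example.ac.uk)"

for url in urls:
    response = session.get(url, timeout=30)
    if response.status_code == 429:  # server says "too many requests"
        # Back off for as long as the server asks, then retry once
        time.sleep(int(response.headers.get("Retry-After", 60)))
        response = session.get(url, timeout=30)
    response.raise_for_status()
    # ... store response.content for mining ...
    time.sleep(DELAY_SECONDS)
```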

We asked the group to discuss this issue – noting the reasons why their organisation is not actively supporting TDM or, if it is, the main challenges it faces.

Discussion about supporting TDM in libraries

The reasons put forward for not supporting TDM included practical issues such as the challenges of handling physical media and the risk of lockout.

The point was made that there was a lack of demand for the service. This is possibly because researchers are not coming to the Library for help. There may be a lack of awareness in IT areas that the Library can help, and they may not even pass on the queries. This points to the need for internal discussion within institutions.

It was noted that there was an assumption in the discussion that the Library is at the centre of this type of activity; however, we are not joined up as organisations. The question is: who is responsible for this activity? There is often no institutional view on TDM because the issues are not raised at academic level. Policy is required.

Even if researchers do come to the library, there are questions about how we can provide a service. Initially we would be responding to individual queries, but how do we scale it up?

The challenges raised included the need for libraries to ensure everyone understands the needs at the content owner level. The library, as the coordinator of this work, would need to ensure the TDM is not for commercial use, and to ensure people know their responsibilities. This means the library is potentially being intrusive in the research process.

Service Level Agreement proposal

The proposal we put forward to the group was that we draft a statement for a Service Level Agreement for publishers, to assure us that if the library is cut off but the activity is legal, we will be reinstated within an agreed period of time. We asked the group to discuss the issues if we were to do this.

Expectation of publishers

The discussion raised several issues libraries had experienced with publishers over TDM. One participant said the contract with a particular publisher to allow their researchers to do TDM took two years to finalise.

There was a recognition that identifying genuine TDM might require some sort of registry of TDM activity, which might not be an administrative task all libraries want to take on. The alternative suggestion was a third-party IP registry, which could avoid some of the manual work. Given that LOCKSS crawls publisher sites without getting trapped, this could work in the same way, with a bank of IP addresses secured for this purpose.
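To make the registry idea concrete, the check a publisher's abuse filter would have to run is small. A minimal sketch, assuming a hypothetical registry of IP ranges that institutions have declared for TDM:

```python
import ipaddress

# Hypothetical registry: CIDR ranges institutions have declared for TDM use
REGISTERED_TDM_RANGES = [
    ipaddress.ip_network("131.111.0.0/16"),  # e.g. a university campus range
    ipaddress.ip_network("192.0.2.0/24"),    # documentation/example range
]

def is_registered_tdm_client(addr: str) -> bool:
    """Return True if the requesting IP falls within a registered TDM range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in REGISTERED_TDM_RANGES)

# A publisher's filter could consult the registry before auto-blocking:
print(is_registered_tdm_client("131.111.10.42"))  # True  -> don't auto-block
print(is_registered_tdm_client("203.0.113.9"))    # False -> investigate as usual
```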

Some solutions publishers could help with include delivering material in different ways – not on a hard drive. The suggestion was that this could be part of a platform, with the material produced in a format that allows TDM (at no extra cost).

Expectation of libraries

There was some distaste amongst the group for libraries taking on the responsibility of maintaining a TDM activity register. However, libraries could create a safe space for TDM, such as virtual private networks.

Licences are the responsibility of libraries, so we are involved whether we wish to be or not. Large-scale computational reading is completely different from current library provision. There are concerns that licensing via the library could be unsuitable for some institutions. This raises issues of delivery and legal responsibilities. One solution for TDM could be to record IP address ranges in licence agreements. We need to consider:

  • How do we manage the licenses we are currently signed up to?
  • How do we manage licensing into the future so that we separate different uses? Should we have a separate TDM 'bolt-on' agreement?

The Service Level Agreement (SLA) solution

The group noted that, particularly given the amount publisher licences cost libraries, being cut off for a week or two with no redress is unusual at best in a commercial environment. At a minimum, publishers should contact the library and give it a grace period to investigate, rather than cutting access off automatically.

The basis for the conversation over the SLA includes the fact that the law is on the subscriber's side if everyone is acting legally. It would help to have an understanding of the extent of infringing activity going on within university networks (considering that people can 'mask' themselves). This would be useful for thinking about thresholds.

Next steps

We need to open up the conversation to a wider group of librarians. We are hoping that we might be able to work with RLUK and funding councils to come to an agreed set of requirements that we can have endorsed by the community and which we can then take to publishers.

Debbie Hansen and Danny Kingsley attended the RLUK conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 30 March 2017
Written by Dr Danny Kingsley
Creative Commons License