Tag Archives: TDM

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.

In February 2017, a group of University of Cambridge staff met to discuss "Text and Data Mining Services: What can Cambridge libraries offer?" It was agreed that a future library TDM support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers of data provided for mining and of TDM projects
  • Fostering agreements with publishers

This blog post reports on some of the activities, events and initiatives involving libraries at the University of Cambridge that have taken place, or are in progress, since this meeting (also summarised in these slides). Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK (RLUK) 2017 conference to discuss research libraries and TDM. Issues raised included licensing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, developing policies and procedures for handling TDM-related requests (and scaling these up across an institution), the risk of being locked out of publishers' content, and the time it can take to finalise a TDM contract between an institution and a publisher. The group concluded that TDM-specific licensing agreements between institutions and publishers should set out the behaviour expected of each party. For example, if a publisher's website detects suspicious activity, it would be better for the publisher to investigate first rather than automatically block the originating institution's access (although this may depend on the systems in place). If a lock-out does happen and the activity turns out to be legal, participants suggested that institutions should explore compensation for the time access was lost, if significant.

July 2017: University of Cambridge Text and Data Mining Libguide

Developed by the eResources Team, this LibGuide explains Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also lists online journals licensed for TDM at the University of Cambridge and digital archives for text mining that can be supplied to University researchers on disc. Researchers with questions about a TDM project that the LibGuide does not answer can submit them to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day, a whole-group discussion drew out issues around why more TDM is not happening in the UK. It was agreed that there was a need for more visibility of what TDM looks like (for example, through hands-on sessions) and for more communication between stakeholders: publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: for researchers, training on TDM methods and analysis tools; for the library, managing content accessibility and the associated funding; and for the publisher, content licensing and agreements. We'll take a more in-depth look at this pilot in an upcoming blog post on TDM, so watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers working with digital methods in humanities research and teaching. In January this year Dr Danny Kingsley visited the lab to discuss approaches to providing TDM services, to help inform planning at Cambridge. The Yale DH Lab staff help with projects in a variety of ways, for example by helping researchers get to grips with digital tools and methods. Researchers who want to carry out TDM on particular collections can do so in the lab, using offline discs of published material in situ. In 2018 the libraries at Cambridge began building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining. The course helps learners understand the key concepts around TDM and explores how research support staff can help with TDM, and its practical activities even allow people without technical skills to try out some mining concepts for themselves. By following these activities, you can find out more about sentence segmentation, tokenization, stemming and other processing techniques.
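To give a flavour of the processing techniques named above, here is a minimal toy sketch in Python. It is not taken from the course: it assumes plain English text, uses simple regular expressions from the standard library for sentence segmentation and tokenization, and substitutes a crude suffix-stripping rule for a real stemmer such as Porter's. The function names are hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive sentence segmentation: split after ".", "!" or "?" followed by whitespace.
    # Real segmenters also handle abbreviations such as "Dr." or "e.g.".
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence: str) -> list[str]:
    # Tokenization: lower-case the sentence and keep runs of letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())

def crude_stem(token: str) -> str:
    # Toy stemming by suffix stripping (NOT the Porter algorithm):
    # remove the first matching suffix if at least three characters remain.
    for suffix in ("ing", "ers", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Researchers are mining journals. Mining helps researchers find patterns!"
for sentence in split_sentences(text):
    print([crude_stem(t) for t in tokenize(sentence)])
# ['research', 'are', 'min', 'journal']
# ['min', 'help', 'research', 'find', 'pattern']
```

Note how the crude rule reduces "mining" to "min": real stemmers apply additional rules to avoid errors like this, which is one reason TDM projects usually rely on established tools rather than hand-written rules.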

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December. It provides a front end of TDM tools for digital archives from Gale Cengage. You can find out more about this trial in this ejournals@cambridge blog post.

In summary…

Since the initial meeting to discuss research support services for TDM, there have been efforts, with some success, to raise awareness of TDM and the possibilities it brings to the research process, and to explore why TDM remains little used across the research community at large. This is an ongoing task, with the goal of increasing researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen
Creative Commons License

Developing the staff of the future: training librarians in 2017

2017 was an exciting year for training our library community. As well as continuing to cover the basics of research support, the OSC was able to introduce new topics and new methods of delivery to ensure that Cambridge library staff have all the information they need to support the research community. In this blog post our Research Support Skills Coordinator Claire Sewell reflects on the successes of the past year and her plans to make 2018 even better.

This time last year I was reflecting on my first full year in the role, having started in November 2015. After more than two years, some things have remained constant, but there have also been a great many changes in training, so it seemed a good idea to stop and reflect again.

The OSC runs two parallel professional development schemes for library staff: Supporting Researchers in the 21st Century and the Research Support Ambassador Programme. Supporting Researchers is open to all library staff and offers a regular programme of training in areas related to research support throughout the year. The Research Support Ambassadors programme is a more intensive programme which runs every summer and is designed to create a library workforce who feel confident in helping researchers with their queries.

Supporting Researchers in the 21st Century

The world of the academic library is changing and it’s important that institutions work to equip the staff with the knowledge they need to take advantage of these changes. The Supporting Researchers programme offers a range of training opportunities from general talks to in-depth workshops which are designed to help staff keep on top of the rapidly changing world of scholarly communication.

In 2017 we ran twenty-three training events catering to the needs of over four hundred staff. As well as covering expected areas such as Open Access and Research Data Management, we looked at some new areas such as Text and Data Mining and predatory publishing. These sessions proved a hit with attendees, 70% of whom rated them as 'excellent'. They were also enthusiastic in their feedback:

Excellent session on predatory publishers. We’ve started to get a lot of questions in this area and knowing more about it came at the perfect time

It was really engaging and a perfect introduction to the topic. I only had a vague idea at the outset as to what predatory publishing is but by the end of it I felt really well-informed (and in a short space of time!)

To help staff plan their time and attendance, we experimented with grouping sessions into mini-programmes, which resulted in our Librarian Toolkit sessions on Helping Researchers Publish and Open Access. This seemed to work well, so we will continue it in 2018. By far our most successful session was How to Spot a Predatory Publisher, delivered in direct response to demand from staff who were getting a lot of questions about the topic from their users. It was so successful that we have gone on to produce some local guidance and a webinar, which has had over 300 views to date.

Research Support Ambassador Programme

In 2017 the Research Support Ambassador Programme ran from August to October and attracted eighteen participants from across colleges, departments and the University Library. We tried something a little different this year by making most of the training available online. Librarians are notoriously busy people, and with summer holidays and the introduction of a new library management system, it would have been impractical to schedule a host of face-to-face sessions. The introductory workshop ran in person to allow Ambassadors to meet each other and put faces to names, but all other sessions were delivered as interactive webinars.

Although formal feedback is still being collated, initial responses have been positive:

I feel much more confident now that I have a good overview of all the issues confronting researchers and I will be able to know how to train researchers and who to refer them to for more information

Thanks for the programme. The content was really interesting and delivering via webinar was helpful as I didn’t have to leave my desk. I feel much more confident in dealing with researcher questions now.

Now that we have three cohorts of past Research Ambassadors in Cambridge it’s time to expand the programme for those still wishing to be involved. It’s hoped that this will create a community of research support librarians and strengthen it into the future as new staff take part in the programme.

Webinars

Introducing a new training format is always a challenge, but in the case of OSC webinars the hard work has paid off. Over the past two years many library staff have commented that, although they would like to attend training sessions, they can't because of library staffing and other commitments. Repeating sessions and varying the scheduling help to some extent, but we felt that more could be done. Having attended many webinars myself, I knew they were a great way to attend training without leaving your desk, especially if recordings could be accessed later.

Over the course of 2017 the OSC delivered a total of nine webinars for library staff. Feedback on the format from library staff was positive:

Working in a small Library where most staff are part time makes it difficult to get out of the Library to attend training so being able to take part online was great.

I really enjoy the ability to listen back at a convenient time; I often cannot leave the library at short notice due to lack of cover, or unforeseeable research enquiries that overrun and unfortunately take precedence over courses etc.

Nice and flexible – can watch from anywhere!

As a result of this success, the webinar format is now being used for additional training for both the research community and an audience beyond Cambridge.

Moving beyond Cambridge

It's also been a busy year for training library staff outside Cambridge. In May I spoke to CPD25, the staff development programme of the M25 Consortium of Academic Libraries, on Making the Modern Academic Librarian, and in November I gave a presentation on the Librarian as Researcher to CILIP in Kent. I was also lucky enough to visit Salzburg to talk about the skills librarians can bring to supporting Text and Data Mining. The OSC has also been talking to other interested stakeholders about the wider need for research support training for library staff, which has led to some exciting progress.

We've also been busy telling the wider world about Cambridge initiatives. In April 2017 I went to LILAC, the major information literacy conference for librarians, in Swansea and gave three presentations: a poster on the Supporting Researchers in the 21st Century programme, a presentation on the Research Support Ambassador Programme and a workshop on Engaging Students with Research Data Management. This has led to wider interest in these programmes and in research support training more generally.

Perhaps the biggest impact we’ve had has been the publication of an article on the Research Support Ambassador Programme in the New Review of Academic Librarianship. To date this has had over two thousand views and was the most read article published in the journal in 2017. I was very excited to discover this week that it has its first citation and that it has been chosen to receive a cartoon abstract as part of the launch of the publisher’s new librarian platform this year. Lots to look forward to!

Future plans

So, what next? Plans for the Research Support Ambassadors are moving forward and we have several interesting sessions lined up for our librarians already. There has also been a lot of interest in offering training to a wider audience starting with a session on Moving Into Research Support in February and more to come. Hopefully there will also be more publications in the future and of course updates on this blog. The OSC is very much looking forward to working with our library community throughout 2018 and beyond to bring them more exciting training opportunities.

Published 26 January 2018
Written by Claire Sewell

Next steps for Text & Data Mining

Sometimes the best way to find a solution is just to get the different stakeholders talking to each other, and this is what happened at a recent Text and Data Mining symposium held in the Engineering Department at Cambridge.

The attendees were primarily postgraduate students and early career researchers, but senior researchers, administrative staff, librarians and publishers were also represented in the audience.

Background

This symposium grew out of a discussion held earlier this year at Cambridge to consider the issue of TDM and what a TDM library service might look like at Cambridge. The general outcome of that meeting of library staff was that people wanted to know more. Librarians at Cambridge have developed a Text and Data Mining libguide to assist.

So this year the OSC has been doing some work around TDM, including running a workshop at the Research Libraries UK annual conference in March. This was a discussion about developing a research library position statement on Text and Data Mining in the UK. The slides from that event are available, and we published a blog post about the discussion.

We have also discussed this issue with different groups, including the FutureTDM project, which has been looking to increase the amount of TDM happening across Europe. This project is now finishing up. Our impression from across the sector is that 'everyone wants to know what everyone else is doing'.

Symposium structure

With this general level of understanding of TDM as our base point, we structured the day to provide as much information as possible to the attendees. The Twitter hashtag for the event is #osctdm, and the presentations from the event are online.

The keynote presentation was by Kiera McNeice from the FutureTDM Project, who gave an overview of what TDM is, how it can be achieved and what the barriers are. There is a video of her presentation (note that there were some audio issues at the beginning of the recording).

The event broke into two parallel sessions after this. The main room was treated to a presentation about Wikimedia from Cambridge’s Wikimedian in Residence, Charles Matthews. Then Alison O’Mara-Eves discussed Managing the ‘information deluge’: How text mining and machine learning are changing systematic review methods. A video of Alison’s presentation is available.

In the breakout room, Dr Ben Outhwaite discussed Marriage, cheese and pirates: Text-mining the Cairo Genizah, before Peter Murray-Rust spoke about ContentMine: mining the scientific literature.

After lunch, Rosemary Dickin from PLOS talked about Facilitating Text and Data Mining: how an open access publisher supports TDM. PhD candidate Callum Court presented ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature. This presentation was filmed.

In the breakout room, Yvonne Nobis and Georgina Cronin led a discussion about how librarians support TDM. There was also a presentation from John McNaught, Deputy Director of the National Centre for Text Mining (NaCTeM), who presented Text mining: The view from NaCTeM.

Round table discussion

The day concluded with the group reconvening for a roundtable (which was filmed) to discuss the broader issue of why there is not more TDM happening in the UK.

We kicked off by asking each of the people who had presented during the event to describe what they saw as the major barrier for TDM. The answers ranged from the issue of recruiting and training staff to the legal challenges and policies needed at institutional level to support TDM and the failure of institutions and government to show leadership on the issue. We then opened up the floor to the discussion.

A librarian described what happens when a publisher cuts off access, including the process the library has to go through with various areas of the University to reinstate access. (Note this was the reason why the RLUK workshop concluded with the refrain: ‘Don’t cut us off!’). There was some surprise in the group that this process was so convoluted.

However, the group rejected the suggestion that researchers let the library know they want to do TDM so that the library can organise permissions, on the grounds both that this is impractical for researchers and that obtaining permission would take too long.

A representative from Taylor and Francis suggested that researchers contact publishers directly to let them know. Again this was rejected as 'totally impractical', because of the assumption it made about the nature of research. Far from being a linear, planned activity, research is iterative: requesting access for three months, and then having to go back to extend the permission if the work took an unexpected turn, would be impractical, particularly across multiple publishers.

One attendee in her blog about the event noted: “The naivety of the publisher, concerning research methodology, in this instance was actually quite staggering and one hopes that this publisher standpoint isn’t repeated across the board.”

Some researchers described the threats they had received from publishers about downloading material. There was anger about the inherent message that the researcher had done something criminal.

There was also some concern raised that TDM will drive price increases as publishers see ‘extra value’ to be extracted from their resources. This sparked off a discussion about how people will experiment if anything is made digitally available.

During the hour-long session the conversation moved from high-level problems to workflows: how do we actually do this? As is the way with these types of events, the real issues only emerged in the last 10 minutes. What was clear was something I have repeatedly observed over the past few years: the players in this space, including librarians, researchers and publishers, have very little idea of how the others work or what they need. I have actually heard people say: 'If only they understood…'

Perhaps it is time we started having more open conversations?

Next steps

Two things have come out of this event. The first is that people have very much asked for some hands-on sessions. We will have to look at how to deliver these, as they are likely to be quite discipline-specific.

The second is that there is clearly a very real need for publishers, researchers and librarians to get into a room together to discuss the practicalities of moving forward with TDM. One comment on Twitter was that we need legal expertise in the room for this discussion. We will start planning this 'stakeholder' event after the summer break.

Feedback

The items that people identified as the 'one most important thing' they learnt were instructive. The answers reflect how unaware people are of the tools and services available, and of how access to information works. Many responses listed specific tools or services people had found out about, while others commented on the opportunities for TDM.

There were many comments about publishers, both the bad:

  • Just how much impact the chilling effect of being cut off by publishers has on researchers
  • That researchers have received threats from publishers
  • Very interesting about publishers and ways of working with them to ensure not cut off
  • Lots can be done but it is being hindered by publishers

and the good:

  • That PLOS is an open access journal
  • That there are reasonable publishing companies in the UK
  • That journals make available big data for meta analysis

Commentary about the event

There has been some online discussion of the event, including blog posts:

Published 17 August 2017
Written by Dr Danny Kingsley 