Tag Archives: librarians

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.

In February 2017, a group of University of Cambridge staff met to discuss “Text and Data Mining Services: What can Cambridge libraries offer?” It was agreed that a future library Text and Data Mining (TDM) support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers on data provided for mining and TDM projects
  • Fostering agreements with publishers.

This blog reports on some of the activities, events and initiatives, involving libraries at the University of Cambridge, that have taken place or are in progress since this meeting (also summarised in these slides).  Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK 2017 conference to discuss Research Libraries and TDM. Issues raised included licensing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, policy and procedural development for handling TDM-related requests (and scaling this up across an institution), the risk of lock-out from publishers’ content, and the time it can take for a TDM contract to be finalised between an institution and a publisher. The group concluded that it is important to build mechanisms into TDM-specific licensing agreements between institutions and publishers that set out what behaviour is expected. For example, if a publisher’s website detects suspicious activity, it would be better to investigate first rather than automatically block the originating institution from accessing content (although this may depend on the systems in place). If lock-out does happen and the activity is legal, participants suggested that institutions should explore compensation for the time that access is lost, if this is significant.

July 2017: University of Cambridge Text and Data Mining LibGuide

Developed by the eResources Team, this LibGuide explains Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also provides a list of online journals licensed for TDM at the University of Cambridge and a list of digital archives for text mining that can be supplied to University researchers on disc. Any questions our researchers have about a TDM project that are not answered by the LibGuide can be submitted to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day a whole-group discussion drew out issues around why more TDM is not happening in the UK. It was agreed that there was a need for more visibility of what TDM looks like (for example, through hands-on sessions) and for increased communication between stakeholders: publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: for researchers, providing training on TDM methods and analysis tools; for the library, managing content accessibility and funding for this; and for the publisher, content licensing and agreements. We’ll take a more in-depth look at this pilot in an upcoming blog on TDM – watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers who are working with digital methods for humanities research and teaching. In January this year Dr Danny Kingsley visited the facility to discuss approaches to providing TDM services, to inform planning at Cambridge. The Yale DH Lab staff help out with projects in a variety of ways, one example being to help researchers get to grips with digital tools and methods. Researchers wanting to carry out TDM on particular collections can visit the lab to do so: offline discs containing published material for mining can be used in situ. In 2018, the libraries at Cambridge have begun building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining. The course helps a learner understand the key concepts around TDM, explores how Research Support staff can help with TDM, and includes some practical activities that allow even those without technical skills to try out some mining concepts for themselves. By following these activities, you can find out a bit more about sentence segmentation, tokenization, stemming and other processing techniques.
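To give a flavour of the three techniques named above, here is a minimal Python sketch using only the standard library. It is not taken from the course: the function names (`split_sentences`, `tokenize`, `toy_stem`) and the sample text are made up for illustration, and the stemmer is a deliberately crude suffix stripper rather than an established algorithm such as Porter's.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def toy_stem(token):
    """Strip a few common English suffixes, keeping at least three characters.

    Illustrative only: real stemmers apply many more rules than this.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical sample text, invented for this example.
text = "Libraries are mining texts. Researchers mined the archives!"

for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    stems = [toy_stem(token) for token in tokens]
    print(tokens, stems)
```

Even this toy version shows why stemming is useful for mining: “mining” and “mined” reduce to the same stem, so a search or word count would treat them as one term. It also shows the limits of naive rules, since an abbreviation such as “e.g.” would be wrongly split into separate sentences.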

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December: it provides TDM tools as a front end to digital archives from Gale Cengage. You can find out more about this trial in this ejournals@cambridge blog.

In summary…

Since the initial meeting to discuss research support services for TDM, work has gone into raising awareness of TDM and the possibilities it can bring to the research process, as well as exploring the issues around the low usage of TDM in the research community at large. This is an ongoing task, with the goal of increased researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen
Creative Commons License

Libraries’ role in teaching the research community – LILAC2017

Recently Claire Sewell, the OSC Research Support Skills Coordinator, attended her first LILAC conference in Swansea. These are her observations from the event.

LILAC (Librarians’ Information Literacy Annual Conference) is one of the highlights of the information profession calendar and focuses on sharing knowledge and best practice in the field of information literacy. For those who don’t know, information literacy is defined as:

Knowing when and why you need information, where to find it and how to evaluate, use and communicate it in an ethical manner (CILIP definition)

Showcasing OSC initiatives

Since it was my first time attending it was a privilege to be able to present three sessions on different aspects of the work done in the OSC. The first session I ran was an interactive workshop on teaching research data management using a modular approach. The advantage of this is that the team can have several modules ready to go using discipline specific examples and information, meaning that we are able to offer courses tailored to the exact needs of the audience. This works well as a teaching method and the response from our audience both in Cambridge and at LILAC was positive.

There was an equally enthusiastic response to my poster outlining the Supporting Researchers in the 21st Century programme. This open and inclusive programme aims to educate library staff in the area of scholarly communication and research support. One element of this programme was the subject of my final LILAC contribution – a short talk on the Research Support Ambassador Programme which provides participants with a chance to develop a deeper understanding of the scholarly communication process.

As well as presenting and getting feedback on our initiatives the conference provided me with a chance to hear about best practice from a range of inspiring speakers. A few of my highlights are detailed below.

Getting the message out there - keynote highlights

Work openly, share ideas and get out of the library into the research community were the messages that came out of the three keynote talks from across the information world.

The first was delivered by Josie Fraser, a Social and Educational Technologist who has worked in a variety of sectors, who spoke on the topic of The Library is Open: Librarians and Information Professionals as Open Practitioners.  Given the aim of the OSC to promote open research and work in a transparent manner this was an inspiring message.

Josie highlighted the difference between the terms free and open, words which are often confused when it comes to educational resources.  If a resource is free it may well be available to use but this does not mean users are able to keep copies or change them, something which is fundamental for education.

Open implies that a resource is in the public domain and can be used and reused to build new knowledge. Josie finished her keynote by calling for librarians to embrace open practices with our teaching materials. Sharing our work with others helps to improve practice and saves us from reinventing the wheel. The criteria for open are: retain, reuse, revise, remix, redistribute.

In her keynote, Making an Impact Beyond the Library and Information Service, Barbara Allen talked about the importance of moving outside the library building and into the heart of the university as a way to get information literacy embedded within education rather than as an added extra. The more we think outside the library the more we can link up with other groups who operate outside the library, she argued. Don’t ask permission to join in the bigger agenda – just join in or you might never get there.

Alan Carbery, in his talk Authentic Information Literacy in an Era of Post Truth, discussed authentic assessment of information literacy. He described looking at anonymised student coursework to assess how students are applying what they have learnt through instruction. When real grades are at stake students will usually follow orders and do what is asked of them.

Students are often taught about the difference between scholarly and popular publications, which ignores the fact that a publication can be both. Alan said we need to stop polarising opinions, including the student concept of credibility that forms when they are taught that some sources are good and some are bad. This concept is becoming linked to how well known the source is – ‘if you know about it, it must be good’. But this is not always the case.

Alan asked: How can we get out of the filter bubble – social media allows you to select your own news sources but what gets left out? Is there another opinion you should be exposed to? He gave the example of the US elections where polls and articles on some news feeds claimed Clinton was the frontrunner right up until the day of the election. We need to move to question-centric teaching and teach students to ask more questions of the information they receive.

Alan suggested we need to embed information literacy instruction in daily life – make it relevant for attendees. There are also lessons to be learnt here which can apply to other areas of teaching. We need to become information literacy instructors as opposed to library-centric information literacy instructors.

Key points from other sessions

There is a CILIP course coming soon on ‘Copyright education for librarians’. This will consider the needs of the audience and relate to real-life situations. New professional librarians surveyed said that copyright was not covered in enough depth during their courses; however, many saw it as an opportunity for future professional development. The majority of UK universities have a copyright specialist of some description, but copyright is often seen as a problem to be avoided by librarians.

There is a movement in teaching towards more interactive sessions, rather than the instructor just talking and students working on their own. Several sessions highlighted the increased pressure on and expectations of students in academia. Also highlighted were the benefits of reflective teaching practice.

There are many misconceptions about open science and open research amongst the research community. There is too much terminology and it is hard to balance the pressure to publish with the pressure to do good research. Librarians have a role in helping to educate here. Many early career researchers are positive about data sharing but unsure as to how to go about it, and one possibility is making a course on this a formal part of PhD education.

Claire Sewell attended the LILAC conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 27 April 2017
Written by Claire Sewell 

Creative Commons License

Where did they come from? Educational background of people in scholarly communication

Scholarly communication roles are becoming more commonplace in academic libraries around the world but who is actually filling these roles? The Office of Scholarly Communication in Cambridge recently conducted a survey to find out a bit more about who makes up the scholarly communication workforce and this blog post is the first in a series sharing the results.

The survey was advertised in October 2016 via several mailing lists targeting an audience of library staff who worked in scholarly communication. For the purposes of the survey we defined this as:

The process by which academics, scholars and researchers share and publish their research findings with the wider academic community and beyond. This includes, but is not limited to, areas such as open access and open data, copyright, institutional repositories and research data management.

In total 540 people responded to the calls for participation with 519 going on to complete the survey, indicating that the topic had relevance for many in the sector.

Working patterns

Results show that 65% of current roles in scholarly communication have been established in respondents’ organisations for less than five years, with fewer than 15% having been established for more than ten years. Given that scholarly communication is still growing as a discipline this is perhaps not a surprising result.

It should also be noted that the survey makes no distinction between those who are working in a dedicated scholarly communication role and those who may have had additional responsibilities added to a pre-existing position. These roles tend to sit within larger organisations which employ over 200 people although whether the organisation was defined as the library or wider institution was open to interpretation by respondents.

Responses showed an even spread of experience in the library and information science (LIS) sector with 22% having less than five years’ experience and 27% having more than twenty.  Since completing their education just over half of respondents have remained within LIS but given the current fluctuations in the job market it is not surprising to learn that just under half of people have worked outside the sector within the same period.

Respondents were also asked to list the ways in which they actively contributed to the scholarly publication process. The majority (72%) did so by authoring scholarly works, while 44% contributed to the peer review process. Although not specified as a category, a number of respondents highlighted their work in publishing material, indicating a change in the scholarly process rather than a continuation of the status quo.

LIS qualifications

Most of those who responded to the survey (71%) either have or are currently working towards a postgraduate qualification in LIS, an anticipated result given the target population of the survey. The length of time respondents had held their qualification was evenly spread, in line with the amount of time spent working in the sector, with 48% having achieved their qualification less than ten years ago and 49% having held their qualification for over a decade. Just over half of this group (56%) felt that their LIS qualification did not equip them with knowledge of the scholarly communication process.

Around a fifth of respondents (21%) hold a library and information science qualification at a level other than postgraduate, with the majority of these being at bachelor level. Of these there was a fairly even divide between those who have held this qualification for five to ten years (31%) and those who qualified more than twenty years ago (28%). Only 17% of this group felt that their studies equipped them with appropriate knowledge of scholarly communication.

Qualifications outside LIS

A small number of respondents do not hold qualifications in LIS but hold or are working towards postgraduate qualifications in other subjects. Most of this group hold/are working on a PhD (69%) in a range of subjects from anatomy to mechanical engineering.

This group overwhelmingly felt that what they learnt during their studies had practical applications in their work in scholarly communication (74%). This was a larger percentage than those who had studied LIS at either undergraduate or postgraduate level. These results echo experiences at Cambridge where a large proportion of the team is made up of people from a variety of academic backgrounds. In many ways this has proven to be an asset as they have direct experience of the issues faced by current researchers and are able to offer insight into how best to meet their needs.

So what does this tell us?

The scholarly communication workforce is expanding as academic libraries respond to the changing environment and shift their focus to research support. Many of these roles have been created in the past five years, in particular within larger organisations better positioned to devote resources to increasing their scholarly communication presence.

Although results from this survey indicate that the majority of staff come from a library background, a diverse range of levels and subjects are represented. As noted above, this can provide unique insights into researcher needs, but it also raises the question of what trained library professionals can bring to this area. Given that the majority of those educated in LIS felt that their qualification did not adequately equip them for their role, this is a potentially worrying trend which needs to be explored further.

We will be continuing to analyse the results of the survey over the next few months to address both this and other questions. Hopefully this will provide insight into where scholarly communications librarians are now and what they can do to ensure success into the future.

Published 9 March 2017
Written by Claire Sewell
Creative Commons License