All posts by Office of Scholarly Communication

Cambridge Data Champions – reflections on an expanding community and strategies for 2019

The Cambridge Data Champions (DCs) advocate good Research Data Management (RDM) and Open Data practices to researchers locally in their departments, within Cambridge University in general, and sometimes further afield. They network with one another, exchange good methods of RDM, share ideas and, as a collective, reflect on current issues surrounding RDM, Open Data and researcher engagement, where a major shared goal is to establish best practices when it comes to research data. By attending bi-monthly forums facilitated by the Research Data Team, the DCs convene as a community, hear speakers presenting on relevant topics, and engage in workshops that will help them in their ‘championing’ activities. Following on from our latest blog post, which summarised how a workshop led to the creation of cartoon postcards as a new tool to add to the DCs’ resource kit for RDM advocacy, we are now reflecting on initiatives that sprang from workshops during the past year and are considering the challenges and opportunities that this programme brings as it approaches the end of its third year.

Growing 

The programme started in Autumn 2016, comprising researchers who volunteered to become local community experts and advocates for research data management and sharing. Our first call welcomed 43 DCs (September 2016), our second call 20 DCs (March 2018) and the third call 40 DCs (January 2019). For simplicity, this year we also added to our statistics the “affiliate” DCs, who are colleagues who contribute to the DC community in other ways (as interested members of Cambridge’s RDM Project Group) and not necessarily through channelling their RDM efforts for the benefit of a specific department.

We are now a community of 87 active DCs.

Graph showing number of Data Champions (current and alumni) per year between 2016 and 2019.
Total number of Data Champions who joined in each year (orange column indicates Champions who are still active; blue column indicates Champions who are now alumni).

Communities within a community 

Over the last year we caught ourselves using words such as the ‘old DCs’ and the ‘new DCs’ and what we really meant was ‘established DCs’ and ‘new DCs’, with the latter group being those joining the programme each year. In September we celebrate the programme’s third birthday and it is reasonable to expect that there will be more experienced DCs who have already built their networks and have, more or less, a stable offering of RDM support and an enhanced understanding of the needs of their department. On the other hand, there are those who are being welcomed into the group who seek, to differing degrees, initial support from both the RDM team and their fellow colleagues in order to become successful DCs. It is easy to imagine that different layers are being developed with different needs, both in terms of support and engagement.  

Through various activities and feedback from DCs, we now have a good quantity of raw data to analyse their needs for being, as we called it, ‘a good Data Champion’. We have brainstormed ideas which we are putting into action to respond to the challenges of an ever-growing Data Champions group. 

Planning  

DC Welcome Pack 

Word cloud image of "welcome" in different languages  - front page of the Data Champion Welcome pack.

Every year we circulate the Data Champions Welcome Pack to coincide with the inductions we organise to welcome new DCs into the group. This year we included in the pack what is expected of a DC when they join the programme so that expectations are clearly communicated from the beginning and are the same for everybody.

Document describing what Data Champions are expected to do as part of the Programme.
Page from the Cambridge Data Champions Welcome Pack

Bi-monthly forums 

Lightning talks have been introduced as a standard item in each forum. These have provided DCs with the opportunity to discuss aspects of RDM they are working on (e.g. new tools and techniques), or to feed back to the group on DC activities undertaken in their departments and data-related events they have attended so that the whole group can benefit. Importantly, the lightning talks have been used by DCs to problem solve, where the collective knowledge and experience of DCs attending a forum has been harnessed to address particular challenges faced by individual DCs. This is where the community aspect of the programme truly shines. 

It is always a priority for us to invite speakers to forums who are external to the programme, reflecting the needs of both the new and established DCs. For example, Hannah Clements from Cambridge University’s Researcher Development Programme (RDP) spoke to the DCs at the January forum about mentoring, providing guidance on how support can be best delivered within the DC community. In the May forum, we had talks and discussions from a panel of experts working on different aspects of data archiving. The panellists came from across the University bringing a diversity of experience, grounded in clinical governance, computing, and more traditional archiving. These examples are just a couple of the themes that we have covered so far in the forums, which have been derived predominantly from information provided by (and the needs of) the DCs themselves. Additional topics that we plan to cover in future forums include issues surrounding reproducibility, IP and commercialisation, publishing and the impact of research data.  

Key aims of these forums are not only to facilitate networking between DCs but also to act as an arena for the transfer of knowledge along the ‘researcher pipeline’, from forum to DCs and from DCs to researchers in their departments.

DC specialisation group 

As a community, we need to be able to map expertise internally and understand the make-up of such an organic group at any given moment. This makes it easier to support each other and create collaborations, and also improves how we promote the programme externally.

Table showing specialisation categories and sub-categories for Data Champions
Areas of expertise amongst our Data Champions

This led to the formation of the DC specialisation group, consisting of one of us and six of the DCs, which determined how to categorise expertise within the group. As a result, a spreadsheet was created where all DCs can chart their specialist areas and update or amend when necessary (and at least annually). We have top level categories for simple statistical analysis and second level categories that offer more specific details for the benefit of the DC community. 

The next stage is to include the wider research community and improve how various stakeholders can reach the appropriate Data Champions for initial advice and support in RDM issues. One way to do this is by presenting more coherent and consistent specialisations on the Data Champions’ website, using the categories which we have already created for internal use within the group. This stage is due to begin this month and we hope to report on our efforts next year.  

Branding group 

A growing community is inevitably going to bring to the forefront various identity discussions. With this in mind, we formed a branding group to examine if a DC logo should be created to enhance the Data Champions’ visibility and raise their profile amongst their peers when advocating for RDM. A logo has been created and is going through various stages of approval before it will be released later this year. 

Pilot programme – Mentoring  

In February 2019, we initiated a pilot mentoring project as part of the induction process for the new DCs. The mentors are established DCs who have volunteered to support those new DCs wishing to take part in this pilot exercise. This followed on from our January forum where the benefits of mentoring for both mentees and mentors were outlined by Hannah Clements of RDP. At this forum, which preceded the University-wide call for new DCs, we also held a workshop where DCs were divided into three groups and asked three questions: what do you wish you knew when you first became a DC that you know now; what could you offer as mentors to the new DCs; how do you think the mentor-mentee system could work? The responses from DCs in the three groups informed the implementation, structure and aims of the mentoring pilot.  

Our aim is to learn from this project in close consultation with both mentors and mentees. We want to see if this process helps new DCs to establish themselves within their departments/institutes. Will it be effective? The findings will inform our steps for the following year. Watch this space! 

Fostering clusters within departments 

We have excellent examples of departments that promote their DCs within their institutions. A good example is the Chemistry department, which has a cluster of five DCs who work together in their advocacy. During this year’s call for new DCs, and with help from the Department Librarian, we took a targeted approach to advertising the DC Programme within the Department of Engineering. This was highly successful, resulting in ten new Data Champions from Engineering, in various roles and Academic Divisions. They represent a hub with the local knowledge, experience and skills to assess their department’s needs and explore best approaches to support good RDM practices and Open Research, ones that are tailored to the discipline.

Alumni community 

Heading toward the programme’s third birthday means not only that we are growing bigger but also that we are developing an alumni community. This is a different kettle of fish but it is on our radar to investigate how we can foster this distinct group and build a network that is not only Cambridge based but has a more national and even international outlook.

Funding  

Let’s not forget that the DC programme consists of volunteers. We are in the process of seeking more funds to support this ever-increasing community, to run expanding bi-monthly forums, and to be able to offer grants to assist DCs in their endeavours. As an example, we supported one of the DCs, James Savage, to bring the programme to the international stage in November at the SCIDataCon 2018 in Botswana. He talked about the programme as well as his experience of being a DC. This resulted in James writing a paper together with Lauren Cadwallader, to be published soon in Data Science Journal (the accepted manuscript and associated data are available now in Apollo, the Cambridge University institutional repository).

An exciting year so far! 

During this third year of the DC programme the number of active DCs across the University of Cambridge has doubled. We can only anticipate it growing further each year, yet balanced by an expanding community of alumni DCs as, for example, DCs leave Cambridge. The DC community is inherently dynamic, as is the programme. Because of this, we always seek to respond and adapt to changing conditions in novel and beneficial ways while maintaining the programme’s core structure to provide strong foundations. This has been a period of reflection, organisation and anticipation, all required to drive the Data Champion programme forward and tackle current challenges effectively, as well as those that lie ahead – more on this to come soon!  

Written by Maria Angelaki and Dr Sacha Jones

Published 20 June 2019

License logo CCBY

Engagement, infrastructure and roles: themes at #ScholComm19

Dr Beatrice Gini, the Office of Scholarly Communication’s new Training Coordinator, recently attended the inaugural Scholarly Communication Conference at the University of Kent. In this post she reviews the main themes and discussions from the event.

ScholComm19 – a brand new conference, a supportive community, an inclusive space: what a treat for a newcomer to scholarly communication! Having recently started a job within the Office of Scholarly Communication, I had high expectations for this conference as an opportunity to learn a lot from fellow practitioners, and I was not disappointed. Sarah Slowe and the team at the University of Kent should be congratulated for their drive in starting up a new gathering that draws together all the different strands of Scholarly Communications, giving those working at the coalface a chance to get together and share best practice.

With the whole of Friday given over to lightning talks, there were too many speakers for me to do them justice individually, so instead I will attempt to summarise the major themes, as I understood them. The full conference programme can be found here.

Engaging researchers

Many of the speakers focused on the way we work with researchers. Hardly surprising, perhaps, as our jobs tend to involve as much advocacy and training as they do practical support. While at times this is a challenge, many have found ways to deliver our messages more effectively:

 

  • A personal touch – Cassie Bowman from London South Bank University was faced with a lack of researcher engagement, due to the limitations of the technological platform, the complex terminology, the conflicting demands of policies, the difficulties in correcting initial misunderstandings, and the researchers’ fear of getting it wrong. She overcame these not by commissioning large scale change, but through her own personal touch. Her one-to-one sessions are carefully tailored to each researcher and produce long-lasting changes in attitudes. She reaches people through posters and infographics, sprinkling on a little competition (for the highest download figures) to boost interest. Lucy Lambe also spoke on the benefits of one-to-one sessions, alongside workshops and advice on the web, for her publishing advice service for researchers at LSE.
  • A bit of fun – The Publishing Trap game is now well-known in ScholComm circles, but it was new to me, and I was blown away. It takes players through a cleverly-crafted path from PhD student to retired researcher and beyond – all the way to gravestone, in fact – replicating the emotional highs and lows of a research career. Most importantly, though, it asks players to make crucial decisions that spark discussions on Open Access, copyright, skills, and more. Why not organise a fun session to surprise those who may (crazily!) believe that copyright is boring?
  • Useful information – We need to deliver information that is trustworthy and useful. Kirsty Wallis (University of Greenwich) stressed the importance of over-preparing and tailoring sessions to the needs of the people in the room. Her talk gave a useful blueprint of how we could teach academics to ‘speak social media’ through a flexible and hands-on workshop. ‘We need to be a credible source of information’ – this was one aspect of Julie Baldwin’s (University of Nottingham) exploration of why academics ‘get copyright so copywrong’. Engaging researchers with copyright issues is more important than ever now, at a time of change in the law. The University of Kent’s Chris Morrison gave a whistle-stop tour of the history of copyright law, followed by a sneak preview of the way the law may change once the new EU directive is implemented (yes, Brexit did flash briefly on the screen at this point, but it should not have a significant impact on copyright decisions).

Compliance vs culture change

Ian Carter’s talk on the study he ran with JISC on Research Data Management and Sharing raised a strong theme, which was echoed in many of the discussions I had during breaks. His interviews with representatives from 34 institutions revealed that there is a tension in the way we attempt to engage researchers with RDM and open data: on the one hand we say ‘you must do this to receive money/progression/recognition’, on the other we say ‘doing this benefits science and the wider world’. My belief is that the former is likely to generate small, short-term wins on compliance rates, but may also generate resentment. The latter requires more advocacy, but it is likely to generate true buy-in from researchers. Dr Carter argues that the second approach, which aims for culture change, is indeed the more likely to succeed in the long term. He throws down a challenge to all of us when he reports that researcher engagement is variable, RDM leadership is often fragile, responsible staff can be isolated, and few institutions consider all important aspects in their strategies. There is hope, however. As repositories develop better functionality and we find better ways to evidence the benefits of RDM and open data, we may see this area of research support grow into new strengths.

Infrastructural headaches

Repositories are the bread-and-butter of any Open Access support team: they are wonderful digital treasure troves, opening up our university’s invaluable research to the world and preserving it in perpetuity… but at times they can cause tremendous headaches too! A number of speakers shared the challenges they faced, as well as their solutions, saving the rest of us a lot of time and paracetamol. While there is still a split between institutions on the issue of whether depositing in a repository is done by researchers or mediated by support staff, it looked to me as though the trend is towards self-deposit by academics, which will mean more and more of us require automated systems for checking and updating records.

  • Nicola Barnett focused on how staff at the University of Leeds deal with the need to update repository records after the outputs they describe are officially published, for instance to set the correct embargo deadlines. She shared a useful set of instructions to automatically generate a list of recently published outputs using Excel and a CrossRef API.
  • The diversity of publishers’ policies was arguably the most time-consuming hurdle in Suzanne Atkins’ work on making more monographs Open Access at the University of Birmingham. She ran a very successful pilot project to open up book chapters from one department, which had a glut of materials that could be made instantly OA, if the authors consented. While this work was very worthwhile and likely puts the team ahead when it comes to the next REF, it was hindered by the need to check every single policy and by the publishers’ insistence on relying on case-by-case decisions, rather than applying blanket policies.
  • If your current system is just not up to requirements, switching to a new one can be a good time investment in the long run, but it can come with its own demands. Catherine Parker and her team at the University of Huddersfield found this out when they had to manually migrate all previous records – a great feat that really brought out their community spirit and was accomplished in (only?) two and a half months of intensive work. Stuart Bentley from the University of Hull highlighted some of the challenges of switching to Worktribe, as well as considering the improved functionality in the new system.
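
As a rough illustration of the kind of record check described in the first bullet above, the Python sketch below reads a publication date from a Crossref-style work record and derives an embargo end date. It is not the Leeds workflow, which used Excel. The helper names, the sample record, the example DOI and the nine-month embargo are all assumptions made for illustration; the date fields (`published-online`, `published-print`, `issued`) follow the response format of the public Crossref REST API.

```python
import calendar
from datetime import date
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi):
    """Return the Crossref REST API address for one DOI."""
    return CROSSREF_WORKS + quote(doi)


def publication_date(message):
    """Pick the first usable date from a Crossref work record (a dict)."""
    # Assumption: prefer online publication, then print, then 'issued'.
    for key in ("published-online", "published-print", "issued"):
        parts = message.get(key, {}).get("date-parts", [[]])[0]
        if parts and parts[0] is not None:
            # Crossref may omit the month or day; default them to 1.
            year, month, day = (list(parts) + [1, 1])[:3]
            return date(year, month, day)
    return None


def embargo_end(published, months):
    """Add a whole number of months, clamping to the end of short months."""
    index = published.month - 1 + months
    year = published.year + index // 12
    month = index % 12 + 1
    day = min(published.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Hypothetical record, shaped like the 'message' part of a Crossref response.
record = {"issued": {"date-parts": [[2019, 5, 31]]}}
published = publication_date(record)
print(work_url("10.1000/example"), published, embargo_end(published, 9))
```

In practice the record would come from fetching `work_url(doi)` for each DOI held in the repository, and the embargo length would come from the relevant publisher policy rather than a fixed number.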

Roles and time

Finally, several speakers examined the way teams are structured, often in the context of the age-old question of how to get it all done in the time we have.

  • Surveys run by Catherine Parker and Ian Carter revealed a great disparity in the size of the research support and data management staff between institutions, with teams varying in size from one to well over a dozen. Even the areas where they are employed vary, with most being in libraries, but some belonging to research strategy offices. Lone workers have the blessing and the curse of having to take on all aspects of the work, from maintaining the repository to liaising with faculty members and running training, while large teams can specialise their staff.
  • Jane Belger and Anne Lawson talked about their experience of sharing the role of Research and Open Access Librarian at the University of the West of England, Bristol. Having worked out the logistics of syncing schedules and the questions of when to divide up projects and when to collaborate, their main conclusion is that two people can be ‘more than the sum of their parts’.
  • The multiplicity of roles was evident both in the talks and in the chats during breaks. Almost every speaker gave an introduction to their institution, which was key to understanding their perspective. A case in point was Isabel Benton, from Leeds Arts University. She highlighted the peculiar challenges of working at a place where as many as 43% of outputs are in non-traditional formats such as art shows or exhibitions: how do you capture those in a repository? (Hint: with a creative mix of media; check out the repository to find out more.)

*****

There was lots to think about on the train home. The overwhelming feeling, though, was of a community that genuinely cares about doing our very best to support researchers, and is dedicated to helping each other, both within institutions and beyond.

Published 30 May 2019
Written by Dr Beatrice Gini
Creative Commons License

Having Information to Hand: Research Support Handy Guides

If there is one thing I’ve learnt over the last few years of training library staff it’s that they really love a handout! Whether it contains extra information or a copy of the slides, in print or as a digital document, they really want something tangible to take away from a training session and refer back to. However, I’m also a realist and I know that many of these handouts end their lives in a desk drawer never to be seen again, so I wanted to create something that would be both attention grabbing and useful. Our series of Research Support Handy Guides was born as a result.

These short, four-page guides are designed to be used as mini-booklets which summarise complex topics related to scholarly communication in an accessible way. They all follow a fairly consistent format with an eye-catching cover, a short synopsis of the topic, a list of factors to consider and links to further information. Having a page limit means that only the most important information can be included and this forces me to think about what people really need to know about a topic. It also means that I need to use clear language rather than lots of text, which really helps me to distill a topic to its most important point. Although the guides are aimed at library staff we have discovered that they have other uses. All of the guides are made available under a CC-BY 4.0 licence on our website so that people can adapt the information as needed, and we have added downloadable versions by popular demand. Library staff are able to print these out or add them to their own websites as resources for researchers, which saves them from having to come up with similar content from scratch and reinvent the wheel. The guides are also available via the online publication tool ISSUU, which opens them up to a wider audience and makes them interactive. It doesn’t hurt that all of this provides a bit of stealth advocacy for the OSC either!

I designed the guides using Canva. If you have never come across this site before I thoroughly recommend checking it out as it makes designing good looking materials really easy. I often have an idea in my head of how I want something to look but I can never quite seem to translate that to the (digital) page. Canva provides lots of support, graphics and, importantly, templates to help you create really engaging materials. I simply chose an appropriate template, uploaded some (CC0) images, edited the colours to reflect our palette and added the text.

So far there are eight guides in the series covering topics from data management plans to peer review. The guides are often created in direct response to a need identified by our library community – something that often happens when someone starts a sentence with the phrase “I wish I knew more about…” Some guides are created to tie in with an event such as Open Access Week or the recent Fair Use Week. One topic which is particularly suited to this format is copyright and there are currently three guides where it features heavily: Academic Social Networks, Anatomy of a Creative Commons License and the Fair Dealing Fact Sheet. This last title covers a topic that often causes confusion for both researchers and librarians and has been particularly useful to produce in our recent information sessions on copyright. Based on the positive reaction I have received both in person and online, I think more copyright-related titles will definitely be added to the series!

If anyone else is thinking of using something similar I would definitely say give it a try. The guides have proved popular with both the Cambridge library community and those further afield, and there have been over 3000 hits across all titles so far. It’s also always useful to have something ready to hand out at events or to point to when asked a question. Although much of the information has been adapted from existing information on our webpages, the guides offer a much more accessible and visually appealing format than reading pages of dense text. There are lots of different design tools available to help and of course you might just have more talent than me! Creating something that looks professional is surprisingly easy and can really help to engage users in complex topics and potentially be used as a way to start a longer conversation – and you never know where that might lead.

Published 19 March 2019
Written by Claire Sewell
Creative Commons License

This blog was originally published on UK Copyright Literacy, 15 March 2019

Where are we now? Cambridge theses deposits one year in

As the nights draw in and the academic year 2018/19 begins, we are preparing to enter our second year of compulsory e-theses deposits. Our university repository, Apollo, is close to holding 6000 digital PhD theses and it is the intention of the University that this valuable research asset continues to grow into the future. The Apollo repository will play a large part in making this happen. Until recently only hardbound copies of theses were collected and catalogued by the University Library. Users could read theses on-site in Cambridge or order a digitisation of the thesis, but the introduction of e-thesis deposit to Apollo has meant that University of Cambridge theses are more accessible than ever before. It’s been an incredibly busy year and we have made some great steps forward in our management of theses in Cambridge.

e-theses at Cambridge – the background

The e-theses deposit story at Cambridge started in October 2016, when the Office of Scholarly Communication upgraded Apollo to allow the deposit of theses and began a digital thesis pilot for the academic year 2016/17. Eleven departments in the University participated in the pilot, asking their PhD students to deposit an e-thesis alongside a hardcopy thesis. Theses deposited in Apollo during the pilot could either be made open access on request of the author or were treated as historical theses had been up until that point, whereby hardbound copies were held in the University Library and requestors could sign a declaration stating they wished to consult a thesis for private study or non-commercial research. Following the success of the pilot, the Board of Graduate Studies, at its meeting on 4 July 2017, made the decision that from 1 October 2017 all PhD students would be required to deposit both a hard copy and an electronic copy of their thesis to the University Library.

What we learnt during the academic year 2017/18

The experience of depositing theses during the pilot had highlighted some issues that needed addressing. We had to make decisions on how to deal with third party copyright, sensitive material, library copy and supply rules, and the alignment of access levels for hardbound and electronic theses. In response to this, we decided that we should think through each of the different ways in which a thesis could be deposited in the repository, and consider the range of contentious material that could be contained within a thesis.

How do theses enter the repository?

Whilst students that are depositing in order to graduate do this directly, we also have the capacity to scan theses on request here in the library, and these scanned theses are subsequently deposited in Apollo. In addition to this, we led a drive to digitise University of Cambridge theses held by the British Library on microfilm and gave alumni the option to digitise their thesis and make it open access at no cost to them.

British Library theses

This year the OSC has made a bulk deposit of theses scanned by the British Library, which significantly augments the number of theses stored in the repository. In the culmination of a two-year project, nearly 1300 additional Cambridge PhD theses are now available on request in the Apollo repository.

Prior to being made available in the repository, these Cambridge theses were held on microfilm at the British Library. They date from the 1960s through to 2008, when digitisation took over from microfilm as a means of document storage. The British Library holds 14,000 Cambridge PhD theses on microfilm; in 2016 they embarked on a project with the OSC to digitise ten percent of the collection at low cost – read more about this in an earlier post, Choosing from a cornucopia: a digitisation project.

You can explore the collection in Apollo: Historical Digital Theses: British Library collection.  The theses are under controlled access, which means they are available on request for non-commercial research purposes, subject to a £15 admin fee.

Establishing access levels

We established that the level of access we could allow to a thesis could be determined by the route by which it entered the repository, its content, or in some cases the author’s wish to publish. To address all of the potential issues, we decided to define a set of access levels which would determine what we, as managers of the repository, were able to do with a thesis and the way in which it could be accessed by a requestor.

The access levels were put in action in spring 2018 and this was followed by a survey of Degree Committees, conducted by the e-theses working group consisting of members of the University Library and Student Registry. The survey asked for feedback on the suitability of the access levels for research outputs for all departments in the University; the outcome confirmed that the access levels were working and covered the options well, although a few tweaks were needed. In light of the feedback, a set of recommendations was put to the Board of Graduate Studies by the e-theses working group, and these recommendations were considered and accepted at their meeting on 3 July 2018, ready to be put in place for the 2018/19 academic year.

eSales for theses under controlled access

At the same time as we were establishing our access levels, we were also working on devising an eSales process to facilitate the supply of theses under controlled access. Controlled access replicates the way that historical, hardbound theses were managed in the library, with the addition of an electronic version of the thesis being held in the repository, and follows the library copy and supply rules for unpublished works under copyright law. A thesis scanned by the library would be deposited under controlled access so it remains unpublished, but this access level is also available to students depositing their thesis directly. The eSales process we devised went live in July 2018 and this meant a large number of theses held in the repository were made more accessible, including those digitised by the British Library. As of 18 October, we have supplied 14 theses via the eSales route and the requests keep coming in at a steady pace.

Looking forward to the 2018/19 academic year

As we begin the 2018/19 academic year, our theses management is looking in good shape but we will continue to improve and refine our internal and external services. In consultation with the University’s Student Registry we are making the final changes to our deposit forms, access levels and communications and we endeavour to make this academic year the smoothest yet for e-theses management. University of Cambridge theses are more accessible than they have ever been. The collection will grow as more students deposit each year, and the valuable research of PhD students will continue to be disseminated.

Published 25 October 2018
Written by Zoë Walker-Fagg
Creative Commons License

What do you want, and why do you want it? An update on Request a Copy

As part of Open Access Week 2018, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Mélodie Garnier provides some new insights into our Request a Copy service.

4,416. This is the number of requests for copies of material in our repository we’ve received over the past 12 months. Daunting, isn’t it? And definitely on the rise, with a 33% increase from the previous year. Two and a half years after its implementation in June 2016, our Request a Copy service is now more popular than ever. Our institutional repository Apollo hosts thousands of freely available research outputs, but also many that are under embargo. People from all over the world and from all walks of life are keen to access them. But what exactly do requesters want? And why do they want it?

What do people want?

Our repository hosts a whole range of research outputs, but theses and journal articles are by far the most popular. Interestingly, the relative proportion of requested theses vs requested articles has shifted this year. From October 2016 to October 2017, requests for journal articles made up 56% of the total number of requests, and requests for theses made up 39%. Since last October, requests for journal articles have accounted for 38% of the total while theses have accounted for 59%.

Looking at the raw figures, the number of requests for journal articles has actually gone up (from 1,647 to 1,689), though only slightly. But the number of thesis requests has more than doubled, going from 1,145 to 2,586. This is partly explained by the University of Cambridge’s requirement for PhD students to upload their theses from 1st October 2017, leading to 1,279 new theses uploads. On top of these, we have added around 1,300 historical British Library theses and around 200 scanned historical theses from the Digital Content Unit. So between 2,500 and 3,000 theses have been added to Apollo this year alone (more on this tomorrow for #ThesisThursday).
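For readers who want to check these figures, here is a minimal sketch of the arithmetic. The variable names are ours, the counts are copied from this post, and we assume that the 4,416 total covers the same 12-month period as the article and thesis counts. The two historical thesis batches use the approximate figures given above.

```python
# Request counts quoted in this post (past 12 months vs the previous 12 months).
total_requests = 4416  # assumed to cover the same period as the counts below
articles_prev, articles_now = 1647, 1689
theses_prev, theses_now = 1145, 2586

# Share of this year's requests by output type, as whole percentages.
article_share = round(100 * articles_now / total_requests)
thesis_share = round(100 * theses_now / total_requests)

# Growth in thesis requests compared with the previous year.
thesis_growth = theses_now / theses_prev

# Theses added to Apollo this year; the last two figures are approximate.
theses_added = 1279 + 1300 + 200

print(f"Articles: {article_share}% of requests, theses: {thesis_share}%")
print(f"Thesis requests grew by a factor of {thesis_growth:.2f}")
print(f"Roughly {theses_added} theses added")
```

The shares come out at 38% and 59%, thesis requests grew by a factor of a little over two, and the additions land inside the 2,500 to 3,000 range quoted above.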

Most wanted

Most items requested this year were only requested once, but 28 items were requested 10 times or more. Of the 20 most requested items, four are journal articles and 16 are PhD theses. Here’s our top 5:

Aside from the gold medal winner, all the other works were published this year and have only been available in Apollo for a few months. So it is striking to see how popular some of them have become in quite a short period of time. A case in point is the zoology article, which was deposited in Apollo only last month and first published shortly afterwards.

Word of mouth

Though it is sometimes unclear why particular outputs suddenly attract a lot of requests, Altmetric Attention scores can be telling – see the one below for the zoology article I’ve just mentioned:

Another interesting example (not included in the top 5) is a PhD thesis deposited in Apollo at the end of August. From 18 visits in September, the Apollo record has gone up to an astounding 911 visits in October (and counting), with a surge of requests. What happened in between? The author publicised her thesis on a Facebook society page, pointing to the repository record link for access.

We only became aware of this as requesters explicitly referred to that page, but it’s possible that similar things happen a lot of the time. So aside from traditional media outlets, the influence of social media on the number of requests received can be quite dramatic, and probably greater than we could ever capture.

Tell us about yourself

When requesting a copy of an embargoed article or thesis, people are prompted to leave a message alongside contact details. This is so they can introduce themselves and explain why they are interested in accessing the work, mainly so that authors can make informed decisions on whether to accept or reject requests. Quite often these messages have little to no useful information, but some can be informative in a number of ways.

Through them we can get a glimpse of the range of people accessing the repository – their geographical provenance, background and professional occupation. We can also get a sense of the range of interests that people have (which may appear very specialised, if not a little obscure). And crucially they tell us what people want to do with the research – whether to use it as a reference, apply it in their professional sphere or simply read it for pleasure.

Why do people request work?

Broadly speaking, people request work in Apollo for the following purposes: reference/citation, personal interest/leisure, replication of results for research purposes, and the need to inform professional practice. But those broad categories can include several sub-categories; for example, personal interest can stem from hearing about the research in the media or knowing the author.

Getting the full detailed account of why people request work from our repository would require going through messages individually, and perhaps some degree of subjective judgement. Since launching the Request a Copy service we’ve had over 8,000 requests – so even if uninformative messages were excluded, the analysis could be fairly time-consuming. But certainly worth exploring, so watch this space.

Just a snippet…

What better way to advocate for Open Access than to show concrete examples of how research can impact on individual lives? Our Open Access team sees evidence of this every day through Request a Copy messages. So until we can offer a full-blown analysis of the output, let’s conclude this blog post with a selection of favourites:

  • “Our daughter is being investigated for Beckwith Wiedemann Syndrome. We would like as much information as possible about this area”
  • “I’m a pediatric radiation oncologist and this paper is a “practice changer” one!”
  • “My task is to convince policy makers in Sri Lanka to switch to circular economy. I am looking for all possible information to do this”
  • “I work in FE/HE and have a number of students experiencing/ or diagnosed with psychosis, I am very interested in intervention research and programmes for psychosis that can be implemented within our college environment”
  • “I would like a copy of this material for inspiring my high school students of physics”
  • “I hope to learn more about the potential risks of my decision to donate a kidney”

Although there is a definite cost to running Request a Copy in terms of staff time, it is clear how popular and valuable a service it has become. As its popularity increases so does the need for process efficiency, however. This is currently a big priority for us and something we’ll have to keep working on, but we think the benefits for researchers and the wider community are worth it.

Published 24 October 2018
Written by Dr Mélodie Garnier
Creative Commons License

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.

In February 2017, a group of University of Cambridge staff met to discuss “Text and Data Mining Services: What can Cambridge libraries offer?” It was agreed that a future library Text and Data Mining (TDM) support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers on data provided for mining and TDM projects
  • Fostering agreements with publishers.

This blog reports on some of the activities, events and initiatives, involving libraries at the University of Cambridge, that have taken place or are in progress since this meeting (also summarised in these slides).  Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK 2017 conference to discuss Research Libraries and TDM. Issues raised included licensing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, policy and procedural development for handling TDM-related requests (and scaling this up across an institution) and the risk of lock-out from publishers’ content, as well as the time it can take for a TDM contract to be finalised between an institution and publisher. The group concluded that it is important to build mechanisms into TDM-specific licensing agreements between institutions and publishers where certain behaviours are expected. For example, if suspicious activity is detected by a publisher’s website, it would be better not to block the originating institution from accessing content automatically, but to investigate first (although this may depend on the systems in place). And if lock-out does happen and the activity is legal, participants suggested that institutions should explore compensation for the time that access is lost, if significant.

July 2017: University of Cambridge Text and Data Mining LibGuide

Developed by the eResources Team, this LibGuide explains Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also provides a list of online journals licensed for TDM at the University of Cambridge and a list of digital archives for text mining that can be supplied to University researchers on disc. Any questions our researchers may have about a TDM project that are not answered by the LibGuide can be submitted to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day a whole-group discussion drew out issues around why more TDM is not happening in the UK and it was agreed that there was a need for more visibility on what TDM looks like (e.g. a need for some hands-on sessions) and increased stakeholder communication: i.e. between publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: e.g. researchers providing training on TDM methods and analysis tools, library support managing content accessibility and funding for this, and content licensing and agreements for the publisher. We’ll take a more in-depth look at this pilot in an upcoming blog on TDM – watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers who are working with digital methods for humanities research and teaching. In January this year Dr Danny Kingsley visited the facility to discuss approaches to providing TDM services, to help with planning at Cambridge. The Yale DH Lab staff help out with projects in a variety of ways, one example being to help researchers get to grips with digital tools and methods. Researchers wanting to carry out TDM on particular collections can visit the lab to do so: offline discs containing published material for mining can be used in situ. In 2018, the libraries at Cambridge have begun building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining. The course helps a learner understand the key concepts around TDM and explores how Research Support staff can help with TDM. It also includes some practical activities that allow even those without technical skills to try out some mining concepts for themselves. By following these activities, you can find out a bit more about sentence segmentation, tokenization, stemming and other processing techniques.
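To give a flavour of what these three steps involve, here is a minimal sketch in Python using only the standard library. It is not taken from the course: the function names and the sample text are our own, the sentence splitter is deliberately naive, and the "stemmer" is a toy suffix-stripper rather than an established algorithm such as Porter's.

```python
import re

def split_sentences(text):
    # Naive sentence segmentation: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Tokenization: lowercase runs of letters and digits; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def crude_stem(token):
    # Toy stemming (an illustrative assumption, NOT the Porter algorithm):
    # strip one common suffix if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Researchers mined the archives. Mining reveals hidden patterns!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [[crude_stem(t) for t in sentence_tokens] for sentence_tokens in tokens]

for sentence, sentence_stems in zip(sentences, stems):
    print(sentence, "->", sentence_stems)
```

With this sample text, "mined" and "Mining" both reduce to the same stem, which is the point of stemming: different forms of a word can be counted together. Real projects would normally use a dedicated text-processing library rather than hand-written rules like these.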

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December: it provides TDM tools as a front end to digital archives from Gale Cengage. You can find out more about this trial in this ejournals@cambridge blog.

In summary…

Following the initial meeting to discuss research support services for TDM, there have been efforts and achievements to raise awareness of TDM and the possibilities it can bring to the research process as well as to explore the issues around the low usage of TDM in the research community at large.  This is an on-going task, with the goal of increased researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen
Creative Commons License