
From data curators to intellectual entrepreneurs: observations from IFLA

In this blog post, Clair Castle, Librarian, University of Cambridge, Department of Chemistry reflects on her experience at the IFLA Satellite Meeting 2017 in Warsaw, Poland.

Earlier this year I was invited by the Office of Scholarly Communication (OSC) at the University of Cambridge to present a paper at the conference Data Curator’s Roles and Responsibilities: International and Interdisciplinary Perspectives. This was my first time writing a paper for a conference and presenting it; it was slightly daunting but exciting too!

IFLA is the International Federation of Library Associations and Institutions, the international body that represents the interests of library and information services and their users. It celebrates its 90th birthday in 2017. This conference was a pre-Congress Satellite Conference, taking place just before the IFLA World Library and Information Congress held in Wrocław, Poland, from 19–25 August.

There were three sessions of four presentations in the programme – which includes links to every presentation. You can find most of the papers that were presented here. The main conference hashtag on Twitter was #wlic2017 (learn more about the 2017 and upcoming 2018 congress by following @iflawic).

Conference focus

Data curation has emerged as a new area of responsibility for researchers, librarians, and information professionals in the digital environment. The huge variety and amount of data that needs to be processed, preserved, and disseminated is creating new roles, responsibilities and challenges for researchers and the library and information professionals who support them. The primary goal of the conference was to engage the international scholarly community in a conversation that led to a better understanding of these challenges, and to discuss the main trends in data curation and Research Data Management (RDM) practices and education.

To ‘curate’ means to ‘take care of’. What resonated with me the most from the conference was the fact that while we are curating data we are curating people as well. We are doing this by changing research culture, evolving the profession, changing research (and research support) practices, doing outreach and advocacy work, and liaising with related university support services. The conference presentations returned to this theme again and again.

I won’t discuss every presentation here; instead I will collate and relate the ideas that I found most thought-provoking.

Intellectual entrepreneurship

This term was introduced to me by Nitecki and Davis’ presentation ‘Expanding librarians’ roles in the research life cycle’. The definition I have since found that explains this the best is from Charles J. Chumas at Stony Brook University:

“Take … the textbook definition of entrepreneur: A person who organizes and manages any enterprise, especially a business, usually with considerable initiative and risk. Now, switch out the words “enterprise” and “business” with words such as “research” or “education”. This is the concept of intellectual entrepreneurship. It is the concept of taking risk, seizing opportunity, discovering and creating knowledge and employing one’s own innovation and strategies, with the ultimate goal of solving problems in corporate, societal or governmental environments. An intellectual entrepreneur … actively seeks out their own education … The philosophy of IE embodies four core values: vision and discovery, ownership and accountability, integrative thinking and action, and collaboration and teamwork”.

I feel that this describes the role of data curators exactly: researchers and the people supporting them are planning data curation strategically and innovatively, acquiring the necessary knowledge and skills to develop it in their institution, and working to bring systems, services and people together to achieve their overall goal of managing data effectively.

Zhang’s presentation ‘Data curators: A glimpse at their roles at the academic libraries in the United States’ mentioned the Association of Research Libraries’ Strategic Thinking and Design Initiative: Extended and Updated Report (2016), which estimates that the research librarian will have shifted from knowledge service provider to collaborative partner within the research ecosystem by 2033. In one example of this, librarians have shifted from providing a service support role to working with researchers to further open science: the FOSTER portal is an e-learning platform that brings together the best training resources addressed to those who need to know more about Open Science, or need to develop strategies and skills for implementing Open Science practices in their daily workflows. It provides training materials for many different users – from early-career researchers, to data managers, librarians, research administrators, and graduate schools. This reflects the self-education aspect of intellectual entrepreneurship.

Upskilling librarians

Many library science curricula around the world do not (yet) include an RDM module. Experienced librarians may not therefore have the necessary knowledge or skills to support RDM. Many data curation post advertisements require leadership, partnership, outreach and collaborative responsibilities but not a professional library qualification. Data curation posts have been repurposed from experienced librarian posts, taken up by new graduates, contractors, PhDs, or sometimes are joint appointments with different academic units. A review of the library profession with regard to RDM skills and knowledge is required to inform future education and training.

Peters’ presentation ‘Reskilling academic librarians for data management services’ highlighted Lewis’ research data management pyramid for libraries (p.16). Areas of early engagement with RDM are situated at the bottom of the pyramid, and as you get to the top you can take on the world!

Role of IT in data curation

Several speakers touched upon this: after all, IT underpins everything and IT support staff are often closer to researchers than librarians are. However, there may be a perception that data curation is not an IT role, per se. In another example of intellectual entrepreneurship, IT and data librarians can work together to provide research data support services: IT can bring skills such as UX (User Experience), systems design and project management, while data librarians can bring their expertise in areas such as repository infrastructures, digital preservation, and discovery and indexing methods.

The definition of data curation is evolving

The IFLA Library Theory and Research Panel Data Curation Project identified the role and responsibilities of data curators in an international context. The methodology included a review of the literature and vocabulary describing data curation roles (using a cool keyphrase digger tool!) and a content analysis of job advertisements (in 35 countries). They found varying terms to describe data curation (e.g. data stewardship, digital preservation, data science, and RDM, the preferred term). Outreach and advocacy to researchers was found to be an important aspect of roles, which again relates back to the theme of intellectual entrepreneurship.

Central vs. discipline-specific RDM activities at the University of Cambridge

As I have mentioned, I presented my paper on behalf of the OSC. Since its establishment in 2015 the OSC has developed many services to support RDM at the University, including a central website, RDM training and support, and a data repository. It communicates with researchers and support staff, including librarians and administrators, across the University using a variety of methods. There is therefore a considerable amount of outreach into departments and faculties where research takes place. However, its resources are limited: it is not possible for it to deliver RDM training, for example, in every department or faculty in the University, especially on a discipline-specific basis.

Most departments and faculties in the University have an embedded library service, which is discipline-specific. Librarians are in a key position to be able to collaborate with the OSC and their own researchers in developing and implementing RDM services locally. My paper presents a case study of how centralised RDM services have been rolled out in the Department of Chemistry, thus adapting the central RDM messages to discipline-specific needs. I describe how customising centralised RDM training for all new graduate students in the Department, being a member of the University’s RDM Project Group, and being involved in the OSC’s Data Champions programme have benefitted both the OSC and the Department.

Identity crisis?

The conference taught me that the identity of data curators is constantly evolving. Does it even matter what we call ourselves? Whatever the term used to describe us, we have similar roles and goals, and need to equip ourselves for future challenges. The concept of intellectual entrepreneurship is worth exploring further as a way of empowering ourselves.

The conference gave me a great opportunity to share and learn about RDM best practice from practitioners across the world. It reinforced for me the fact that we are all in it together, facing the same challenges and working together to come up with solutions.

Observations

The conference took place at the very impressive University of Warsaw Library, which is centrally located beside the Old Town in Warsaw, right next to the Vistula River. Around 40 delegates attended from all over the world.

Warsaw itself is a lively city with a rich, if at times tragic, history. After the conference dinner (a BBQ outside on a very warm evening!) we were treated to an entertaining evening bus tour around the city. We passed the amazing POLIN Museum of the History of Polish Jews, travelled through the area where the Warsaw Ghetto had been, and took in examples of communist era architecture (in particular the imposing Palace of Culture and Science).

        

Published 15 December 2017
Written by Clair Castle @chemlibcam
Creative Commons License

Engaging Researchers with Good Data Management: Perspectives from Engaged Individuals

We need to recognise good practice, engage researchers early in their career with research data management and use peers to talk to those who are not ‘onboard’. These were the messages five attendees took away from the Engaging Researchers in Good Data Management conference held on the 15th of November.

The Data Champions and Research Support Ambassadors programmes are designed to increase confidence in providing support to researchers in issues around data management and all of scholarly communications respectively. Thanks to the generous support of the Arcadia Foundation, five places were made available to attend this event. In this blog post the three Data Champions and two Research Support Ambassadors who were awarded the places give us the low-down on what they got out of the conference and how they might put what they heard into practice.

Recordings of the talks from the event can be found on the Cambridge University Library YouTube channel.

Financial recognition is the key

Dr Laurent Gatto, Senior Research Associate, Department of Biochemistry, University of Cambridge and Data Champion

As a researcher who cherishes good and reproducible data analysis, I naturally view good data management as essential. I have been involved in research data management activities for a long time, acting as a local data champion and participating in open research and open data events. I was interested in participating in this conference because it gathered data champions, stewards and the like from various British and European institutions (Cambridge, Lancaster, Delft), and I was curious to see what approaches were implemented and issues were addressed across institutions. Another aspect of data championship/stewardship I am interested in is the recognition these efforts offer (this post touches on this a bit).

Focusing on the presentations from Lancaster, Cambridge and Delft, it is clear that direct engagement from active researchers is essential to promote healthy data management. There needs to be an enthusiastic researcher, or somebody that has some experience in research, to engage with the research community about open data, reproducibility, transparency, security; a blunt top-down approach leads to limited engagement. This is also important due to the plurality of what researchers across disciplines consider to be data. Informal settings, ideally driven by researchers and/or run in collaboration with librarians, focusing on conversations, use-cases, interviews, … (I am just quoting some successful activities cited during the conference) have been the most successful, and have sometimes also led to new collaborations.

Despite the apparent relative success of these various data championing efforts and the support that the data champions get from their local libraries, these activities remain voluntary and come with little academic reward. Being a data champion is certainly an enriching activity for young researchers that value data, but it comes with relatively little credit and without any reward or recognition, suggesting that there is probably room for a professional approach to data stewardship.

With this in mind, I was very interested to hear the approach that is currently in place at TU Delft, where data stewards hold a joint position at the Centre for Research Data and at their respective faculty. This defines research data stewardship as an established and official activity, allows the stewards to pursue a research activity, and, explicitly, links research data to research and researchers.

I am wondering if this could be implemented more broadly to provide financial recognition to data stewards/champions, offer incentives (in particular for early-career researchers) to approach research data management professionally and seriously, make data management a more explicit activity that is part of research itself, and move towards a professionalisation of data management posts.

Inspiration and ideas

Angela Talbot, Research Governance Officer, MRC Biostatistics Unit and Data Champion

Tasked with improving and updating best practice in the MRC Biostatistics Unit, I went along to this workshop not really knowing what to expect but hopeful and eager to learn.

Good data management can meet with resistance: while it’s viewed as an altruistic and noble thing to do, many researchers worry that making their research open and reproducible exposes them to criticism and to the theft of ideas and future plans. What I wanted to learn were ways to overcome this.

And boy did this workshop live up to my expectations! From the insightful opening comments to the thought-provoking closing remarks I was hooked. All of the audience were engaged in a common purpose: to share their successes and strategies for overcoming the barriers, to ensure this becomes best practice.

Three successful schemes were talked through: the data conversations in Lancaster, the Data Champion scheme at the University of Cambridge and the data stewards in TU Delft. All of these successful schemes had one thing in common: they all combine a cross-department/faculty approach with local expertise.

Further excellent examples were provided by the lightning talks and for me, it was certainly helpful to hear of successes in engaging researchers on a departmental level.

The highlight for me was the focus groups – I was involved in Laurent Gatto’s group discussing how to encourage more good data management by highlighting what was in it for researchers who participate, but I really wish I could have been in them all as the feedback indicated they had given useful insights and tips.

All in all I came away from the day buzzing with ideas. I spent the next morning jotting down ideas of events and schemes that could work within my own unique department and eager to share what I had learnt. Who knows, maybe next time I’ll be up there sharing my successes!!

We need to speak to the non-converted

Dr Stephen Eglen, Reader in Computational Neuroscience, Department of Applied Mathematics & Theoretical Physics, University of Cambridge and Data Champion

The one-day meeting on Engaging Researchers in Good Data Management served as a good chance to remind all of us about the benefits, but also the responsibilities we have to manage, and share, data. On the positive side, I was impressed to see the diversity of approaches led by groups around the UK and beyond. It is heartening to see many universities now with teams to help manage and share data.

However, and more critically, I am concerned that meetings like this tend to focus on showcasing good examples to an audience that is already mostly convinced of the benefits of sharing. Although it is important to build the community and make new contacts with like-minded souls, I think we need to spend as much time engaging with the wider academic community. In particular, it is only when our efforts can be aligned with those of funding agencies and scholarly publishing that we can start to build a system that will give due credit to those who do a good job of managing, and then sharing, their data. I look forward to future meetings where we can have a broader engagement of data managers, researchers, funders and publishers.

I am grateful to the organisers for giving me the opportunity to speak about our code review pilot in Neuroscience. I particularly enjoyed the questions. Perhaps the most intriguing question to report came in the break, when Dr Petra ten Hoopen asked me what happens if during code review a mistake is found that invalidates the findings in the paper. To which I answered (a) the code review is supposed to verify that the code can regenerate a particular finding; (b) that this is an interesting question and it would probably depend on the severity of the problem unearthed; (c) we will cross that bridge when we come to it. Dr ten Hoopen noted that this was similar to finding errors in data that were being published alongside papers. These are indeed difficult questions, but I hope in the relatively early days of data and code sharing, we err on the side of rewarding researchers who share.

Teach RDM early and often

Kirsten Elliott, Library Assistant, Sidney Sussex College, University of Cambridge and Research Support Ambassador

Prior to this conference, my experience with Research Data Management (RDM) was limited to some training through the Office of Scholarly Communication and Research Support Ambassadors programme. This however really sparked my interest and so I leapt at the opportunity to learn more about RDM by attending this event. Although at times I felt slightly out of my depth, it was fascinating to be surrounded by such experts on the topic.

The introductory remarks from Nicole Janz were a fascinating overview of the reproducibility crisis, and how this relates to RDM, including strategies for what could be done, for example setting reproducing studies as assignments when teaching statistics. This clarified for me the relationship between RDM and open data, and transparency in research.

There were many examples throughout the day of best practice in promoting good RDM, from the “Data Conversations” held at Lancaster University, international efforts from SPARC Europe and even some from Cambridge itself! Common ground across all of them included the necessity of utilising engaged researchers themselves to spread messages to other researchers, the importance of understanding discipline specific issues with data, and an expansive conception of what counts as “data”.

I am based in a college library and predominantly work supporting undergraduate students, particularly first years. In a way this makes it quite a challenge to present RDM practices as many of the issues are most obviously relevant to those undertaking research. However, I think there’s a strong argument for teaching about RDM from very early in the academic career to ingrain good habits, and I will be thinking about how to incorporate RDM into our information literacy training, and signposting students to existing RDM projects in Cambridge.

Use peers to spread the RDM message

Laura Jeffrey, Information Skills Librarian, Wolfson College, University of Cambridge and Research Support Ambassador

This inspirational conference was organised and presented by people who are passionate about communicating the value of open data and replicability in research processes. It was valuable to hear from a number of speakers (including Rosie Higman from the University of Manchester, Marta Busse-Wicher from the University of Cambridge and Marta Teperek from TU Delft) about the changing role of support staff, away from delivering training to one of coordination. Peers are seen to be far more effective in encouraging deeper engagement, communicating personal rather than prescriptive messages (evidenced by Data Conversations at Lancaster University). A member of the audience commented that where attendance is low for their courses, the institution creates videos of researcher-led activities to be delivered at point of need.

I was struck by two key areas of activity that I could act on with immediate effect:

Inclusivity – Beth Montagu Hellen (Bishop Grosseteste) highlighted the pressing need for open data to be made relevant to all disciplines. Cambridge promotes a deliberately broad definition of data for this reason. Yet more could be done to facilitate this; I’ll be following @OpenHumSocSci to monitor developments. We’re fortunate to have a Data Science Group at Wolfson promoting examples of best practice. However, I’m keen to meet with them to discuss how their activities and the language they use could be made more attractive to all disciplines.

Communication – Significant evidence was presented by Nicole Janz, Stephen Eglen and others, that persuading researchers of the benefits of open data leads to higher levels of engagement than compulsion on the grounds of funder requirements. This will have a direct impact on the tone and content of our support. A complementary approach was proposed: targeted campaigns to coincide with international events in conjunction with frequent, small-scale messages. We’ll be tapping into Love Data Week in 2018 with more regular exposure in email communication and @WolfsonLibrary.

As a result of attending this conference, I’ll be blogging about open data on the Wolfson Information Skills blog and providing pointers to resources on our college LibGuide. I’ll also be working closely with colleagues across the college to timetable face-to-face training sessions.

Published 15 December 2017
Written by Dr Laurent Gatto, Angela Talbot, Dr Stephen Eglen, Kirsten Elliott and Laura Jeffrey
Creative Commons License

Plans for scholarly communication professional development

Well now there is a plan. The second meeting of the Scholarly Communication Professional Development Group was held on 9 October in the Jisc offices in London. This followed on from the first meeting in June about which there is a blog. The attendance list is again at the end of this blog.

The group has agreed we need to look at four main areas:

  • Addressing the need for inclusion of scholarly communication in academic library degree courses
  • Mapping scholarly communication competencies against training provision options
  • Creating a self assessment tool to help individuals decide if scholarly communication is for them
  • Costing out ‘on the job training’ as an option

What are the competencies in scholarly communication?

The group discussed the types of people in scholarly communication, noting that scholarly communication is not a traditional research support role either within research administration or in libraries. Working in scholarly communication requires the ability to present ideas and policies that are not always accepted or embraced by the research community.

The group agreed it would be helpful to identify what a successful scholarly communication person looks like – identifying the nature of the role, the types of skill sets and what the successful attributes are. The group has identified several examples of sets of competencies in the broad area of ‘scholarly communication’:

The group agreed it would be useful to review the NASIG Competencies and see if they map to the UK situation and to ask NASIG about how they are rolling it out across the US.

The end game that we are trying to get to is a suite of training products at various levels that, as a community, will make a difference to the roles we are recruiting for. We agreed it would be useful to explore how these frameworks relate to the various existing professional frameworks, such as CILIP, ARMA and Vitae.

The approach is asking people: ‘Do you have a skills gap?’ rather than: ‘Do you (or your staff) need training?’. It would be helpful then, to develop a self assessment tool to allow people to judge their own competencies against the NASIG or COAR set (or an adaptation of these). The plan is to map the competencies against training provision options. 
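As a rough illustration of how a self assessment tool might turn ‘Do you have a skills gap?’ into pointers to training, here is a minimal Python sketch. It is not a design the group has agreed: the competency names, the 0–3 rating scale and the training options below are all hypothetical placeholders, and are not taken from the NASIG or COAR sets.

```python
def skills_gaps(self_ratings, required_levels):
    """Return each competency where the self-rating is below the required level.

    The value is the size of the gap. A competency the person has not rated
    is treated as level 0 (an assumption made for this sketch).
    """
    return {
        name: required - self_ratings.get(name, 0)
        for name, required in required_levels.items()
        if self_ratings.get(name, 0) < required
    }


def training_for_gaps(gaps, provision):
    """Map each gap to the training options listed for it (empty list if none)."""
    return {name: provision.get(name, []) for name in gaps}


# Hypothetical competencies and levels, for illustration only.
required = {"open access policy": 3, "copyright": 2, "research data management": 3}
ratings = {"open access policy": 3, "copyright": 1}

# Hypothetical mapping of competencies to training provision options.
provision = {
    "copyright": ["free webinar", "mentoring"],
    "research data management": ["paid course"],
}

gaps = skills_gaps(ratings, required)
suggestions = training_for_gaps(gaps, provision)
```

Keeping the competency set and the provision map as plain data means either could be swapped for an adapted NASIG or COAR list without changing the logic.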

Audiences

We have two audiences in terms of professional training in scholarly communication:

  1. New people coming into the profession – the initial training that occurs in library schools.
  2. Those people already in a research support environment who are taking on scholarly communication roles. 

The group also discussed scope. It would be helpful to consider how many people across the UK are affected by the need for support and training.

Another issue is qualifications over skills – there are people who are working in administrative roles who have expanded their skills but don’t necessarily have a qualification. Some libraries are looking at weighting past experience more highly than qualifications.

There needs to be a sense of equity if we were to introduce new requirements. While large research intensive institutions can afford professional development, in some places there is one person who has to do the scholarly communication functions as only part of their job – they are isolated and they don’t have funds for training. An option could be that if a training provision is to be ‘compliant’ with this group then it must allow some kind of free online training.

Initial training in library schools

As was discussed the previous time the group met, there is a problem in that library schools do not seem to be preparing graduates adequately for work in scholarly communication. Even the small number of graduates who have had some teaching in this area are not necessarily ready to hit the ground running and still need further development. The group agreed the sector needs to define how we skill library graduates for this detailed and complex area.

One idea that arose in the discussion was the suggestion we engage with library schools at their own conferences, perhaps asking to have a debate to ask them what they think they are doing to meet this need. 

The next conference of the library schools’ association, the Association for Library and Information Science Education, is 6-9 February 2018 in Denver. Closer to home, iConference 2018 will be 25-28 March and will be jointly hosted by the UK’s University of Sheffield’s Information School and the iSchool at Northumbria. However, when we considered the conference options it became clear that this would not necessarily work: the focus of these conferences is academic, not practitioner experience or case studies. This might point to the source of some of the challenges we see in this space.

One of the questions was: what is really different now to the way it was 10-20 years ago? We need to survey people who are one or two years out from their qualifications.

Suggestions to address this issue included:

  • Identify which library schools are running a strand on academic librarianship and what their curriculum is
  • Work with those library schools which are trying to address this area, such as Sheffield, Strathclyde and UCL, to try and identify examples of good practice in producing graduates who have the competencies we need
  • Integrate their students into ‘real life’, taking students in for a piece of work so they have experience

Professional Development option 1 – Institutional-based training

In an environment where there is little in the way of training options, ‘on the job’ training becomes the default. But is there a perception that on the job training comes without cost? While the training that happens in this environment is seen as cost neutral, it could be that sending someone on a paid-for course would be more effective.

How much does it cost for us to get someone fully skilled using on the job training? There are time costs of both the new recruit and the loss of work time for the staff member doing the training. There is also the cost of the large amount of time spent recruiting staff because we cannot get people who are anywhere near up to speed. 

One action is to gain an understanding of how much it does actually cost to train a staff member up. 
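To make the costing question concrete, the cost components mentioned above (the new recruit’s time, the trainer’s lost work time and the time spent recruiting) can simply be added up. The Python sketch below is only an assumption about how such an estimate might be laid out; the class name, field names and every figure in the example are hypothetical and do not come from the group’s discussion.

```python
from dataclasses import dataclass


@dataclass
class OnTheJobTrainingCost:
    """Hypothetical inputs for costing on-the-job training for one new recruit."""

    trainee_hours: float            # time the new recruit spends being trained
    trainee_hourly_cost: float
    trainer_hours: float            # work time lost by the colleague doing the training
    trainer_hourly_cost: float
    recruitment_hours: float        # staff time spent recruiting for the post
    recruitment_hourly_cost: float

    def total(self) -> float:
        """Sum the three time costs into a single estimate."""
        return (
            self.trainee_hours * self.trainee_hourly_cost
            + self.trainer_hours * self.trainer_hourly_cost
            + self.recruitment_hours * self.recruitment_hourly_cost
        )


# Illustrative figures only; none of these numbers come from the group.
example = OnTheJobTrainingCost(
    trainee_hours=100, trainee_hourly_cost=20.0,
    trainer_hours=40, trainer_hourly_cost=30.0,
    recruitment_hours=25, recruitment_hourly_cost=30.0,
)
print(f"Estimated cost of on-the-job training: {example.total():.2f}")
```

An estimate like this could then be set against the fee and staff time for a paid-for course to test whether on the job training really is cost neutral.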

Professional Development option 2 – Mentoring

There is an issue in scholarly communication with new people coming through continuously who need to be brought up to speed. One way of addressing this issue could be by linking people together. UKCORR are interested in creating some kind of mentoring system. ARMA also has a mentoring network which they are looking to relaunch shortly.

The group discussed whether mentoring was something that can be brokered by an external group, creating an arrangement where if someone is new they can go and spend some time with someone else who is doing the same job. However, to do this we would need a better way of connecting with people.

This idea ties into the work on institutional based training and the cost associated with it. We are aware that a lot of the cost of sharing and receiving information is absorbed through goodwill at present.

Professional Development option 3 – Community peer support events

Another way of getting people together is community and peer support, which is already part of this environment and could be very valuable. Between members of the group there are several events being held throughout the year. These range from free community events to paid for conferences. For example, Jisc is looking at running two to three community events each year. They recently trialled a webinar format to see if it is an opportunity to get online discussions going.

The group discussed whether we need more events, and what is the best way of supporting each other and what kind of remote methods could be used. There is a need to try and document this activity systematically.

Professional Development option 4 – Courses we can run now

The group agreed that while it might be too early for us to look at presenting courses, it would be useful to have an idea of who is offering what amongst the member organisations of the group and that we can start to glean a picture of what is covered. If we were to then map this to the competencies it helps decision making.

For example, UKSG runs free webinars every month, which fulfils a need. Is there a topic we can put on for an hour?

UKSG is planning a course towards the end of next year – a paid face-to-face seminar outlining the publication process, particularly from the open access environment. This could be useful to publishers as well. It explains what needs to happen in a sequence of events – why it is important to track submission and acceptance dates. It is pitched at people who are new in the role and at senior managers who are responsible for staffing.

Professional Development option 5 – Private providers

Given the pull on resources for many in this sector we need to consider promoting and creating accessible training for all. So in that context the discussion moved to whether we were prepared to promote private training providers. This is a tricky area because there is such a range under the banner ‘private’ – from freelance trainers, to organisations who train as their primary activity, to organisations who offer training as part of their wider suite of activities. Any training provision needs to look at sustainability: it isn’t always possible to rely on the goodwill of volunteers to deliver staff development and training.

For example, UKSG as an organisation is not profit-making — it is a charity and events are run on a non-profit basis. Jisc is looking at revenue on a non-profit basis to feed into Jisc’s support for the sector. ARMA work on a cost recovery basis – ARMA events are always restricted to members. Many of the member groups engage with private providers and pay them to come along and speak for the day.

We agreed that when we look at developing the competencies framework and identify how someone can achieve these skills we should be linking to all training provision, either through a paid course, online webinar or mentoring.  The group agreed we are not excluding private providers from the discussion. We are looking to get the best provision for the sector.

However, the topic came up about our own expertise. Experts working in the field already give talks at many events on work time, which is being paid for by their employer — who are in effect subsidising the cost of running the training or event. Can we use our own knowledge base to share this information amongst the community? Perhaps it is not about what you pay, it is what you provide into the community. 

Opening up the discussion

The group talked about tapping into existing conferences held by member organisations of the group to specifically look at this issue ‘branded’ under the umbrella of the group. To ensure inclusion it would be good to have a webinar as part of the discussion at each of these conferences so people who are not there can attend and contribute. Identified conferences were:

We also need to address other groups involved in the scholarly communication process within institutions, such as research managers, researcher developers and researchers themselves.

Next steps

  • Engaging with library schools to discuss the need for inclusion of scholarly communication in their academic library degree courses, possibly looking at examples of good practice
  • Discussion with NASIG about rolling out their scholarly communication competencies
  • Mapping scholarly communication competencies against current training provision options
  • Creating a self assessment tool to help individuals decide if scholarly communication is for them
  • Costing out ‘on the job training’ to evaluate the impact of this on the existing team

Attendees

  • Helen Blanchett – Jisc
  • Fiona Bradley – RLUK 
  • Sarah Bull – UKSG 
  • Helen Dobson – Manchester University 
  • Anna Grigson – representing UKSG
  • Danny Kingsley – Cambridge University
  • Valerie McCutcheon – representing ARMA
  • Ann Rossiter – SCONUL
  • Claire Sewell – Cambridge University
  • Nick Shepherd – representing UKCoRR

 Published 27 November 2017
Written by Dr Danny Kingsley
Creative Commons License

It’s hard getting a date (of publication)

As part of Open Access Week 2017, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Maria Angelaki describes how challenging it can be to interpret publication dates featured on some publishers’ websites.

More than three weeks a year. That’s how much time we spend doing nothing but determining the publication date of the articles we process in the Open Access team.

To be clear about what we are talking about here: All we need to know for HEFCE compliance is when the final Version of Record was made available on the publisher’s website. Also, if there is a printed version of the journal, for our own metadata, we need to know the Issue publication date too.

Surely, it can’t be that hard.

Defining publication date

The Policy for open access in Research Excellence Framework 2021 requires the deposit of authors’ outputs within three months of acceptance. However, the first two years of this policy have allowed deposits as late as three months from the date of publication.

It sounds simple, doesn’t it? But what does “date of publication” mean? According to HEFCE, the Date of Publication of a journal article is “the earliest date that the final version-of-record is made available on the publisher’s website. This generally means that the ‘early online’ date, rather than the print publication date, should be taken as the date of publication.”

When we create a record in Apollo, the University of Cambridge’s institutional repository, we input the acceptance date, the online publication date and the publication date.

We define the “online publication date” as the earliest online date the article has appeared on the publisher’s website and “publication date” as the date the article appeared in a print issue. These two dates are important since we rely on them to set the correct embargoes and assess compliance with open access requirements.

The problems can be identified as:

  • There are publishers that do not clearly display the “online date” and the “paper issue date”. We will see examples further on.
  • To make things more complicated, some publishers do not always specify which version of the article was published on the “online date”. It can variously mean the author’s accepted manuscript (AAM), corrected proof, or the Version of Record (VoR), and there are sometimes questions in the latter case as to whether it includes full citation details.
  • Lastly, there are cases where the article is first published in a print issue and then published online. Often print publications are only identified as “Spring issue” or the like.

How can we comply with HEFCE’s deposit timeframes if we do not have a full publication date given on the publisher’s website? Ideally, it would only take a minute or so for anybody depositing articles in an institutional repository to find the “correct” publication date. But these confusing cases mean the minute inevitably becomes several minutes, and when you are uploading 5,000-odd papers a year this turns into 17 whole days.
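To see how a small delay per paper adds up, here is a back-of-envelope sketch. The 5,000 papers and the 17 days come from this post; the 7.5-hour working day is purely an assumption for illustration, as are the function names.

```python
def days_lost_per_year(papers_per_year: int, extra_minutes_per_paper: float,
                       hours_per_working_day: float) -> float:
    """Convert a per-paper delay (in minutes) into working days per year."""
    total_minutes = papers_per_year * extra_minutes_per_paper
    return total_minutes / (60 * hours_per_working_day)


def extra_minutes_implied(papers_per_year: int, days_lost: float,
                          hours_per_working_day: float) -> float:
    """Invert the calculation: which per-paper delay produces `days_lost`?"""
    return days_lost * hours_per_working_day * 60 / papers_per_year


# Figures from the post: 5,000 papers, 17 days. Assumed: a 7.5-hour working day.
minutes = extra_minutes_implied(5000, 17, 7.5)
print(f"{minutes:.2f} extra minutes per paper")
```

Under that assumption, roughly a minute and a half of extra hunting per paper is enough to account for the 17 days.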

Setting rules for consistency

In the face of all of this ambiguity, we have had to devise a system of ‘rules’ to ensure we are consistent. For example:

  • If a publication year is given, but no month or day, we assume that it was 1st January.
  • If a publication year and month are given but no day, we assume that it was 1st of the month.
  • If we have an online date of, say, 10th May 2017 and a print issue month of May 2017, we will use the most specific date (10th May 2017) rather than assuming 1st May 2017 (though it is earlier).
  • Unless the publisher specifies that the online version is the accepted manuscript, we regard it as the final VOR with or without citation details.
  • If we cannot find a date from any other source, we try to check when the pdf featured on the website was created.
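The first three rules above can be sketched in a few lines of code. This is only an illustration and not the system used by the Open Access team: the `PartialDate` class and `choose_date` function are hypothetical names, and breaking ties by taking the earlier date is an assumption the rules do not spell out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class PartialDate:
    """A publication date as shown by a publisher; month and day may be missing."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def specificity(self) -> int:
        """0 = year only, 1 = year and month, 2 = full date."""
        if self.month is None:
            return 0
        return 1 if self.day is None else 2

    def resolve(self) -> date:
        """Fill the gaps: missing month becomes January, missing day becomes the 1st."""
        month = self.month if self.month is not None else 1
        has_day = self.month is not None and self.day is not None
        return date(self.year, month, self.day if has_day else 1)


def choose_date(candidates: Sequence[PartialDate]) -> date:
    """Prefer the most specific candidate; ties go to the earlier date (assumption)."""
    best = min(candidates, key=lambda c: (-c.specificity(), c.resolve()))
    return best.resolve()


online = PartialDate(2017, 5, 10)   # full online date
print_issue = PartialDate(2017, 5)  # print issue gives the month only
print(choose_date([online, print_issue]))  # 2017-05-10
```

The version check and the pdf creation date fallback are left out, since they depend on reading the publisher's page rather than on the dates themselves.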

This last example does start to give a clue to why we have to spend so much time on the date problem.

By way of illustration, we have listed below some examples by publisher of how this affects us. This is a deliberate attempt to name and shame, but if a publisher is missing from this list, it is not because they are clear and straightforward on this topic. We just ran out of space. To be fair though, we have also listed one publisher as an example to show how simple it is to have a clear and transparent article publication history.

Taylor & Francis – ‘published online’

Publication date of an article online

There are several ways you can read an article. If the article is open access or if you subscribe, then you can download a pdf of the article from the publisher website. Otherwise, you see the online version on the website. The two versions of a particular article are below, the pdf and the online HTML version.

Both the pdf and the online version of the article list the article history as:
Received 14 March 2016
Accepted 23 December 2016
Published online 12 January 2017

and also cite the Volume, year of publication and issue.

But does the ‘Published online’ date refer to when the Version of Record was made available online or the first time the Accepted Manuscript was made available online? We can’t distinguish this to provide the date for HEFCE.

Publication date of the printed journal

While we know the volume, year of publication and issue number, we don’t know what the exact publication date of the printed journal is for our metadata records. If we dig a little deeper and look at past volumes of the journal, we can see that the previous complete year (2016) features 12 issues. So we can make an educated guess that the issue number refers to the publication month (in our example it is issue 5, so it is May 2017).

However, we are wrong. The 12 issues refer to the online publication issues and not the print issues. According to Taylor & Francis’ agents customer service page they “have a number of journals where the print publication schedule differs to the online”. They have a list of those journals available and in our case we can see that this particular journal has 12 online issues but 4 paper issues in a year. So when did this actual article appear in print? Who knows.

Implications

Remember the 17 days a year? This is the type of activity that fills the time. Do we really need to do this time-consuming exercise? Some might suggest that we contact the publisher and ask, but that is also time-consuming and not always successful.

Elsevier’s Articles in Press

Elsevier’s description of Articles in Press states they are “articles that have been accepted for publication in Elsevier journals but have not yet been assigned to specific issues”. They could be any of an Accepted Manuscript, a Corrected Proof or an Uncorrected Proof. Elsevier have a page that answers questions about ‘grey areas’ and in a section discussing whether it is permissible for Elsevier to remove an article for some reason, they state they do not remove articles that have been published but “…papers made available in our “Articles in Press” (AiP) service do not have the same status as a formally published article…”

This means the same article could be an ‘Article in Press’ in three different stages, none of which are ‘published’.  Even when an article has moved beyond “In Press” mode and has been published in an issue we are not informed which version Elsevier refers to when the “available online” date is featured.

Let’s look at an example. Is the ‘Available online’ date of 13  December 2016 when it was available online as an Accepted Manuscript, a Corrected Proof or an Uncorrected Proof? This is very unclear.

So we have a disconnect. The earliest online date is not necessarily that of the final published version, as HEFCE requires. There is no way of determining the date when the final published version actually appears online, so we need to wait until the article is allocated an issue and volume for us to determine the date. This could be some considerable time AFTER the work has been finalised. So open access is delayed, we risk non-compliance and we waste huge amounts of time.

Well done, Wiley

Wiley displays every stage of an article’s publication history, making it easy to distinguish the VoR online publication date, which is exactly what HEFCE (and we) require.

Article published in an issue

This is an example of when an article is published online and the print issue is published too.

Article published online (awaiting a print issue date)

Wiley states the publication history clearly even when an article is published online but not yet included in a publication issue.

If you have a closer look at the screenshot, Wiley regards as “First published” the VoR online publication date (shown also on the left under Publication History) and not the Accepted Manuscript online date.

In this case, the publisher clearly states which version they refer to when the term “First Published” is used and also gives the reader the full history of the article’s “life stages”, as well as informing us that the article is not yet included in an issue (circle on the right).

Conclusions

If you have made it this far through the blog post, you are probably working in this area and have some experience of this issue. If you are new to the topic, hopefully the above examples have illustrated how frustrating it is sometimes to find the correct information in order to comply with not only HEFCE’s timeframe requirements, but other open access compliance issues, especially when you set embargoes.

A simple task can become an expensive exercise because we are wasting valuable working hours. We are in the business of supporting the research community to openly share research outputs, not in the business of deciphering information on publishers’ websites.

We need clear information in order to effectively deposit an article to our institutional repository and meet whatever requirements need to be met. It is not unreasonable to expect consistency and standards in the display of publication history and dates of articles.

Published 27 October 2017
Written by Maria Angelaki
Creative Commons License

Flipping journals or filling pockets? Publisher manipulation of OA policies

As part of Open Access Week 2017, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Drs André Sartori  and Danny Kingsley look at examples of where publishers have structured pricing to take full advantage of funds available through UK open access policies.

We are spending a lot on open access in the UK. The 2017-2018 RCUK block grant allocations alone to support the RCUK Policy on Open Access add up to more than £8 million. So, what happens when a country makes a decision to introduce a significant  extra boost to the publication budget?

As was predicted in early 2013 by the Chairman of the House of Commons Business, Innovation and Skills Committee: “Current UK open access policy risks incentivising publishers to introduce or increase embargo periods”. By September 2013, there was clear evidence this was happening.

Now, in the final year of the RCUK transition period, the situation is far, far worse.

No flipping going on here

Five years on from publication of the Finch report, whose recommendations helped to shape open access policies in the UK, it appears that relatively few journals have flipped from toll access to fully open access. For instance, a comprehensive dataset of embargo periods imposed by Elsevier journals indicates that only 42 of the publishing giant’s 2,300 active journals flipped from toll access to open access in the period 2013-2017. Precise figures for other publishers are not readily available, but compiled lists of converted journals are all very short, as described in this paper.

What several publishers have done instead is to adapt their policies to maximise the ability of their journals to capture the additional funds being injected into open access, by either imposing non-compliant embargo periods or charging more for mandated licences.

An embargo period is the time counted from the publication date of an article during which the author’s accepted version may not be distributed in open access repositories. There is a distinction between a press embargo and a publication embargo. The latter is what is being discussed here. We should also note that there continues to be no evidence to support publishers’ justification for imposing embargo periods.

Several funders (e.g. European Research Council, National Institute for Health Research, RCUK and Charities Open Access Fund partners including the Wellcome Trust) stipulate that open access to funded scientific research must be provided no later than 6 months after publication (with some funders allowing up to 12 months for humanities), either by self-archiving or by purchasing immediate open access.

Hence, any hybrid journal imposing an embargo period exceeding the maximum allowed by these funders will require authors of funded research to purchase immediate open access in order to comply with the funder’s policy. And, sure enough, this was exactly the response of several publishing Goliaths to the introduction of funders’ open access policies.

Increased embargo periods = revenue

For instance, from 2004 to 2011 Elsevier, the largest of them all, allowed posting of accepted manuscripts on personal websites or institutional repositories without an embargo. In 2011, Elsevier required that papers affected by a funder or institutional mandate could only be deposited if there was a specific agreement with Elsevier. In 2013, shortly after RCUK announced its open access policy, Elsevier published the first version of its embargo list, which listed only six journals (you read it right – six of 2,732 journals) with an embargo period within the 6 month maximum allowed by RCUK for policy compliance via self-archiving. The number of journals compliant with RCUK’s self-archiving option has increased to 10 since then.

Springer, the world’s second largest journal publisher, also allowed authors to deposit their work in institutional repositories with no embargo until 2013, when it introduced an embargo period of 12 months for all their journals, effectively blocking the green route for compliance with major funders’ policies for all articles in STEM (Science, Technology, Engineering and Mathematics) subjects.

The other three of the big five publishers—Wiley-Blackwell, Taylor & Francis and Sage—also impose embargo periods that are mostly incompatible with compliance via self-archiving for the funders listed above. Wiley has adopted, since April 2013, embargo periods of 12 months for STEM and 24 months for HASS (Humanities, Arts, and Social Sciences) journals. Only 44 of Taylor & Francis’ 2,577 journals are hybrids that support self-archiving without embargo. Finally, Sage mirrors Springer’s policy of a 12-month embargo period for all their journals.
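The logic that pushes funded authors towards paid open access is simple enough to write down. The sketch below is illustrative only: the function name and constants are made up for this example, and the limits are the 6 and 12 month maximums quoted in this post.

```python
# Funder limits quoted in this post (months after publication).
STEM_LIMIT_MONTHS = 6
HUMANITIES_LIMIT_MONTHS = 12


def requires_paid_open_access(journal_embargo_months: int,
                              funder_limit_months: int) -> bool:
    """True when the journal's embargo outlasts the funder's limit,
    so self-archiving alone cannot satisfy the funder's policy."""
    return journal_embargo_months > funder_limit_months


# Embargo periods as described in this post.
print(requires_paid_open_access(12, STEM_LIMIT_MONTHS))        # 12-month embargo, STEM: True
print(requires_paid_open_access(24, HUMANITIES_LIMIT_MONTHS))  # 24-month embargo, HASS: True
print(requires_paid_open_access(0, STEM_LIMIT_MONTHS))         # no embargo: False
```

A 12-month embargo on a humanities journal would pass this check for funders that allow 12 months, which is why the STEM journals are hit hardest.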

Introducing or increasing embargo periods is a very effective method of encouraging funded authors to select a paid-for open access option, but it lacks the creativity of some of the strategies considered below.

Higher charges for a CC BY licence

Funders aspire to maximum reuse of published results of the research they have invested in, so many require a Creative Commons Attribution (CC BY) licence when they are paying for open access. Examples are Bill & Melinda Gates Foundation, RCUK and COAF partners. Charging a premium for this licence type is therefore yet another method used by publishers to take advantage of funding for open access.

Below are a few examples of publishers charging extra for a CC BY licence.

As pointed out here, publishers rarely feel the need to explain the reasons for the differential pricing. Of the examples above, only AAAS justifies the surcharge by stating that “We assess the surcharge to account for potential lost secondary revenue such as permissions and reprint sales”.

Let’s for a moment ignore the fact that their base APC ($3,000) is well above the average charged by open access journals, and consider the potential revenue from the sale of reprints. Given the alternative licence offered by Science Advances (CC BY-NC) allows anyone to copy and redistribute the material in any medium or format (but not to sell it), what revenue could reasonably be expected from reprint sales?

Targeted embargo periods

A third and more complex strategy to capitalise on research funders’ policies, which fortunately appears to be losing ground, is to have policies specifying stricter self-archiving conditions for authors of funded research, or longer embargo periods for deposits in PubMed Central and Europe PMC, the subject repositories mandated by several major funders of biomedical research (e.g. BBSRC, MRC, NIHR and COAF partners).

BMJ journals, for example, set a special embargo of 12 months on deposits in PMC, while allowing deposits in other open access repositories without any embargo.

Elsevier, Wiley and more recently Emerald are all examples of publishers that have at some point dictated different conditions for authors following open access mandates, but as of the date of this post do not discriminate between authors on the basis of their funding.

Call us cynics

This last technique to squeeze every penny out of government funds is possibly the most cynical and puts even more lie to the claims publishers make about the necessity for embargo periods. Either making an author’s accepted manuscript available in a repository causes the cancellation of journal subscriptions or it doesn’t. The funding behind the research described in the paper is irrelevant.

And yet we continue to comply and we continue to pay. The RCUK is morphing into UK Research and Innovation on 1 April 2018. This is the time to take serious stock of the policies that have lined the pockets of big academic publishing companies and change them to achieve the actual end goal, which is the dissemination of research. Green over gold, people.

Published 26 October 2017
Written by Dr Andre Sartori and Dr Danny Kingsley
Creative Commons License

Choosing from a cornucopia: a thesis digitisation project

As part of Open Access Week 2017, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Drs Danny Kingsley and Matthias Ammon describe the process of choosing theses to digitise.

For decades microfilm was the way documents were photographed and stored. The British Library holds a collection of 14,000 Cambridge PhD theses on microfilm. These date back to the 1960s and go through to 2008 when digitisation took over from microfilm. In 2016 the Office of Scholarly Communication (OSC) was contacted by the British Library with an offer of low cost digitisation of these theses.

Clearly, being able to upload these theses to the University’s repository, Apollo, would make the works more visible. It would also be a major improvement on having to request that the works be digitised from paper, because the cost was significantly lower. Even though we did not have permission from the authors to make the work openly available, the theses would be requestable. The OSC decided to invest £20,000, which would pay for 10% of the theses, a total of 1,400.

There were two primary criteria that we were considering in choosing which theses to upload – the quality of the finished product and the likelihood of the theses being requested.

Quality of digitisation

Before word processing, theses were typewritten. The typeprint in the originals is not always clear and even. In addition, images were glued into the works and those images themselves were not always originals, so the quality of a copy is poor.

We needed to look at whether these types of issues affected the quality and readability of a thesis digitised from microfilm. In addition, one advantage of having a digitised thesis is the ability to run Optical Character Recognition (OCR) over it so the work becomes searchable. However, OCR does not work on handwriting or if the type is uneven.

To test this, the OSC decided to ask the British Library to digitise a few samples from older theses and from theses that contained unusual characters or maps to ascertain the quality of the digitisation. Louise Clarke, the Superintendent in the Manuscripts Reading Room, considered the British Library list and found some examples of theses that had not been digitised at our end. She identified some sample pages to be scanned that might prove to be challenging.

When the scans arrived, Sarah Middle, our repository manager, assessed the visual quality and tried to run OCR over the scans to test for accuracy.

  • The scan of a 1997 thesis that contained photos, diagrams and equations had fuzzy text at the edges but was generally legible, and the OCR samples were accurate. However, the photos looked very dark.
  • The 1968 thesis that contained typed Greek characters had a poorer scan quality. It was ‘shadowy’ and some of the letters in the English text blurred together. This meant OCR was almost pointless in some places as the accuracy was so low. In addition, OCR did not pick up all the Greek characters, although we were not sure this would be better if the scan was done in-house from the original.
  • A 1977 thesis that included handwritten Hebrew characters had much better visual quality than the 1968 thesis, but OCR didn’t pick up the handwritten Hebrew at all, and while the accuracy of OCR on the English text was much higher than for the 1968 thesis it was not as good as for the 1997 thesis.
  • The final thesis was a 1989 thesis that contained images of handwriting and equations in the text. This handwriting had been rendered as an image so OCR was not applied. Given that in this particular work the handwriting was there for illustrative purposes, this was not in itself an issue. Something that was odd was that the OCR on the text seemed to include a lot of Greek characters, even though there were none in the sample. We hypothesised that, because some of the equations contained Greek characters, this might have confused the language settings. The mathematical formulae rendered about as well as expected from OCR.

We then went back to Louise Clarke and asked her to scan the pages from the original as a comparison. Even allowing for the fact that a professional scan by the Digital Content Unit would have been of higher quality, it did help the assessment. We found that the photo was lighter (and therefore clearer) in the digital scan from the original, but the text from the scanned microfilm was much clearer than a 200dpi scan from the original.

This process led to the conclusion that we would have the best results if we focused on more recent theses.

Which subjects?

We decided to take advantage of some information already in house on which theses were likely to be more read. Until recently, Cambridge PhD students only had to provide a hardbound copy of their thesis for graduation. While in the past few years, some PhD students have uploaded their theses to the repository to make them open access, the majority are not available in this format.

If a researcher wanted to look at a Cambridge PhD they either had to come to the University Library and read the work in the Manuscripts Reading Room, or order a digital copy. The Digital Content Unit in the Library manages these requests for digitisation. Indeed last year during Open Access Week we blogged about the project to upload the collection of scanned theses into the repository and the attempt to find the authors for permission to make them open access.

What this gave us, however, was an indication of the theses that people wanted to read. We were particularly interested to know if there was a pattern in terms of the subjects that were being requested for digitisation.

Our repository manager went through the list of all theses that had been requested and found 452 distinct classmarks in the correct format, which seemed like a good sample size. Our initial plan was to see if the classmarks (which are codes used to identify the subject of a book or manuscript) provided for each thesis could be compared to the catalogue to retrieve department information/subject headings, which we could in turn use as a basis to select the theses.

Unfortunately our technical team was tied up at the time with the implementation of a new library management system so we had to revert to a manual process.  This  meant looking the thesis up manually in the catalogue and noting the department. In the end Louise Clarke checked 200 of the theses requested between July 2015 and July 2016  to establish which departments the theses belonged to.

Based on these statistics History was a clear outlier as by far the most requested subject. Also popular, but to a less statistically significant level, were subjects such as Engineering, Social Anthropology, Chemistry and Divinity. It should be noted that Engineering produces by far the largest number of theses overall, so the inclusion of Engineering theses in this list would be expected.

So far, so good. We knew the subjects that we should focus on, and that we should aim to digitise more recent theses that had been created with word processing. Now for the grunt work of choosing the 10%.

Choosing the 10%

Obviously the first thing we needed to do was exclude all of the theses we held open access in the repository and any that we had digitised ourselves from the original from the list of 14,000 microfilmed theses.

The British Library holdings contained Dewey numbers. While Dewey numbers are only an approximation of departmental divisions within the University, it was still a mechanism to identify the subjects. Our repository manager Sarah Middle collated the Dewey numbers for the British Library holdings and the project manager Matthias Ammon performed a rough sorting of theses according to the main Dewey number headings.

We decided to include all of the History theses going back to 1980. These corresponded to Dewey classes 90x, 92x, 93x, 94x, 95x, 96x, 97x, 98x and 99x. There were a total of 756 theses, just over half of the total list.
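A filter along these lines could pick out the History candidates from a list of holdings. This is a sketch under stated assumptions rather than the process we actually followed, which was a rough manual sort: the function name is hypothetical, Dewey numbers are assumed to arrive as strings such as "942.05", and "going back to 1980" is read as including 1980 itself.

```python
import re

# Dewey classes 90x and 92x to 99x; 91x is not in the list above, so it is skipped.
HISTORY_PATTERN = re.compile(r"^9[02-9]\d")


def is_history_candidate(dewey: str, year: int, cutoff_year: int = 1980) -> bool:
    """True if the Dewey number falls in the History classes listed above
    and the thesis is from the cut-off year or later."""
    return bool(HISTORY_PATTERN.match(dewey.strip())) and year >= cutoff_year


# Made-up records for illustration only.
print(is_history_candidate("942.05", 1985))  # True
print(is_history_candidate("910.4", 1990))   # False: 91x is not in the list
print(is_history_candidate("941", 1975))     # False: before the 1980 cut-off
```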

In the end, the rest up to 1,400 was filled up with subjects that appeared to be popular based on the sample analysis and roughly adjusted for the total number of theses produced in each subject in the University, with more recent cut-off points for the science subjects. While Classics was a popularly requested subject, the number of items available in the British Library’s microfilm holdings in the corresponding Dewey classes (84x and 88x) was small.

The decision was taken to include:

  • Chemistry back to 1990 (a total of 181)
  • Engineering back to 1995 (a total of 140)
  • Sociology and Anthropology, covering several departments of the University (a total of 216)
  • Philosophy (63)
  • Religion (30)
  • Classics (14)
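As a quick check that the selection adds up, the figures in this post can be tallied as follows. All numbers are taken from the post (including the £20,000 budget mentioned earlier); the variable names are just for illustration.

```python
HISTORY = 756
OTHER_SUBJECTS = {
    "Chemistry": 181,
    "Engineering": 140,
    "Sociology and Anthropology": 216,
    "Philosophy": 63,
    "Religion": 30,
    "Classics": 14,
}
BUDGET_GBP = 20_000

total = HISTORY + sum(OTHER_SUBJECTS.values())
history_share = HISTORY / total
cost_per_thesis = BUDGET_GBP / total

print(total)                         # 1400
print(f"{history_share:.1%}")        # 54.0%
print(f"£{cost_per_thesis:.2f}")     # £14.29
```

So the six subjects account for 644 theses, History for the remaining 54%, and the budget works out at a little over £14 per thesis.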

While there was a certain element of arbitrariness in the process, this was considered a starting point. We are hopeful the remainder of the theses held on microfilm by the British Library will be digitised in due course.

Making the theses available

The British Library subsequently scanned our selection and provided us with the files on an external drive earlier this year. We were able to extract the metadata from EThOS to allow for a bulk upload of the works. However, this project has made us assess the way we were managing access to theses in the Library. This policy thinking has now been completed and we are developing an online request system for these restricted theses. The whole set of 1,400 theses should be available in the repository for request during November.

The Office of Scholarly Communication is grateful for the support of the Arcadia fund, a charitable foundation of Lisbet Rausing and Peter Baldwin, for this project.

Published 25 October 2017
Written by Dr Danny Kingsley and Dr Matthias Ammon
Creative Commons License

How open is Cambridge? 2017 edition

Welcome to Open Access Week 2017. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog post we revisit the question about the openness of Cambridge. 

For Open Access week last year I looked at how open Cambridge was using the extremely useful Lantern tool, developed by Cottage Labs, and which is the basis of the Wellcome Trust’s compliance tool. If you haven’t used it before, Lantern takes a list of DOIs, PMIDs, or PMCIDs and runs these through a variety of sources to try and determine the Open Access status of the publication. I found that, for publications in 2015, 51.8% of all of Cambridge’s research publications were available in at least one ‘Open Access’ source. How did Cambridge’s 2016 publications fare? Read on to find out.

Using the same method as last year, I first obtained a list of DOIs from Web of Science (n=9416) and Scopus (n=9124) for articles, proceedings papers and reviews published in 2016. Combining and deduplicating these lists returned 10,674 unique DOIs (~29 publications/day). I also refreshed the 2015 publication data using the latest Web of Science and Scopus information, which returned 10,090 unique DOIs. Year-on-year, this represents a 5.8% increase in the total number of publications attributable to Cambridge – more than inflation!
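As a rough illustration of the combine-and-deduplicate step, here is a minimal Python sketch. It is not the script used for this analysis: the function names, the prefix list and the toy DOIs are all assumptions made for the example. It relies on DOIs being case-insensitive, so the same article exported from Web of Science and Scopus in different forms collapses to one entry. The last lines simply recompute the growth and per-day figures quoted above.

```python
# Hypothetical helper names and made-up DOIs; a sketch, not the OSC's actual workflow.

RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Lower-case a DOI and strip a leading resolver prefix so duplicates match."""
    doi = raw.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def combine_doi_lists(*sources):
    """Return the set of unique DOIs across any number of exported lists."""
    unique = set()
    for source in sources:
        unique.update(normalise_doi(d) for d in source if d and d.strip())
    return unique


def percent_increase(old: int, new: int) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Toy exports standing in for the Web of Science and Scopus lists.
web_of_science = ["10.1000/abc123", "10.1000/XYZ789", "https://doi.org/10.1000/def456"]
scopus = ["10.1000/ABC123", "doi:10.1000/def456", "10.1000/ghi000"]

unique_dois = combine_doi_lists(web_of_science, scopus)
print(len(unique_dois))  # 4 unique DOIs from 6 exported rows

# Figures quoted in the post: 10,090 unique DOIs for 2015 and 10,674 for 2016.
growth = percent_increase(10_090, 10_674)
per_day = 10_674 / 366  # 2016 was a leap year
print(f"{growth:.1f}% growth, about {per_day:.0f} publications/day")
```

Real exports would probably need more cleaning than this (blank cells, stray whitespace, records without a DOI), which is one reason the totals only cover publications that have a DOI.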

The deduplicated DOI lists for 2015 and 2016 (20,764 DOIs in total) were fed into Lantern and analysed in combination with information from Web of Science and the University’s institutional repository Apollo.

Figure 1. Distribution of papers published in 2015 and 2016 that have a DOI, according to the Open Access sources they can be found in. 57.5% of 2016’s articles appear in at least one Open Access source, an increase of about 4 percentage points over 2015. One third of all papers published in 2016 are available in Apollo.

Very pleasingly, the percentage of publications available in at least one Open Access source increased to 57.5% in 2016 compared to only 53.4% for 2015 publications. Given that the total number of publications also increased during this period, this result is doubly exciting. In raw numbers, this means that while 5384 publications were Open Access in 2015, an impressive 6135 publications were made Open Access in 2016.
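For anyone who wants to check the arithmetic, the percentages can be recomputed from the raw counts given in this post. The short Python sketch below uses only those counts; the variable names are my own. It also separates the change in share (percentage points) from the relative growth in the number of Open Access publications, which are easy to confuse.

```python
# Counts quoted in the post; variable names are illustrative only.
totals = {2015: 10_090, 2016: 10_674}      # unique DOIs per publication year
open_access = {2015: 5_384, 2016: 6_135}   # found in at least one OA source

# Share of each year's publications that is Open Access.
shares = {year: open_access[year] / totals[year] * 100 for year in totals}
for year, value in shares.items():
    print(f"{year}: {value:.1f}% Open Access")

# Change in share, in percentage points.
point_change = shares[2016] - shares[2015]

# Relative growth in the raw number of Open Access publications.
count_growth = (open_access[2016] - open_access[2015]) / open_access[2015] * 100

print(f"Share rose by {point_change:.1f} percentage points")
print(f"Open Access publication count grew by {count_growth:.1f}%")
```

The share rises by about four percentage points, while the number of Open Access publications grows considerably faster because the total number of publications grew as well.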

Most of this increase can be attributed to the much larger share of publications that appear in Apollo, which is now the largest source of Open Access material for the University of Cambridge. An additional 822 publications were deposited in Apollo in 2016 compared to 2015, which is a 30% increase in one year alone.
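The Apollo figures can be cross-checked in the same way. The sketch below is a back-of-envelope calculation, not reported data: it assumes that the "30% increase" is a rounded figure (somewhere between 29.5% and 30.5%) and that "deposited in Apollo" and "available in Apollo" count the same publications. Under those assumptions it estimates what share of the 10,674 publications from 2016 is in Apollo.

```python
# Back-of-envelope check; the implied counts are estimates, not reported figures.
ADDITIONAL_DEPOSITS = 822   # extra publications in Apollo, 2016 vs 2015 (from the post)
TOTAL_2016 = 10_674         # unique DOIs for 2016 (from the post)


def implied_apollo_share(increase_percent: float) -> float:
    """Estimate the share of 2016 publications in Apollo for a given growth rate."""
    implied_2015 = ADDITIONAL_DEPOSITS / (increase_percent / 100)
    implied_2016 = implied_2015 + ADDITIONAL_DEPOSITS
    return implied_2016 / TOTAL_2016 * 100


# Allow for rounding in the quoted 30% figure.
low = implied_apollo_share(30.5)
mid = implied_apollo_share(30.0)
high = implied_apollo_share(29.5)
print(f"Implied Apollo share of 2016 publications: {low:.1f}% to {high:.1f}%")
```

Across the whole rounding range the estimate stays close to one third, which is consistent with the figure caption above.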

You can now find more of the University’s research outputs in Apollo than in any other Open Access source. And because we operate an extremely popular Request a Copy service, potentially all of the publications held in Apollo, even those that are restricted and under embargo, are available to anyone in the world. You just need to ask.

Published 23 October 2017
Written by Dr Arthur Smith
Creative Commons License

Benchmarking RDM Training

This blog reports on the progress of the international project to benchmark Research Data Management training across institutions. It is a collaboration of Cambridge Research Data Facility staff with international colleagues – a full list is at the bottom of the post. This is a reblog; the original appeared on 6 October 2017.

How effective is your RDM training?

When developing new training programmes, one often asks oneself a question about the quality of training. Is it good? How good is it? Trainers often develop feedback questionnaires and ask participants to evaluate their training. However, feedback gathered from participants attending courses does not answer the question of how good this training was compared with other training on similar topics available elsewhere. As a result, improvement and innovation become difficult. So how can the quality of training be assessed objectively?

In this blog post we describe how, by working collaboratively, we created tools for objective assessment of RDM training quality.

Crowdsourcing

In order to objectively assess something, objective measures need to exist. Being unaware of any objective measures for benchmarking a training programme, we asked Jisc’s Research Data Management mailing list for help. It turned out that a lot of resources with useful advice and guidance on the creation of informative feedback forms were readily available, and we gathered all the information received in a single document. However, none of the answers received provided us with the information we were looking for. On the contrary, several people said they would be interested in such metrics. This meant that objective metrics to address the quality of RDM training either did not exist, or the community was not aware of them. Therefore, we decided to create RDM training evaluation metrics.

Cross-institutional and cross-national collaboration

For metrics to be objective, and to allow benchmarking and comparisons of various RDM courses, they need to be developed collaboratively by a community who would be willing to use them. Therefore, the next question we asked Jisc’s Research Data Management mailing list was whether people would be willing to work together to develop and agree on a joint set of RDM training assessment metrics and a system that would allow cross-comparisons and training improvements. Thankfully, the RDM community tends to be very collaborative, and this time was no exception – more than 40 people were willing to take part in this exercise and a dedicated mailing list was created to facilitate collaborative working.

Agreeing on the objectives

To ensure effective working, we first needed to agree on common goals and objectives. We agreed that the purpose of creating the minimal set of questions for benchmarking is to identify what works best for RDM training. We worked with the idea that this was for ‘basic’ face-to-face RDM training for researchers or support staff but it can be extended to other types and formats of training session. We reasoned that the same set of questions used in feedback forms across institutions, combined with sharing of training materials and contextual information about sessions, should facilitate exchange of good practice and ideas. As an end result, this should allow constant improvement and innovation in RDM training. We therefore had joint objectives, but how to achieve this in practice?

Methodology

Deciding on common questions to be asked in RDM training feedback forms

In order to establish joint metrics, we first had to decide on a joint set of questions that we would all agree to use in our participant feedback forms. To do this we organised a joint catch up call during which we discussed the various questions we were asking in our feedback forms and why we thought these were important and should be mandatory in the agreed metrics. There were lots of good ideas and valuable suggestions. However, by the end of the call and after eliminating all the non-mandatory questions, we ended up with a list of thirteen questions, which we thought were all important. This, however, was too many to ask participants to answer, especially as many institutions would need to add their own institution-specific feedback questions.

In order to bring down the number of questions which should be made mandatory in feedback forms, a short survey was created and sent to all collaborators, asking respondents to judge how important each question was (scale 1-5, 1 being ‘not important at all that this question is mandatory’ and 5 being ‘this should definitely be mandatory’). Twenty people participated in the survey. The total score received from all respondents for each question was calculated. Subsequently, the top six questions with the highest scores were selected to be made mandatory.
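The selection procedure described above (sum each question’s 1-5 ratings across respondents, then keep the six highest totals) can be sketched in a few lines of Python. This is an illustration only: the function name, the question labels and the three toy respondents are invented, and the tie-breaking rule (alphabetical by question label) is an assumption, since the post does not say how ties were handled.

```python
# Illustrative sketch; question labels and ratings below are made up.

def select_mandatory(responses, top_n=6):
    """Sum each question's ratings and return the top_n (question, total) pairs."""
    totals = {}
    for response in responses:
        for question, score in response.items():
            if not 1 <= score <= 5:
                raise ValueError(f"Rating for {question} must be between 1 and 5")
            totals[question] = totals.get(question, 0) + score
    # Highest total first; ties broken alphabetically (an assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Thirteen candidate questions, rated by three hypothetical respondents.
questions = [f"Q{i:02d}" for i in range(1, 14)]
raw_scores = [
    [5, 4, 3, 5, 2, 1, 4, 5, 3, 2, 4, 1, 5],
    [4, 5, 2, 5, 3, 1, 3, 5, 2, 2, 5, 1, 4],
    [5, 5, 3, 4, 2, 2, 4, 4, 3, 1, 4, 2, 5],
]
responses = [dict(zip(questions, row)) for row in raw_scores]

mandatory = select_mandatory(responses)
for question, total in mandatory:
    print(question, total)
```

Summing scores is simple and transparent, but it treats every respondent’s scale the same way; using the mean or median instead would give the same ranking only if everyone rated every question.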

Ways of sharing responses and training materials

We next had to decide on the way in which we would share feedback responses from our courses and the training materials themselves. We unanimously decided that the Open Science Framework (OSF) supports the goals of openness, transparency and sharing, allows collaborative working and is therefore a good place to go. We created a dedicated space for the project on the OSF, with separate components for the joint resources developed, for sharing training materials and for sharing anonymised feedback responses.

Next steps

With the benchmarking questions agreed and with the space created for sharing anonymised feedback and training materials, we were ready to start collecting first feedback for the collective training assessment. We also thought that this was a good opportunity to re-iterate our short-, mid- and long-term goals.

Short-term goals

Our short-term goal is to revise our existing training materials to incorporate the agreed feedback questions into RDM training courses starting in Autumn 2017. This would allow us to obtain the first comparative metrics at the beginning of 2018 and would allow us to evaluate if our designed methodology and tools are working and if they are fit for purpose. This would also allow us to iterate over our materials and methods as needed.

Mid-term goals

Our mid-term goal is to see if the metrics, combined with shared training materials, could allow us to identify parts of RDM training that work best and to collectively improve the quality of our training as a whole. This should be possible in mid/late-2018, allowing time to adapt training materials as a result of comparative feedback gathered at the beginning of 2018 and to assess whether training adaptation resulted in better participant feedback.

Long-term goals

Our long-term goal is to collaboratively investigate and develop metrics which could allow us to measure and monitor long-term effects of our training. Feedback forms and satisfaction surveys immediately after training are useful and help to assess the overall quality of sessions delivered. However, the ultimate goal of any RDM training should be the improvement of researchers’ day to day RDM practice. Is our training really having any effect on this? In order to assess this, different kinds of metrics are needed, which would need to be coupled with long-term follow up with participants. We decided that any ideas developed on how best to address this will also be gathered in the OSF and we have created a dedicated space for the work in progress.

Reflections

When reflecting on the work we did together, we all agreed that we were quite efficient. We started in June 2017, and it took us two joint catch up calls and a couple of email exchanges to develop and agree on joint metrics for assessment of RDM training. Time will show whether the resources we create will help us meet our goals, but we all thought that during the process we have already learned a lot from each other by sharing good practice and experience. Collaboration turned out to be an excellent solution for us. Likewise, our discussions are open to everyone to join, so if you are reading this blog post and would like to collaborate with us (or to follow our conversations), simply sign up to the mailing list.

Resources

Published 9 October 2017
Written by (in alphabetical order by surname): Cadwallader Lauren, Higman Rosie, Lawler Heather, Neish Peter, Peters Wayne, Schwamm Hardy, Teperek Marta, Verbakel Ellen, Williamson Laurian, Busse-Wicher Marta
Creative Commons License

Milestone – 1000 datasets in Cambridge’s repository

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset to our repository Apollo since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication, in terms of developing systems, workflows, policies and through an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blogs.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. In some cases the dataset has been downloaded many times – 170 – and the data has featured in news, blogs and Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.

   

Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was on paper and, she noted “we have preserved that with great care and share it openly on line through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted.  “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was because “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. …  It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC” which is taken from usage of our Request a Copy service.

Dr Cadwallader noted there has been an increase in the requests for theses over time. “This is a really exciting observation because the Board of Graduate studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the theses that were requested were written 60 years ago, so the repository is sharing older research as well. The topics of these theses included algebra and Yorkshire evangelists, and one of the oldest requested theses was written in 1927 about the Falkland Islands. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself is one created by the OSC team looking at the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen, to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

 

 

Published 21 September 2017
Written by Dr Danny Kingsley
Creative Commons License

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26000 downloads of that data in the past 12 months alone – that is about use and reuse of our research data and is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of these have been with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600 year old history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse, information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part. And the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that of course is digital. You can’t see that in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper and we have preserved that with great care and share it openly on line through our digital library. But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library and the open library and its great research community for which this very building is designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th century and 20th century that was mostly about our open stack of books and we have quite a few of them, we are a little weighed down by them.

Our research data weighs less but it is just as significant and in the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so and there are many reasons why data isn’t open sometimes, and that is fine. What we are looking for is managing so we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, our colleagues in the research office and in the libraries across Cambridge, and that is about the changing role in research support and library research support in the digital age, and I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University but also with the Office of Scholarly Communication in particular and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.

 

Biting the hand that feeds – the obfuscation of publishers

Let’s not pull any punches here. We are unimpressed. Late last week HEFCE published a blog: Are UK universities on track to meet open access requirements? In the blog HEFCE identified the key issues in meeting OA requirements as:

  • The complexity of the OA environment
  • Resource constraints
  • Cultural resistance to OA
  • Inadequate technical infrastructure.

Right. So the deliberate obstruction to Open Access by the academic publishing industry doesn’t factor at all?

Policy confusion

We also note that this list does not clearly articulate the fact that the funders have different compliance requirements in terms of the means by which we make work available, the timing in the publication process and the financial support of their policies. The euphemism used is ‘complexity’.

Well, yes. To give some idea of how ‘complex’ this situation is, the sister blog to this one describes the decision making process the Cambridge Open Access Team follows to ensure compliance with our multiple policies.

But we are hopeful the impending creation of UK Research and Innovation bringing HEFCE into the same regulatory body as the Research Councils will result in something being done about the conflicting policy problem. Indeed, the survey HEFCE is running may feed into that process.

Publisher obfuscation

However, there is no such positive outlook for the challenges publishers continually throw at us in relation to Open Access.

Elsevier has a long and complicated list of embargoes. There is a different list for embargoes imposed in the UK to those for the rest of the world. The complications of a range of embargo periods and some journals with non-standard arrangements are apparent on both Wiley’s and Taylor & Francis’ pages. BMJ has a non-compliant special embargo of 12 months for funders that require archiving of articles. There is no embargo at all for non-funded papers.

An exemplar is Springer with a standard embargo of 12 months for everything. However, because we are signed up to the Springer Compact most of our publications are published Open Access anyway.

We are not alone in our irritation. In the last couple of months there have been two publications identifying the amount of work libraries do to manage embargoes for Open Access compliance.

The University of St Andrews published a UKCORR blog on 22 August. Requesting permission: reflections and perspectives from the University of St Andrews discussed the processes they have to manage to ensure compliance with publishers which don’t have a public Open Access or author self-archiving policy. The reason this is a challenge is because 60% of their permissions requests are for outputs potentially in scope for the REF open access policy. St Andrews notes that “having an effective permissions policy can potentially affect an institution’s approach to their REF return and level of exceptions required.”

Management of poor publisher practices in relation to Open Access is not a UK specific problem. In July, Leila Sterman, scholarly communication librarian at Montana State University published an article in College and Research Libraries News – The enemy of the good: How specifics in publisher’s green OA policies are bogging down IR deposits. In the article she argued that there is no consistency in policies and embargoes, which creates unnecessary work. She states that publishers, “who often claim they are supportive of green open access, work to impose restrictions on digital works as if they were physical items being placed in physical locations.”

Sterman also refers to the same challenges identified by St Andrews, noting that: “Green open access policies are often buried on publisher’s websites or only mentioned in contracts. This practice obfuscates important information, increasing both the time library staff spend searching for that information and author’s obliviousness to the opportunities and restrictions of green open access.”

Indeed, this is not a new issue. Over four years ago, in a previous role and a different country, I published a post: Walking in quicksand – keeping up with copyright agreements, which notes issues similar to those in these two recent papers, but also identifies the issue of publishers changing their policies without notice.

Do we need embargoes?

Publishers argue that they need embargoes to remain ‘sustainable’. The claim is that making an author’s copy of the work (not copyedited or formatted) available in a repository on a relatively piecemeal basis will cause libraries to cancel subscriptions en masse. Despite repeated attempts, to date there has been no evidence released to support this claim.

The UK produces 6% of the world’s research output. And yet when the RCUK policy was announced some publishers (see here and here) changed their policies across the globe to take advantage of the huge amounts of UK government funds being added into the system.

As an aside, the green = cancellation argument does beg the question about the value publishers themselves place on the work they do between an Author’s Accepted Manuscript and the final Version of Record. If access to the AAM is apparently good enough for libraries to cancel subscriptions then why bother doing the extra work?

Getting some perspective

In 2015 Universities UK published a paper, Monitoring the transition to open access. This report contained a table identifying where research was available to download.

Institutional repositories are the red section. The really small red section. Globally, institutional repositories hold 4.8% of all of the AAMs available. In the UK, probably due to the strongest Open Access mandates in the world, the proportion of AAMs available in institutional repositories is slightly higher at 7.9%.

These are tiny numbers. The research material that research institutions are making available in their repositories is not the big threat to publishers’ ‘sustainability’.

In contrast, the incredible coverage of SciHub – which provides (illegal) access to two thirds of the world’s research as the final published version – poses a real, actual threat.

Who loses out here?

Of all the different sharing platforms, academic libraries are the only ones curating deposits and navigating the embargo labyrinth. Author deposits to commercial sharing sites and PubMed Central primarily rely on authors’ instructions relating to embargoes.

The subscriptions paid by academic libraries worldwide hold up the publishing industry. Talk about biting the hand that feeds you.

Published 18 September 2017
Written by Dr Danny Kingsley
Creative Commons License