All posts by Office of Scholarly Communication

Orpheus, an Open Source solution for journal policies

As anyone who administers an institutional repository can tell you, repeatedly looking up journals’ policies and attributes is a pain in the neck. We have discussed this problem a few times, noting in 2017 the complex embargo situation and the confusion about publication dates. Indeed it has been clear since 2013 that this is so complicated it is unrealistic to expect researchers to navigate this situation. This means considerable amounts of repository staff time are typically spent traversing a confusing landscape of complex, inconsistent and fluid policies.

To stop or at least mitigate this pain, wouldn’t it be great if those policies and attributes were available in a structured, machine-readable format, so that the burden of retrieving and using such information could be transferred from people to repository software?

(Granted, an even better solution would be, of course, for publishers to have simpler and standardised policies across their journals, but there is little indication that this will happen any time soon – see links above.)

Our solution – Orpheus

JISC are currently working on, and will shortly release, version 2 of their SHERPA services, which have enormous potential for providing machine-readable data on embargo periods and at least some of the other attributes we need. However, about two years ago we decided that, in the face of increasing demand for our services, we could not afford to wait to automate our workflows. Besides, we reasoned that any external solution would be unlikely to cover all the journal attributes we rely on beyond embargo periods, or to be updated at the frequency we require.

So, in the last trimester of 2017, I set out to develop a database that could store, in a strictly structured and machine-readable format, all the bits of information from journals, publishers and conferences that we repeatedly look up. This would save the time that the team behind Cambridge’s DSpace repository Apollo was spending retrieving those data and manually applying them to each deposited item.

Orpheus (named after the son of Apollo in Greek mythology) was thus born in January 2018. To mark its first birthday, we have just turned Orpheus into an Open Source project and released the code at https://github.com/osc-cam/orpheus.

In this blog post, I will provide an overview of Orpheus’ main features and of how we have been using it to increase the efficiency of our repository and services.

Supported attributes and available interfaces

On the web interface for editors and users, attributes are listed, for each journal, publisher and conference, in a detailed view that looks like this:

Orpheus currently supports the following attributes of journals/publishers:

  • name, synonyms, URL and, for journals, ISSNs and publisher
  • revenue model (subscription, hybrid, fully Open Access)
  • gold OA policy (article processing charges, licence choices, etc)
  • green OA policy (allowed versions and outlets, embargo period, licence, etc)
  • Europe PMC participation (whether or not the journal deposits papers in EPMC)
  • deals/discounts (whether the journal is included in an institutional deal such as Springer Compact or offers any discounts)
  • contacts (e-mail addresses for queries)

Orpheus’ RESTful API exposes journal attributes in JSON format and its response can be tailored to facilitate integration with repository platforms and other systems. For instance, the screenshot below shows only the attributes that we feed into Apollo and/or our helpdesk system (on the left below).

  

Like every project written in Django, Orpheus includes an additional web interface for administrators to manage users and permissions, and to perform bulk operations such as updating or deleting multiple entries at once. It looks pretty too (as seen on the right above).

Current coverage

Orpheus includes parsers that allowed substantial datasets of journals and their attributes to be imported into the system, saving the Cambridge team the effort of populating the database from scratch. Data was imported from:

Orpheus currently has almost 40,000 journal entries belonging to more than 8,000 publishers (“preferred names”; the larger number of “total entries” includes synonyms).

While we may derive some satisfaction in achieving comprehensive coverage and including journals such as هیدروژیولوژی and Демографија, what really matters to us in terms of maximising the efficiency of our services is databasing those journals and conferences that Cambridge academics most often publish in.

A quick analysis of journal names contained in all Apollo submissions received since 2014 (29,598 submissions) reveals that we are now able to match 83% of those to a record in Orpheus and retrieve embargo periods, APC values and licensing information for, respectively, 72, 48 and 37% of past submissions. These results are encouraging, especially considering that (1) ‘journal name’ in this dataset of past submissions includes conference names and strings that do not correspond to true journals, such as 13 entries for ‘TBC’ (to be completed); and (2) for new submissions, our system tries to find matches by ISSN and eISSN before attempting matches by name, so we have a better chance of matching “Hepatology (Baltimore, Md.)” to the right journal than this analysis would suggest.
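To illustrate why matching on identifiers first helps, here is a minimal sketch of that lookup order in Python. It is not Orpheus’ actual code: the `JournalRecord` class, the helper functions and the sample records are all hypothetical, and the ISSN values are included for illustration only.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournalRecord:
    """Hypothetical, simplified journal entry; not Orpheus' real schema."""
    name: str
    issn: Optional[str] = None
    eissn: Optional[str] = None
    synonyms: list[str] = field(default_factory=list)


def normalise_issn(value):
    """Strip the hyphen and upper-case an ISSN; return None if it is not 8 characters."""
    if not value:
        return None
    cleaned = value.replace("-", "").strip().upper()
    return cleaned if len(cleaned) == 8 else None


def normalise_name(value):
    """Case-fold and collapse whitespace so trivially different names compare equal."""
    return " ".join(unicodedata.normalize("NFKC", value).casefold().split())


def find_journal(records, issn=None, eissn=None, name=None):
    """Try ISSN, then eISSN, then preferred name or synonym; None if nothing matches."""
    for candidate in (normalise_issn(issn), normalise_issn(eissn)):
        if candidate is None:
            continue
        for record in records:
            if candidate in (normalise_issn(record.issn), normalise_issn(record.eissn)):
                return record
    if name:
        wanted = normalise_name(name)
        for record in records:
            known = {normalise_name(n) for n in [record.name, *record.synonyms]}
            if wanted in known:
                return record
    return None


# Illustrative records only (hypothetical data, not an export from Orpheus).
records = [
    JournalRecord(name="Hepatology", issn="0270-9139", eissn="1527-3350"),
    JournalRecord(name="Example Journal", synonyms=["Ex. J."]),
]

# The submitted name differs from the stored one, but the ISSN still matches.
match = find_journal(records, issn="0270-9139", name="Hepatology (Baltimore, Md.)")
print(match.name if match else "no match")
```

With the name alone, “Hepatology (Baltimore, Md.)” finds nothing in this toy dataset, which is the situation the retrospective analysis above was in; with an ISSN supplied, the same submission resolves to the right record.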

Integrations with Apollo and Zendesk

Without digging into the technical details of the integration of Orpheus with Apollo (to be honest, I would not be able to go into detail here, for the integration with Apollo was fully implemented by my colleague Agustina Martinez-Garcia), it suffices to say that Apollo has been querying Orpheus and successfully applying embargoes to many of the c. 900 submissions we receive per month (we received, on average, 892 monthly submissions in 2018).
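For readers wondering what “applying an embargo” involves at its simplest, the sketch below derives a release date from a publication date and an embargo period in months. This is an assumption-laden illustration, not the Apollo integration: the function names are hypothetical, and real policies may count the embargo from a different date or in different units.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` after `start`, clamped to the last day of short months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def embargo_end(publication_date, embargo_months):
    """Hypothetical helper: the date on which a deposited file could be released."""
    if embargo_months < 0:
        raise ValueError("embargo_months must not be negative")
    return add_months(publication_date, embargo_months)


def is_under_embargo(publication_date, embargo_months, today):
    """True while `today` falls before the computed release date."""
    return today < embargo_end(publication_date, embargo_months)


# A six-month embargo from 31 August 2018 ends on the last day of February 2019.
print(embargo_end(date(2018, 8, 31), 6))
```

Clamping to the end of the month is one possible design choice for dates such as the 31st; a repository could equally decide to round up to the first day of the following month.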

Orpheus has also been integrated with our helpdesk system (powered by Zendesk) via “Orpheus Lookup”, a small Open Source application available here. This enables relevant information about journals to be embedded in our helpdesk interface (see right hand side pane of screenshot below), facilitating the job of advising researchers on how to comply with their funders’ Open Access policies. The app also allows us to populate the relevant helpdesk ticket fields (see left hand side pane of screenshot) with one click. Information in these fields may then be processed by a Zendesk macro (also Open Source), to produce tailored auto-reply messages that can be further customised by the staff member.

In summary, our experience indicates that the benefits of integration of an institutional repository with an auxiliary database providing machine-readable representations of frequently required attributes of journals, conferences and publishers outweigh the costs of development and maintenance of the system. Other institutions or consortia interested in automating the processes of looking up and applying those attributes to repository records may benefit from hosting an instance of Orpheus.

If you are interested in more detail about the Orpheus integration, please email us on info@repository.cam.ac.uk and we will be happy to help.

Published 22 January 2019
Written by Dr Andre Sartori
Creative Commons License

Cartooning the Data Champions

Clair Castle, Librarian at the Department of Chemistry, describes how during her secondment to the Office of Scholarly Communication (OSC) as Research Data Coordinator, she collaborated with Clare Trowell, Data Champion and Marshall Librarian at the Faculty of Economics, to design some cartoons to use to advocate for the Data Champions Programme.

I have been collaborating with the OSC on various RDM (Research Data Management) activities since it was established in 2015. I was fortunate enough to be appointed on secondment to the OSC from May to October 2018, as Research Data Coordinator. One of my main responsibilities was to manage the Data Champions Programme (with which I was already involved in my department).

Data Champions are volunteers who advise members of the research community on proper handling of research data. In this, they promote good research data management (RDM) and support Findable, Accessible, Interoperable, and Re-usable (FAIR) research principles.

Data Champions form a network across different schools and departments of the University of Cambridge as well as affiliated institutes. The Data Champion Programme is open to all University members interested in research data handling, for example researchers (from PhD students to PIs), data managers, IT professionals, librarians, and data scientists.

Demonstrating the value of RDM

The Data Champions have bimonthly Forum meetings where they have the opportunity to hear speakers on RDM-related topics, speak about their own RDM activities, and network. At the May 2018 Forum meeting Dr Danny Kingsley (Head of the OSC) led a stakeholder analysis exercise to try and work out: a) why RDM is of value to different stakeholders, b) their possible objections to RDM, and c) what responses a Data Champion could formulate to these objections. The idea was that if a Data Champion were stuck in a lift with one of these stakeholders, or sat next to someone at a college dinner or a meeting, for example, and that person raised an objection to RDM during the conversation, it could be rebutted with a suitable response prepared in advance.

Stakeholders included were:

  • PhD students
  • PostDocs
  • Early Career Researchers
  • Principal Investigators
  • Undergraduate students
  • Masters students
  • University administration (e.g. research grant administrators, librarians)
  • University committee structure
  • Vice Chancellor
  • Funders
  • Members of the public.

We were divided into groups, each of which represented a particular stakeholder, and wrote down our thoughts on (a)-(c) as above on post-it notes. Unfortunately we ran out of people to write anything about the members of the public as stakeholders.

I collated what was written on the post-it notes into a table and this was discussed at the following Data Champions Forum meeting in July. Ideas were invited from everybody about how best to present this information and turn it into a practical resource for RDM advocacy.

 

One idea from Dr Lauren Cadwallader (Research Data Facility Manager) was a cartoon design for use on small postcards or on posters and she asked if anyone could draw. It was at this point that Clare Trowell stuck her hand up – as she is also an artist!

Drawing up a plan

One of the main ideas behind the cartoons was that the Research Data Team wanted to create an ‘advocacy’ resource in the Data Champions’ Google Drive. Data Champions could then use them in posters, training sessions etc. that they would design themselves. The first use for the cartoons would be on postcards to promote the Data Champions Programme and the RDM services that the Research Data team offer.

I arranged to meet Clare a couple of times for a cup of tea and a chat about what would be required, and to catch up on progress, and we established the following:

  • Timescale – Clare wanted to complete the project by the end of the Summer Vacation due to the term-time commitments there would be at Economics in the Michaelmas Term.
  • Licensing – We agreed on the Creative Commons licence CC-BY-NC-ND (which only allows others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially). Clare wanted to retain her copyright in her cartoons so she could use them for promotion on her personal website. She also wanted to prevent others from profiting from them as she did this work pro bono for the OSC. She was also concerned that without “No Derivatives” it might be possible to make disrespectful adaptations. She is not concerned about profiting from the designs herself.
  • Costs – postcards would be free to print by the University Library, where the OSC is based. Clare volunteered her services for free but we did remunerate her for the materials she used. I would be designing the postcard template as part of my usual role.
  • Workload – Clare felt that around 8 scenarios would be manageable for her to draw in the time available. I asked her to draw one more that could be specifically used to encourage people to become a Data Champion.
  • Cartoon content – we debated whether we should have 3 or 4 ‘boxes’ in a strip. I would provide text statements for Clare to illustrate. We agreed to use speech bubbles to contain the text, as is traditional with cartoon characters when speaking.
  • Stakeholders – which should we focus on? We describe the cartoon characters we finally decided on below. We needed some of the 8 postcards to be appropriate for STEM or HASS disciplines, or both. They should therefore feature a variety of characters that could be used in different situations.

The next step was for me to identify themes from the objections to RDM and the responses to them in the stakeholder analysis exercise and to translate them into scenarios for the cartoons.

My summary of themes looked like this:

  • Fear of being scooped
  • Unable to share
  • Unwilling to share
  • Time and effort
  • Cost
  • Waste of time

Meet the cast

Literally as we were talking, Clare started drawing and we eventually came up with a range of characters that we took from the stakeholder analysis exercise results. We grouped post-docs and early career researchers together, and the PhD and Masters students together, in order to rationalise the numbers involved. We left out undergraduates and funders, as they aren’t a priority for advocacy at the moment. (Please note this image is licensed CC-BY-NC-ND, attribution Clare Trowell).

Clare also invented ‘Corporate Man’ (very popular with the Data Champions and the Research Data Team!) and two Data Champion characters. Clare tried very hard to be as diverse as possible, in order to represent the Data Champions inclusively. Her inspiration for the characters has tended to come from real-life people she has encountered.

Here are some of the final scenarios I devised for Clare to illustrate. I found it was easier to include just three boxes in a strip – represented above by the number of columns. I had minimal space for text so I needed to be quite concise, as well as having to imagine scenarios that would be immediately understood. This was challenging but really enjoyable. I also received some useful feedback from Danny and Lauren at this stage.

Postcard design

The cartoons were scanned (using a high quality flatbed scanner at Economics) from the hand-drawn originals to create digital images in PDF and TIFF format. These files were too large to send to me by email so Clare made a few trips to the OSC with a memory stick!

I started off designing the postcards in Canva but this has quite a limited editing capacity (especially for cropping and resizing the images) so I moved on to using Inkscape. In contrast to Canva, this is free, open source graphic design software, which other members of the OSC had used previously. It has the advantage that anyone will be able to use this to amend the designs in future. I was given lots of advice and help but I really ended up learning as I went along due to the limited time available – a steep learning curve! Inkscape’s main output is in SVG format but images can be converted to PDF.

The nice thing about hand-drawn cartoons is that they don’t have completely straight lines, but this made it a bit difficult to orientate the drawings on the postcards. I did the best I could but I quite like the ‘hand-made’ feel of the final designs.

For the content on the reverse of the postcards I updated a version of the current Research Data postcard that the team were giving out at training sessions and other events. This provides links to sources of help and guidance on sharing research data, and to the Research Data Management website and social media accounts. It would now include a link to the Data Champions programme.

Feedback

The September Data Champions Forum meeting included a general discussion on the possible branding of the Data Champions programme. As part of this, Clare introduced her cast of characters and I shared a compilation of all the scenarios in a ‘comic strip’.

I also printed off some prototype postcards so that everyone could see what they could look like. The feedback was positive and just a few final tweaks were suggested, including creating more space on the reverse for people to write a message and an address, so it can be actually posted, and adding the headline ‘Ask a Data Champion’.

Cartoons as an advocacy tool

The final designs were just about ready in time for the beginning of the new academic year when we knew Data Champions would be inducting new students and staff and doing RDM training in their institutions. I uploaded the designs to the Data Champions Google Drive, and numbered them from 1-9. Data Champions could then choose which they would like printed copies of and request the designs and amounts required via an online form. We sent them out in the internal post.

The initial print run was 100 of each design, most of which were sent out to Data Champions upon request. We received requests for sometimes a small number of each design or larger numbers of a few designs. We needed to make a further print run of 50 each of a couple of scenarios: “Check out this course on research data management” and the “Data Champion Wanted” designs, as they proved to be particularly popular for use at induction and training events.

The Research Data team now distributes the postcards at all RDM training sessions and, if there is a choice, they are apparently more popular than the usual, more formal research data ones, perhaps because of their more informal nature. I think colourful illustrations of people do tend to stand out more.

At forum meetings we discussed the possibility of using the cartoons in the following contexts:

  • Producing short videos that could include role-play.
  • Interactive feature on a website (e.g. objections to RDM as a word cloud/speech bubbles, hover over an objection to RDM to see a rebuttal for it)
  • Memes on social media.
  • Inserting postcards in the welcome packs for students or using them as flyers, and on PowerPoint slides for use in foyers/on TV screens.
  • Using the #askadatachampion Twitter hashtag alongside cartoons.
  • Pokémon-like game – collect all the different cards!
  • Animation with cartoons, potentially for use on the OSC YouTube channel. For ideas, see Powtoon and Adobe Character Animator (which creates moving images from 2D drawings).

Outcomes

Cartooning in the world of libraries and publishing is increasing; one example is the cartoon abstract of the paper on the Research Support Ambassador programme at Cambridge University, written by Claire Sewell and Danny Kingsley. As well as drawing the cartoons for the Data Champion postcards, Clare has drawn one for use by the OSC to promote the digitisation of theses at the University. Cartoons and drawings offer an interesting alternative to the traditional, perhaps more formal ways of communicating.

This project has proved to be an innovative and fun way for the Research Data team/the OSC to collaborate with its stakeholders, and to promote the Data Champions programme and theses digitisation. One significant outcome has been the role of the cartoons in the wider discussion of branding by the OSC that followed, and which is ongoing.

There were challenging issues around the technical side of designing the cartoons but this can be improved upon in future. The Data Champions will soon have an impressive set of designs they can use to promote their RDM activities.

I thank Lauren for steering me through the process and her and my OSC colleagues for imparting their Inkscape skills. I also thank Clare for being such a good collaborator and allowing us to use her talents to create these eye-catching postcards.

NOTE: All the cartoons are available on the RDM website.

Published 10 January 2019
Written by Clair Castle, with contribution from Clare Trowell
Creative Commons License

Moving online: training librarians in 2018

As we move into 2019 it is a good time to look back at another year spent training the library community, both in Cambridge and more widely. Over the last 12 months, the Office of Scholarly Communication has held nearly 50 training sessions for Cambridge staff on topics ranging from navigating copyright issues to the mechanics of the publishing process.  

Face to face

We have continued to deliver high-quality face-to-face training sessions on many topics. Sometimes sessions just work better when participants are all together in a room, especially if there are a lot of activities. For example, our sessions looking at Research Data Management and Data Management Plans are designed to be interactive and so wouldn’t really work in any other format. Feedback from sessions tells us that participants really value the chance to meet other librarians and hear their perspectives on things.

Cambridge has more than 100 libraries including faculties, departments, colleges and connecting institutions. Many staff do not get to meet each other unless working on a specific project and even working in the same university it can be hard to avoid becoming too focused on local issues. Attending workshops and other training sessions allows conversations to happen and several people have told us that they really value the chance to connect with their colleagues. 

Webinars to the rescue

Of course, librarians are very busy people so sometimes it’s just not possible for them to attend sessions in-person. Working in small teams often means that staff are unable to leave the library to go to training, especially when travel time and family commitments are factored into the equation.

To help with this we introduced webinars as a delivery method in 2017. This means that staff can either attend training sessions remotely or catch up with a recording.  Because of the success of this project we have continued to deliver sessions via webinar in 2018 and feedback from attendees tells us we are doing something right! Several people have commented that they have attended sessions online which they would otherwise not have been able to make but others have had some suggestions for improvement.

It can be hard to carve time out of a busy schedule to attend even an hour-long webinar so there needs to be some incentive, like an activity, so people get the benefit of attending live. We have taken this on board and tried to build in interactive elements where appropriate. The main lesson we have learnt about webinars is that they are particularly useful for information delivery sessions which would usually involve someone standing at the front of the class delivering a talk. People can easily listen to this at their desk and/or ask questions through the webinar chat box without having to leave work.

Most of these webinars are shared with a Cambridge audience only but a few have been released more widely such as our talk on How to Spot a Predatory Publisher. As discussed in our previous post on advertising videos we have discovered that naming our content something that people are likely to Google is a great way to increase hits! 

Increasing discoverability

As we offer more and more webinars we are starting to think about the best way to collate and share these. Although they can be useful resources, people need to know where to find them without having to hunt around. One of our priorities for 2019 is to gather both our webinars and online resources together to create a mini-hub where library staff can go to find more information.

These resources include webinar recordings but also the results of two other training projects from 2018: our Research in 3 Minutes videos and our Scholarly Communication Information Booklets. Research in 3 Minutes is a series of short videos which outline basic concepts in scholarly communication. Most of these areas can be quite complicated and terminology-laden and these videos aim to provide an accessible introduction. They can also be uploaded for display on screens around the library or on other webpages to engage users. We started to create Information Booklets when we realised that all librarians love a handout (at least in our experience!).

These four-page booklets can be viewed online or printed out and offer a more in-depth look at areas we are often asked about, for example what exactly is a Creative Commons licence? There are six booklets in the series so far, covering everything from the publication lifecycle to academic social networking and we aim to add more in 2019.

Online learning

One of our biggest forays into online learning took place with the Research Support Ambassador programme. This is an annual programme aimed at educating library staff on the core elements of research support and in previous years it has been run both face-to-face and via webinar.

This year we decided to do something different and used Moodle to create a completely online course. Participants were able to work through modules including video content, quizzes and discussions to test their understanding of the concepts. Each module was assessed by an activity which allowed learners to put their new knowledge into practice by undertaking a research support task. Examples of this included assessing a data management plan and attempting to spot a predatory publisher.

Overall the course was completed by 20 participants who gave us a lot of positive feedback on the format as well as suggestions for improvements. In the next few years this is something we would like to expand on, perhaps to those outside Cambridge… 

Beyond the University

That doesn’t mean we have neglected non-Cambridge librarians this year. In March our Research Support Skills Coordinator delivered two well-attended sessions on Moving Into Research Support with CILIP. The original session was so popular that we had to add a second and attendees came from around the UK to hear how they could get involved in this exciting new area. There was also a return visit to CILIP HQ in London for their 2018 Careers Day where attendees were introduced to the wonders of working in research support (including dealing with penguin poop and breaking the internet).

We also contributed to a range of other events such as LILAC 2018 and Dawson Day held in the summer – both of which gave us a chance to talk about the need for training in scholarly communication literacy for library staff. 

All in all 2018 has been a very busy year for training but we will not be slowing down in 2019. We have plans to expand our online training offer and deliver even more face-to-face sessions for our community. Who knows what this blog will contain this time next year? Readers had better stay tuned to find out! 

Published 8 January 2019
Written by Claire Sewell
Creative Commons License