All posts by Office of Scholarly Communication

New to OA? Top tips from the experts

We have a fantastic community in the Scholarly Communication space – that was one of the clear themes to emerge from a recent exchange on the UKCORR discussion list. The grandly named UK Council of Research Repositories is a self-organised, volunteer, independent body for repository managers, administrators and staff in the UK.

The main activity for UKCORR is a closed email list which has 570 members and is very active. Questions and discussions range from queries about how to interpret specific points of OA policy through to technical advice about repositories.

Recently, the OSC’s Arthur Smith (the current Secretary of UKCORR) posed the first ‘monthly discussion’ point, asking the group two questions:

  • What do you wish you were told before you started your job in repository management/scholarly communication?
  • What are your top three tips for someone just starting?

What followed was a flurry of emails full of great advice. Too good not to share – hence this blog. In summary:

  1. This is a varied and complex area
  2. Open access is bigger than mandates
  3. Things change fast in scholarly communication
  4. Don’t panic
  5. Work with your academic colleagues
  6. The OA community is strong and supportive

Top tips for someone just starting in Scholarly Communication

1. This is a varied and complex area

It’s complicated! Terminology, changing guidance and policies, publishers’ rules… everything is complicated and it takes time to learn it all.

You will experience A LOT of frustration (with publishers, financial constraints, lack of policy alignment, issues with interoperability…) but there will be moments when it all comes together and you realise you have made a difference to someone and it is all worthwhile.

You’re not mad for wondering why open access policies/dates etc. are not easily found…

How varied and exciting the role is, with requirements (and opportunities) to develop expertise in diverse areas: communication/advocacy, copyright, systems, researcher training, project and team management, budget management… to name but a few.

To remember that this is an industry we have not traditionally been involved in, that it is a constantly changing landscape, that the community is incredibly supportive and endlessly useful, that Sherpa Romeo is still vital, that publishers really vary in their responses to open access – from behemoths to start-ups, and that everyone should back the collaborative effort behind the Scholarly Communications Licence!

2. Open access is bigger than mandates

Remember the bigger picture – open access/open research should not be about compliance; don’t allow yourself to become jaded.

Remember that it is not all just about compliance (the REF). Yes, it is concentrating researchers’ minds wonderfully at the moment, but Open Access/scholarly communications should be about selling the benefits – the carrot, not the stick.

Efface mandates and policy when possible – while REF, funder and institutional mandates are powerful driving forces, some people are not motivated by them, and OA and Open Science are bigger and better than any mandates.

It’s not all about compliance…

It’s not all about the REF.

3. Things change fast in scholarly communication

It’s not finished yet – we’re still building it and nothing is set in stone, so what do you think?

My advice is: be adaptable – change is good. This field is rapidly evolving, which demands that you remain flexible. What was true yesterday may not be applicable tomorrow.

It is a fluid, constantly changing field to be involved in and it will continue to evolve, so enthusiasm (or nosiness) and an enquiring mind help.

Identify ways to keep up to date, as it is a rapidly evolving area and it’s impossible to keep on top of everything.

Keep the big picture alive alongside the ‘how-to’, operational aspects. Reflect this in your communications.

Don’t be afraid to say you don’t know something – a lot of things in this area are based on interpretation of policies, etc.

Stay passionate (even when the details are dragging you down).

There is a lot more to it than meets the eye – and that is what is appealing – variety and challenge.

Don’t be afraid to try and change things.

4. Don’t panic!

Open Access Emergencies are very rare. If you’re sent a takedown notice, hide the record immediately and then think about what to do (I’ve had two in something like six years; they’re pretty rare). Other than that, very few things are actually urgent and you can afford to spend a bit of time thinking about them.

You’re not going to get everything right – mistakes can be made and for the most part easily rectified (in my position at least!)

Don’t worry about asking questions – Green? Gold? Need some context? Get some context!

5. Work with your academic colleagues

Recognise that some of your best allies will be researchers, although they will often be silent partners working away in the background. It’s easy to moan that they always get it wrong, but no amount of lecturing about policies will ever be as effective as a casual conversation between two researchers over lunch. Catalysing those discussions is what we should be aiming for.

Your academics do not care about the vagaries of policy and probably weren’t listening when you told them. Keep the message very simple. If a specific funder is more complicated, you may be best off targeting those authors directly with an additional message that explains the difference.

Take time to understand the daily and yearly calendar of academic staff to better understand their pressures.

Engage academics in conversations – for me that is the most interesting and rewarding part of the role.

Be confident, you know what you’re doing. And if you don’t? Find out – you’ve checked the embargo/copyright regardless of what the academic might want you to do!

Customer focus is important – support rather than appear to police (even though we might be doing a bit of policing).

You have to remember that even if you are relatively new, you will probably know more than the academics/researchers themselves, so don’t panic when you don’t know/understand something they ask/request. They are usually fine with the standard “I’ll get back to you…” to give you time to find out. Plus, a lot of them are happy that you are dealing with it so they don’t have to.

6. The OA community is strong and supportive

It takes time to build knowledge, so build your networks.

Make use of your colleagues’ expertise – it’s ok not to know everything about everything and you’ll become a stronger team.

Engage on Twitter – it’s where I find a lot of useful resources and updates, and share ideas.

Join UKCORR (but I would say that).

You are part of a community that works together – UKCORR is a great platform for discussion, keeping up with news (e.g. the release of multiple REF2021-related guidance papers within a few days of each other) and finding out the answers to questions.

Network as much as you can; UKCORR is a fantastic community.

Use the support networks that are available – colleagues/local groups/UKCoRR/ARMA – people are genuinely helpful and supportive, and repetition of questions does not offend.

Join the Open Access Tracking Project, or at least subscribe to notifications. I read the email digest every morning; there is always plenty going on.

7. General advice

The validation queue will very rarely reach zero. Your academics are publishing all the time. Don’t try to get the queue to zero, for that way madness lies. Instead, set a time period (e.g. two weeks) and aim to have nothing take longer than that to validate. Don’t worry if this slips a bit during the busy times.

Don’t be intimidated by copyright – get expert advice when you need it, but most re-use & sharing rights are written down somewhere (in the agreement to publish, or in a publisher’s pages).

Don’t forget the Arts & Humanities – much of the lingo (& policy) in OA, e.g. “pre-print”, PubMed/EPMC deposits, etc. comes from the STEM side of the Two Cultures, and the Humanities tradition can be slightly different (for one thing, more publishing in books).

I’m also happy to admit that I was rather overwhelmed by acronyms and abbreviations. It took me an age to figure out that CRIS was Current Research Information System. Don’t be afraid to stop someone if they’re using a term that you don’t know.

Learn a little bit about code and the underpinnings of your platform so you can communicate more effectively with developers.

If you have the opportunity to learn how the technical infrastructure works, e.g. coding, APIs, go for it. This is on my wish list – so often I can’t tell if a development/improvement hasn’t happened because it’s technically not possible, or if it’s for other reasons.

Published 20 August 2018
Compiled by Dr Danny Kingsley from responses amongst the UKCORR community

‘No free labor’ – we agree.

[NOTE: The introductory sentence to this blog was changed on 27 June to provide clarification]

Last week members of the University of California* released a Call to Action to ‘Champion change in journal negotiations’, which references the April 2018 Declaration of Rights and Principles to Transform Scholarly Communication. This states as one of the 18 principles:

“No free labor. Publishers shall provide our Institution with data on peer review and editorial contributions by our authors in support of journals, and such contributions shall be taken into account when determining the cost of our subscriptions or OA fees for our authors.”

Well, this is interesting. At Cambridge we have been trying to look at this specific issue since late last year.

The project

Our goal was to gain a better understanding of the interaction between publisher and researcher. The (not very imaginatively named) Data Gathering Project supports the decision making of the Journal Coordination Scheme in relation to subscription to, and use of, academic journal literature across Cambridge.

What we have initially found is that the data is remarkably difficult to put together. Cambridge University does not use bibliometrics as a means of measuring our researchers, so we do not subscribe to SciVal, but we do have access to Scopus. However, Scopus does not pick up Arts and Humanities publications particularly well, so it will always be a subset of the whole.

Some information that we thought would be helpful simply isn’t. We do have an institutional Altmetric account, so we were able to pull a report from Altmetric of every paper with a Cambridge author held in that database. But Altmetric does not give a publisher view – we would have to extract this using DOI prefixes or some other system.
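To illustrate the DOI-prefix approach, here is a minimal Python sketch of grouping a list of DOIs by publisher. The prefix table is a hand-picked illustration of ours (not a complete or official mapping); in practice it would need to be built out, or each prefix resolved against a registry such as Crossref.

```python
# Minimal sketch: grouping papers by publisher via DOI prefixes.
# The prefix-to-publisher table is illustrative, not authoritative.
from collections import Counter

PREFIX_TO_PUBLISHER = {
    "10.1038": "Springer Nature",
    "10.1002": "Wiley",
    "10.1111": "Wiley",
    "10.1371": "PLOS",
}

def publisher_for_doi(doi: str) -> str:
    prefix = doi.split("/", 1)[0]  # the registrant prefix before the first slash
    return PREFIX_TO_PUBLISHER.get(prefix, f"unknown ({prefix})")

def count_by_publisher(dois):
    return Counter(publisher_for_doi(d) for d in dois)

print(count_by_publisher(["10.1038/s41586-018-0001-x", "10.1371/journal.pone.0123456"]))
```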

Cambridge uses Symplectic Elements to record publications, from which, for very complicated reasons, we are unable to obtain a list of publishers with whom we publish. As part of the subscription we have access to the new analytics product, Dimensions. However, as far as we have managed to see, Dimensions does not break down by publisher (it works at the more granular level of journal), and seems to consider anything that is in the open domain (regardless of licence) to be ‘open access’. So figures generated here come with a heavy caveat.

We are also able to access the COUNTER usage statistics for our journals with the help of the Library e-resources team. However, these include downloads for backfiles and for open access articles, so the numbers are slightly inflated, making a ‘cost per download’ analysis of value against subscription cost inaccurate.
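To make that caveat concrete, here is a small sketch with invented figures showing how counting backfile and open access downloads flatters the cost-per-download number:

```python
# Invented figures, purely to illustrate the direction of the bias.
subscription_cost = 100_000.0  # hypothetical annual subscription spend (GBP)
total_downloads = 250_000      # COUNTER total, including backfile and OA articles
backfile_and_oa = 50_000       # downloads the subscription did not actually buy

naive = subscription_cost / total_downloads
adjusted = subscription_cost / (total_downloads - backfile_and_oa)

print(f"naive cost per download:    £{naive:.3f}")     # looks like better value
print(f"adjusted cost per download: £{adjusted:.3f}")  # the truer, higher figure
```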

We know how much we spend on subscriptions (spoiler alert: a lot). We need to take into consideration our offsetting arrangements with some publishers – something we are taking an active look at currently anyway.

Reaching out to the publishing community

So to supplement the aggregated information we have to hand, we have reached out to those publishers our researchers publish with in significant quantities to ask them for the following data on Cambridge authors: Peer Reviewing, Publishing, Citing, Editing, and Downloading.

This is exactly what the University of California is demanding. One of the reasons we need to ask publishers for peer review information is that it is essentially hidden work. Aggregating systems like Publons do help a bit, although the Cambridge count of reviewers in the system is only 492, which is only a small percentage of the whole. Publons was bought by Clarivate Analytics (which was Thomson Reuters before this, and ISI before that) a year ago. We did approach Clarivate Analytics for some data about our peer reviewing, but declined to pay the eye-watering quoted fee.

What have we received?

Contrary to our assumptions, many of the publishers responded saying that this information is difficult to compile because it is held on different systems and multiple people would need to be contacted. Sometimes this is because publishers are responsible for the publication of learned society journals, so information is not stored centrally. They also fed back that much of the data is not readily available in a digestible format.

Some publishers have responded with data on Cambridge peer reviewers and editors, usage statistics, and citation information. A big thank you to Emerald, SAGE, Wiley, the Royal Society and eLife. We are in active correspondence with Hindawi and PLOS. [STOP PRESS: SpringerNature provided their data 30 minutes after this blog went live, so thanks to them as well].

However, a number of publishers have not responded to our requests and one in particular would like to have a meeting with us before releasing any information.

Findings so far

The brief for the project was to ‘understand how our researchers interact with the literature’.  While we wrote the brief ourselves, we have come to realise it is actually very vague. We have tried to gather any data we can to start answering this question.

What the data we have so far is helping us understand is how much is being spent on APCs outside the central management of the Office of Scholarly Communication (OSC). The OSC manages the block grants from the RCUK (now UKRI) and the Charities Open Access Fund, but does not look after payments for open access for research funded by, say, the Bill and Melinda Gates Foundation or the NIH. This means there is a not insignificant amount of extra expenditure on top of that coordinated by the OSC. These amounts are extremely difficult to ascertain, as observed in 2014.

We already collect and report on how much the Office of Scholarly Communication has spent on APCs since 2013. However, some prepayment deals make the data difficult to analyse because of the way the information is presented to us. For example, Cambridge began using the Wiley Dashboard in the middle of the year, with the first claim against it on 6 July 2016, so information after that date is fuzzy.

The other issue with comparing how much a publisher has received in APCs with how much the OSC has paid (to determine the difference) is dates. We have already talked at length about date problems in this space. But here the issue is that publisher-provided numbers are based on calendar years. Our reporting years differ – RCUK reports from April to March and COAF from October to September – so pulling this information together is difficult.
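As a sketch of the date wrangling this implies (the functions below are our own illustration, not an official tool), mapping a payment date onto the two funder reporting years looks something like this:

```python
# Map a calendar date onto the RCUK (April-March) and COAF (October-September)
# reporting years described above.
from datetime import date

def rcuk_year(d: date) -> str:
    start = d.year if d.month >= 4 else d.year - 1   # year starts in April
    return f"RCUK {start}/{start + 1}"

def coaf_year(d: date) -> str:
    start = d.year if d.month >= 10 else d.year - 1  # year starts in October
    return f"COAF {start}/{start + 1}"

d = date(2016, 7, 6)
print(rcuk_year(d), "|", coaf_year(d))  # RCUK 2016/2017 | COAF 2015/2016
```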

Our current approach to understanding the complete expenditure on APCs, apart from analysing the data being provided by (some) publishers, is to establish all of the suppliers to whom the OSC has paid an APC and obtain the supplier number. This list of supplier numbers can then be run against the whole University to identify payments outside the OSC.
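A minimal sketch of that cross-match, assuming two hypothetical CSV exports from the finance system (the file and column names – 'supplier_id', 'amount', 'department' – are our inventions, not the real schema):

```python
# Find payments to APC suppliers made outside the OSC.
import pandas as pd

osc_payments = pd.read_csv("osc_apc_payments.csv")     # APCs paid by the OSC
all_payments = pd.read_csv("university_payments.csv")  # university-wide ledger

# Supplier numbers the OSC has paid an APC to
apc_suppliers = set(osc_payments["supplier_id"])

# Payments to those suppliers that did not come from the OSC
outside = all_payments[
    all_payments["supplier_id"].isin(apc_suppliers)
    & (all_payments["department"] != "OSC")
]
print(outside.groupby("supplier_id")["amount"].sum())
```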

This project is far from straightforward. Every dataset we have will require some enhancement. We have published a short sister post on what we have learned so far about organising data for analysis. But we are hoping over the next couple of months to start getting a much clearer idea of what Cambridge is contributing to the system – in terms of papers, peer review and editorial work, in addition to our subscriptions and APCs. We need more evidence-based decision making for negotiation.

Footnote

* There has been some discussion in listservs about who is behind the Call to Action and the Declaration. Thanks to Jeff MacKie-Mason, University Librarian and Professor, School of Information and Professor of Economics at UC Berkeley, we are happy to clarify:

  • The Declaration is by the faculty senate’s library committee – University Committee on Library and Scholarly Communication (UCOLASC)
  • The Call to Action is by the University of California’s Systemwide Library and Scholarly Information Advisory Committee, UCOLASC, and the UC Council of University Librarians, who: “seek to engage the entire UC academic community, and indeed all stakeholders in the scholarly communication enterprise, in this journey of transformation”.

Published 26 June 2018 (amended 27 June 2018)
Written by Dr Danny Kingsley & Katie Hughes

Observations on a data gathering project

The Office of Scholarly Communication provides information, advice and training on research data management. So when faced with running a research project that involves a considerable amount of data, it is a good test of whether we can practise what we preach.

This blog post is a short list of how we have approached managing data for analysis. Judging by our colleagues’ faces when we described some of the advice here, this is blindingly obvious to some people. But it was news to us, so we are sharing it in case it is helpful to others.

Organising and storing the data

As is good practice, we started with a Data Management Plan. Actually, we ended up having to write two: one for the qualitative and one for the quantitative aspect of the project.

We have also had to think through where the data is being stored and backed up. All of the collected data is currently stored on a shared Cambridge University Google Drive, where only invited users with a Cambridge University email address can view it. This is because it can handle Level 2 confidential information and is accessible on and off campus. Some of the data is confidential and publishers have asked us to keep it private.

The data is also stored on a staff member’s laptop in her Documents folder (the laptop is password protected), which is backed up by the Library on a daily basis. There is a second copy on the Office of Scholarly Communication’s (OSC) Shared Drive, to ensure that there are two backups in two different locations.

One dataset has proven difficult to use as it is 48 MB and Google Drive does not seem to handle that file size well.

Each dataset was renamed with the file-naming syntax that the OSC uses. This comprises a three-letter prefix (e.g. RAW for raw data), a short description, a version, and finally the date the data was received. Underscores separate each section and there are no spaces. An example is MEM_JCSBlogData_V1_20180618.docx
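For illustration, here is a small sketch that checks file names against this convention – the exact pattern is our reading of the syntax described above, not an official OSC specification:

```python
# Validate names of the form PREFIX_Description_Version_YYYYMMDD.ext
import re

NAME_PATTERN = re.compile(
    r"^[A-Z]{3}"        # three-letter prefix, e.g. RAW or MEM
    r"_[A-Za-z0-9]+"    # short description, no spaces
    r"_V\d+"            # version, e.g. V1
    r"_\d{8}"           # date received, YYYYMMDD
    r"\.[a-z]+$"        # file extension
)

for name in ["MEM_JCSBlogData_V1_20180618.docx", "raw data v1.xlsx"]:
    print(name, "->", "ok" if NAME_PATTERN.match(name) else "does not match")
```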

To organise and summarise the metadata, we have created two spreadsheets. One is a logbook that records the name of the file, a description of the data, the size of the file, whether it is confidential, and what years it covers. The second spreadsheet records what information each dataset covers, i.e. Peer Review, Editing, Citing, APCs, and Usage. This spreadsheet also records correspondence with the publishers.

Assessing our data

At first glance, we were unsure whether we could make cross-comparisons between publishers with the data that we had collected. Although most datasets were provided in Excel (with the exception of the Springer 2017 report on gold open access and eLife), they were formatted differently and covered different areas.

Dr Laurent Gatto, one of Cambridge’s Data Champions, very kindly agreed to meet with us and look over the data that we had collected so far. He suggested a number of ways that we could clean up the data so that we could do some cross comparison analysis. Somewhat to our surprise he was generally positive about the quality and analysability of the data that we had collected.

Cleaning up data for analysis

After an initial look at the data, Laurent gave us some excellent suggestions on managing and analysing it. These are summarised below, with a small illustrative sketch after the list:

  • Have a separate folder where the original datasets will be saved. These files will remain untouched.
  • When doing any re-formatting, a new file will be created using the same naming convention, but updating the version. A record of any changes to the dataset will need to be recorded in a spreadsheet.
  • Ensure that all of the headers are uniform across the different spreadsheets, to allow analysis across datasets. Each header must be identical down to the last lowercase letter and cannot include any spaces.
  • Dates must also be uniform, using Year-Month-Day format.
  • Only the first row of a spreadsheet can include the header. Having more than one row with header information will cause problems when you are starting to code.
  • Create a readme file where every header will be recorded with a short description.
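As a small illustration of the header and date clean-up, a pandas sketch might look like the following – the file and column names are invented, not the real publisher spreadsheets:

```python
# Normalise headers and dates in one publisher spreadsheet, saving a new
# version so the original file remains untouched.
import pandas as pd

df = pd.read_excel("RAW_PublisherReviews_V1_20180618.xlsx")  # hypothetical file

# Uniform headers: trimmed, lowercase, spaces replaced with underscores
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Uniform dates: parse whatever format arrived, write back as Year-Month-Day
df["review_date"] = pd.to_datetime(df["review_date"]).dt.strftime("%Y-%m-%d")

df.to_excel("RAW_PublisherReviews_V2_20180618.xlsx", index=False)
```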

Next steps

After speaking with Laurent we are more optimistic about the data we have collected than we were before. We were concerned that there was not enough information to do analysis across publishers; however, we are now more confident that this is not the case. As we start the analysis, it will also give us a better understanding of what data is missing.

We will provide an update as we close in on our findings.

Published 26 June 2018
Written by Katie Hughes & Dr Danny Kingsley