Monthly Archives: September 2016

Taking a Principled stance – the Scholarly Commons

It only rains about 10 days a year in San Diego. And Tuesday was one of them. In a rooftop room on the UCSD campus, a group had gathered for the FORCE11 Scholarly Commons workshop. The workshop brought together members of the Scholarly Commons working group, who hail from around the world and from across the broad scholarly community. The Scholarly Commons is an idea to help define the future of research communication. The goal is to promote the best research and scholarship possible through rapid and wide dissemination to all who need or want it.

FORCE11 stands for the Future of Research Communications and eScholarship and is an organisation (or community) open to anyone interested in these issues. The group consisted of researchers from multiple disciplines, communicators, programmers, and a couple of librarians. This is the unusual and powerful thing about FORCE11 – the diversity of its members. Someone actually remarked: ‘you know, there probably should be a few more librarians here’, which is something you don’t often hear at meetings about open access issues. Usually librarians are delighted if a real live researcher turns up.

We were meeting to discuss the draft of 18 Principles of the Commons – an attempt to define what the community considers to be the attributes and behaviours of a person who is fully participating in research. The Principles are broadly separated into four major themes: being Open, Equitable, Sustainable and Research & Culture Driven.

FORCE11 works openly and tries to be as accessible as possible so there were full and open notes being collaboratively taken and the Twitter hashtag was #futurecommons.

The workshop was very hands-on, and expertly moderated by Jeroen Bosman and Bianca Kramer, who are the power behind the excellent 101 Innovations in Scholarly Communication project. As their ‘wheel’ identifying tools available across the research life stages and through time demonstrates, it is becoming increasingly difficult to navigate the new research space. Indeed, that is part of the rationale behind the Scholarly Commons project. It is an attempt to take stock and make sense of what we, the community, want to see in an open and accessible future.

Despite having fewer than 40 people we managed to have multiple activities running concurrently, with several ‘unworkshops’. Everything was fed back into the group, and there was a very broad range of discussions, agreements and ideas. To prevent this blog post becoming a tome, I am only going to cover a couple of the areas discussed.

Standing on the shoulders of giants

Due to a flight delay I was only able to catch the end of the Sunday evening welcome reception, where we were asked to reflect on the 18 draft Principles plastered on the walls and decide which we agreed with and which we did not (or had an issue with). As I scanned through I was struck by the overall similarity they bore to Robert Merton’s 1942 publication, The Normative Structure of Science, in which he proposed that science operates according to four ‘norms’: Universalism, Communalism, Disinterestedness and Organised Skepticism.

I mentioned this in an early discussion and was somewhat chuffed that the group really did take this on board – to the extent that one unworkshop group worked on updating the norms to reflect today’s situation.

As an aside – the challenge with bringing people together from multiple research areas is that everyone brings their a priori biases with them. People tend to see the problem through their own lens, and so approach it in different ways. For the group to agree that this perspective was a good one was personally very validating.

Considering outreach as part of the research lifecycle

In the first unworkshop I joined we discussed how research-centred the Principles are – they did not consider the importance of outreach. Given the impenetrable nature of the language in many academic papers, we agreed that making something Open Access facilitates outreach but is not outreach itself.

The discussion moved to ideas about what researchers could do to help with outreach – even if they themselves did not want (or were unable) to do it. These are fairly simple, including providing supplementary material that is accessible in terms of the descriptive language used (no jargon), potentially providing the information in a language other than English, and ensuring the licence under which it is made available is open.

We proposed that the Commons should facilitate outreach and have outreach in mind even if the researcher themselves does not generate it. There had been a comment earlier in the Equity discussion that noted “Each part of the research cycle is equally valid and none should be preferenced over the others.” Our discussion concluded that outreach (for the lay public) should be considered part of the research process and equally valued.

It should be noted that we are not discussing paper-related activities here. Making the paper open access or tweeting a link to the paper doesn’t count. This is about sharing the information in an understandable manner outside the Academy.

Tool mapping

The workshop, as mentioned, was very hands-on. By that I mean we did several ‘craft activities’ involving dots, glue, sticky tape and scissors. One of these activities involved ranking various tools for research against the four themes of the Principles, deciding whether they were in alignment with them (green), in opposition to them (red) or in-between (yellow).

We then placed these assessments on the windows under the part of the research lifecycle they related to, and ordered them. The most Principle-friendly tools were up high, and the least down low.


We then did an activity where we tried to trace the path of our own disciplines in terms of the tools they tend to use. This exercise was an attempt to see if there were any discernible patterns about where some disciplines tend to align, or otherwise, with the Principles. While the sample size for each discipline was too small to really come to any conclusions, this exercise did open up ideas for a way of disseminating the Principles.

The Principles as an Innovation

This is where another of my disciplinary perspectives comes into play. If we accept that the Principles are themselves an ‘innovation’ – in that they are “an idea, practice, or object that is perceived as new by an individual or other unit of adoption” – then we can look to Everett Rogers’ Diffusion of Innovations, first published in 1962 and now in its 5th edition. You might not have heard of him, but you know about his work – Rogers was the person who coined the idea of ‘early adopters’, ‘late adopters’ and ‘laggards’.

Amongst lots of interesting insights about why people adopt new ideas, Rogers came up with five attributes of an innovation that determine the success or otherwise of its adoption. These are judged as a whole and are interrelated:

  • Relative advantage – the perceived efficiencies gained by the innovation relative to current tools or procedures
  • Compatibility with the pre-existing system
  • Complexity or difficulty to learn (it needs to be easy)
  • Trialability or testability without risking the current system
  • Observed effects.

It is the second point which is the interesting one here – ‘Compatibility with the pre-existing system’. The reason why this is relevant is that we are not talking about one system when we discuss scholarship – there are myriad systems. There is no ‘one solution’. If we are to try and implement something like the Principles across the academy, we will need to do it along disciplinary lines. (Disclosure – this happens to be the conclusion of my 2008 PhD thesis on the adoption of open access across disciplines.)

Disciplinary dissemination

This leads us to the question of audiences for the Principles. Ideally we would have institutions signing up to them, pledging that they will work with their research community to work in this manner. But this is currently unrealistic due to the diverse nature of research institutions. There might, however, be a way to have funders sign up, because funding is often given within disciplinary constraints. This is doubly the case because funders (in the UK, Australia and the US at least) are increasingly using an ‘impact narrative’, and the Principles offer a way to practically identify and reward impact behaviour.

And we are not coming from a standing start. We can build on the work done by Jeroen Bosman and Bianca Kramer in their 101 Innovations in Scholarly Communication project. There were over 20,000 responses to their survey of innovation use, which allows a detailed mapping of disciplinary behaviours. If we then further map those findings against an assessment of the research tools being used at a disciplinary level, and whether they are aligned with the Principles, we should be able to see which disciplinary areas are already working in the Principled way. It is the funders of these disciplines that we should approach first to try and gain early adoption of the Principles. This work would become a checklist that can reward people for the behaviours that they are already doing in this space.

A project like this would in turn open up some questions about what we need to do at a disciplinary level to help that community become more aligned with the Principles. These may require a number of approaches – do they have the tools that work for them, or do these need to be developed? Is there a cultural reason why this discipline is not engaging? In answering these questions we can answer the bigger question: what does a Scholarly Commons researcher look like in this discipline? Until we have some evidence of where these areas are, we are effectively stabbing in the dark.

Making this happen now

In a different unworkshop we talked about how the nature of the Principles themselves went against the idea of being inclusive, because we are potentially creating a binary situation – either you are following the Principles or you are not. What we really need to do, we agreed, is not reject people for acting in ways that are not totally in line with the Principles, but rather reward behaviour that supports the Principles.

In order to facilitate this, we designed a series of ‘Decision Trees’ to help researchers be as open as they can. This is a recognition that researchers are working within a complex ecosystem. With all the will in the world, if there is not an Open Access journal available to you in your field, you cannot publish in one.

The easiest part of the research lifecycle to tackle was publishing, in terms of choosing a publication outlet. The decision tree allows people who cannot publish in an Open Access journal, nor afford to pay for hybrid (not something I personally recommend anyway), to still be ‘Principled’ by putting a copy of their work in a repository.
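As a rough sketch of this kind of flow (the function name, questions and wording here are my own invention, not the workshop's actual tree), the publishing decision might look like:

```python
def choose_publishing_route(has_oa_journal: bool, can_afford_apc: bool,
                            repository_allowed: bool) -> str:
    """Hypothetical sketch of a publishing decision tree.

    Falls back to green OA (a repository copy) when gold OA is not an
    option, so a researcher can still act within the spirit of the
    Principles.
    """
    if has_oa_journal and can_afford_apc:
        return "publish in an Open Access journal (gold)"
    if repository_allowed:
        return "publish where you can, deposit a copy in a repository (green)"
    return "negotiate deposit rights, or share a preprint where permitted"


# A researcher with no affordable OA journal in their field can still
# be 'Principled' via the repository route:
print(choose_publishing_route(has_oa_journal=False, can_afford_apc=False,
                              repository_allowed=True))
```

The point of the structure is that every branch ends in a constructive action rather than a pass/fail verdict.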

Our discussion about data was more complex. For a start, there is a question about whether the data is digital or not. As we discussed it, our draft tree became incredibly complex, so we created two separate flows. The Data 1 decision tree says to someone who has analogue data and no funds to digitise that, as long as they include information in their paper about how to contact them for the supporting data, they have met the spirit of the Principles to the best of their ability.

While we know the gold standard for data sharing is to have the data (with well-defined metadata) available openly in a non-proprietary repository with a DOI, for various reasons this is not always possible. We should not sanction a researcher because they are unable to meet that (very high) standard. The Data 2 tree shows that data that is in a repository under embargo without a DOI is discoverable in a way it would not be if it were in a desk drawer – so that is, again, within the spirit of the Principles. We need to consider the ‘close enough’ option as being a valid one, at least in the implementation stage of the Principles.
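The two data flows can be sketched together in the same spirit (again, the branch wording is my own illustration, not a transcription of the actual Data 1 and Data 2 trees):

```python
def data_sharing_route(is_digital: bool, funds_to_digitise: bool = False,
                       repository_available: bool = True) -> str:
    """Hypothetical sketch of the two data-sharing decision flows.

    Data 1 covers analogue data; Data 2 covers digital data, where even
    an embargoed deposit is more discoverable than a desk drawer.
    """
    if not is_digital:
        if funds_to_digitise:
            return "digitise, then deposit in a repository"
        # Data 1: analogue data, no digitisation budget -- 'close enough'
        return "state in the paper how to contact you for the data"
    if repository_available:
        # Data 2: deposit openly with a DOI where possible; an embargoed
        # deposit without a DOI still meets the spirit of the Principles
        return "deposit openly if possible; otherwise deposit under embargo"
    return "share on request until a suitable repository exists"
```

Every branch again resolves to a valid action, which is exactly the 'reward behaviour rather than reject people' stance the group agreed on.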

We agreed that in some areas of the research lifecycle a list of tools that could help would be of more use than a decision tree. Time constraints meant there are a couple of areas of the lifecycle which still need consideration (and we need to do some decision tree design work!), but generally the group agreed that this approach was probably quite useful.


When it comes to the Principles themselves, we are still working on them. We did, however, agree that the Principles were something worth doing, and that they were more or less something we can start working with (and on – they are likely to be dynamic). One suggestion was that we call them Scholarly Commons Principles 1.0 – a reference to this being the first version of possibly many. There are plans for several subgroups to pitch for funding to do some deeper work in some areas. So it is an ongoing project, but a substantial one.

There are some troopers in the Scholarly Communication community. Several people at our workshop had ‘done the double’ – attending the SciDataCon 2016 conference and associated meetings over eight days in Denver last week and then coming to this event. The gruelling pace was starting to show by the end of the last day of our workshop.

You know you have been on a very short visit when you fly back with the same in-flight crew as on your outbound journey. One of them even recognised me and commented on how quickly I was returning. So while the trip was an exhausting few days, it was productive and worthwhile. And it was really nice to smell eucalypt trees (rather bizarrely) and do laps in an outdoor pool – things I have not done since moving to the UK.

Published 22 September 2016
Written by Dr Danny Kingsley
Creative Commons License

Cambridge University spend on Open Access 2009-2016

Today is the deadline for those universities in receipt of an RCUK grant to submit their reports on the spend. We have just submitted the Cambridge University 2015-2016 report to the RCUK and have also made it available as a dataset in our repository.


Cambridge had an estimated overall compliance rate of 76%, with 46% of all RCUK-funded papers available through the gold route and 30% available through the green route.

The RCUK Open Access Policy indicates that at the end of the fifth transition year of the policy (March 2018) they expect 75% of Open Access papers from the research they fund to be delivered through immediate, unrestricted, online access with maximum opportunities for re-use (‘gold’). Because Cambridge takes the position that if there is a compliant green option we do not pay for gold, our gold compliance number is below this, although our overall compliance level is higher, at 76%.
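The overall rate is simply the sum of the two routes, since each paper is counted under one route only. A trivial sanity check of the headline figures:

```python
gold = 0.46   # share of RCUK-funded papers available via the gold route
green = 0.30  # share available via the green route

# Each paper is counted once, so overall compliance is the simple sum
overall = gold + green
print(f"Overall compliance: {overall:.0%}")  # Overall compliance: 76%
```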

Compliance caveats

The total number of publications arising from research council funding was estimated by searching Web of Science for papers published by the University of Cambridge in 2015, and then filtered by funding acknowledgements made to the research councils. The number of papers (articles, reviews and proceedings papers) returned in 2015 was 2080. This is almost certainly an underestimate of the total number of publications produced by the University of Cambridge with research council funding. The analysis was performed on 15/09/2016.


The APC spend we have reported counts only papers submitted to the University of Cambridge Open Access Team between 1 August 2015 and 31 July 2016. The ‘OA grant spent’ numbers provided are the actual spend out of the finance system. The delay between the submission of an article, the commitment of the funds, and the subsequent publication and payment of the invoice means that we have paid invoices during the reporting period that were submitted outside the reporting period. This made reconciliation of the amounts impossible. This funding discrepancy was given in ‘Non-staff costs’, and represents unallocated APC payments not described in the report (i.e. they were received before or after the reporting period but incurred on the current 2015-16 OA grant).

The breakdown of costs indicates we have spent 4.6% of the year’s allocation on staff costs and 5.1% on systems support. We noted in the report that the staff time paid for out of this allocation also supports the processing of Wellcome Trust APCs for which no support is provided by Wellcome Trust.

Headline numbers

  • In total Cambridge spent £1,288,090 of RCUK funds on APCs
  • 1786 articles identified as being RCUK funded were submitted to the Open Access Service, of which 890 required payment for RCUK*
  • 785 articles have been invoiced and paid
  • The average article cost was ~£2008


The average article cost can be established by adding the RCUK fund expenditure to the COAF fund expenditure on co-funded articles (£288,162.28), which gives a complete expenditure for these 785 articles of £1,576,252.42. The actual average cost is £2007.96.
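As a quick sanity check with the rounded headline figures (so the result differs from the report's exact £2007.96 by a few pence):

```python
rcuk_spend = 1_288_090.00   # RCUK fund expenditure on APCs (headline figure)
coaf_cofunded = 288_162.28  # COAF expenditure on co-funded articles
articles_paid = 785

total = rcuk_spend + coaf_cofunded
average = total / articles_paid
print(f"Total: £{total:,.2f}, average APC: ~£{round(average)}")
```

Using the rounded RCUK headline figure, the average comes out at roughly £2008 per article, matching the "~£2008" headline above.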

* The Open Access Service also received many COAF-only funded and unfunded papers during this period. The number of articles paid for does not include those made gold OA through the Springer Compact, as this would distort the average APC value.


In our report on expenditure for 2014 the average article APC was £1891. This means there has been a 6% increase in Cambridge University’s average spend on an APC since then. It should be noted that of the journals for which we most frequently process APCs, Nature Communications is the second most popular. This journal has an APC of £3,780 including VAT.
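The year-on-year increase follows directly from the two averages:

```python
avg_2014 = 1891.00   # average APC reported for 2014
avg_2016 = 2007.96   # average APC for the 2015-16 reporting period

increase = (avg_2016 - avg_2014) / avg_2014
print(f"Increase in average APC: {increase:.0%}")  # Increase in average APC: 6%
```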

Datasets on Cambridge APC spend 2009-2016

Cambridge released the information about its 2014 APC spend for RCUK and COAF in March last year and intended to produce a similar report for the spend in 2015; however, a recent FOI request has prompted us to simply upload all of our data on APC spend into our repository for complete transparency. The list of datasets now available is below.

1. Report presented to Research Councils UK for article processing charges managed by the University of Cambridge, 2014-2015

2. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2015-2016

3. Report presented to the Charity Open Access Fund for article processing charges managed by the University of Cambridge, 2014-2015

4. Report presented to Jisc for article processing charges managed by the University of Cambridge, 2014

5. Open access publication data for the management of the Higher Education Funding Council for England, Research Councils UK, Charities Open Access Fund and Wellcome Trust open access policies at the University of Cambridge, 2014-2016

Note: In October 2014 we started using a new system for recording submissions. This has allowed us to obtain more detailed information and allows multiple users to interact with the system. Until December 2015 our financial information was recorded in the spreadsheet listed as dataset 6 below. There is overlap between reports 5 and 6 for the period between 24 October and 31 December 2015. As of January 2016, all data is collected in one place.

6. Open access publication data for the management of Research Councils UK, Charities Open Access Fund and Wellcome Trust article processing charges at the Office of Scholarly Communication, 2013-2015

Note: In 2013 the Open Access Service began, taking responsibility for the new RCUK fund and for the new Charities Open Access Fund (COAF). At this time the team were recording when an article was fully Wellcome Trust funded, even though Wellcome Trust funding is a component of COAF.

7. Open access publication data for the management of Wellcome Trust article processing charges from the School of Biological Sciences, 2009-2014

Note: Management of the funds to support open access publishing has changed over the past seven years. Before the RCUK open access policy came into force in 2013, the Wellcome Trust funds were managed by the School of Biological Sciences.

Published 14 September 2016
Written by Dr Danny Kingsley & Dr Arthur Smith
Creative Commons License

Making the connection: research data network workshop

During International Data Week 2016, the Office of Scholarly Communication is celebrating with a series of blog posts about data. The first post was a summary of an event we held in July. This post reports on the second Jisc research data network workshop.

Following the success of hosting the Data Dialogue: Barriers to Sharing event in July, we were delighted to welcome the Research Data Management (RDM) community to Cambridge for the second Jisc research data network workshop. The event was held in Corpus Christi College, with meals in the historic dining room. (Image: Corpus Christi)

RDM services in the UK are maturing and efforts are increasingly focused on connecting disparate systems, standardising practices and making platforms more usable for researchers. This is also reflected in the recent Concordat on Research Data which links the existing statements from funders and government, providing a more unified message for researchers.

The practical work of connecting the different systems involved in RDM is being led by the Jisc Research Data Shared Services project, which aims to share the cost of developing services across the UK Higher Education sector. As one of the pilot institutions we were keen to see what progress has been made and to find out how the first test systems will work. On a personal note, it was great to see that the pilot will attempt to address much of the functionality researchers request but that we are currently unable to fully provide, including detailed reporting on research data, links between the repository and other systems, and a more dynamic data display.

Context for these attempts to link, standardise and improve RDM systems was provided in the excellent keynote by Dr Danny Kingsley, Head of the Office of Scholarly Communication at Cambridge, who reminded us of the broader need to overhaul the reward systems in scholarly communications. Danny drew on the Open Research blog posts published over the summer to highlight some of the key problems in scholarly communications: hyperauthorship, peer review, flawed reward systems, and, most relevantly for data, replication and retraction. Sharing data will alleviate some of these issues but, as Danny pointed out, this will frequently not be possible unless data has been appropriately managed across the research lifecycle. So whilst trying to standardise metadata profiles may seem irrelevant to many researchers, it is all part of this wider movement to reform scholarly communication.

Making metadata work

Metadata models will underpin any attempts to connect repositories, preservation systems, Current Research Information Systems (CRIS), and any other systems dealing with research data. Metadata presents a major challenge, both in terms of capturing the wide variety of disciplinary models and needs, and in persuading researchers to provide enough metadata to make preservation possible without putting them off sharing their research data. Dom Fripp and Nicky Ferguson are working on developing a core metadata profile for the UK Research Data Discovery Service. They spoke about their work on developing a community-driven metadata standard to address these problems. For those interested (and GitHub-literate) the project is available here.

They are drawing on national and international standards, such as the Portland Common Data Model, trying to build on existing work to create a standard which will work for the Shared Services model. The proposed standard will have gold, silver and bronze levels of metadata and will attempt to reward researchers for providing more metadata. This is particularly important as the evidence from Dom and Nicky’s discussion with researchers is that many researchers want others to provide lots of metadata but are reluctant to do the same themselves.
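To make the tiered idea concrete, here is a minimal sketch of how gold, silver and bronze levels might be checked against a record. The field names and tier contents are entirely my own invention for illustration; the actual profile being developed for the Discovery Service will differ.

```python
# Hypothetical tier definitions -- each tier requires all fields of the
# tiers below it plus some extras (field names are illustrative only).
REQUIRED_FIELDS = {
    "bronze": {"title", "creator", "date", "identifier"},
    "silver": {"title", "creator", "date", "identifier",
               "description", "licence", "format"},
    "gold":   {"title", "creator", "date", "identifier",
               "description", "licence", "format",
               "methodology", "related_publications", "funder"},
}

def metadata_level(record: dict) -> str:
    """Return the highest tier whose required fields the record satisfies."""
    provided = {field for field, value in record.items() if value}
    for level in ("gold", "silver", "bronze"):
        if REQUIRED_FIELDS[level] <= provided:  # subset test
            return level
    return "insufficient"

record = {"title": "Survey data", "creator": "A. Researcher",
          "date": "2016", "identifier": "doi:10.xxxx/example"}
print(metadata_level(record))  # bronze
```

A scheme like this lets a deposit interface reward researchers immediately ("add a description and a licence to reach silver") rather than presenting metadata as an all-or-nothing chore.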

We have had some success with researchers filling in voluntary metadata fields for our repository, Apollo, but this seems to depend to a large extent on how aware researchers are of the role of metadata, something which chimes with Dom and Nicky’s findings. Those creating metadata are often unaware of the implications of how they fill in fields, so creating consistency across teams, let alone disciplines and institutions, can be a struggle. Any Cambridge researchers who wish to contribute to this metadata standard can sign up to a workshop with Jisc in Cambridge on 3rd October.

Planning for the long-term

A shared metadata standard will assist with connecting systems and reducing researchers’ workload, but if replicability, a key problem in scholarly communications, is going to be possible, digital preservation of research data needs to be addressed. Jenny Mitcham from the University of York presented the work she has been undertaking alongside colleagues from the University of Hull on using Archivematica for preserving research data and linking it to pre-existing systems (more information can be found on their blog).

Jenny highlighted the difficulties they encountered getting timely engagement from both internal stakeholders and external contractors, as well as linking multiple systems with different data models, again underlining the need for high-quality and interoperable metadata. Despite these difficulties they have made progress on linking these systems and in the process have been able to look into the wide variety of file formats currently in use at York. This has led to conversations with The National Archives about improving the coverage of research file formats in PRONOM (a registry of file formats for preservation purposes), work which will be extremely useful for the Shared Services pilot.

In many ways the project at York and Hull felt like a precursor to the Shared Services pilot, highlighting both the potential problems in working with a wide range of stakeholders and systems, and the massive benefits possible from pooling our collective knowledge and resources to tackle the technical challenges which remain in RDM.

Published 14 September 2016
Written by Rosie Higman
Creative Commons License