Cambridge expenditure on APCs in 2014

Cambridge (along with many other institutions) was recently approached by Jisc to report on our article processing charge (APC) payments for 2014, as part of Jisc's APC data collection project addressing the total cost of ownership of scholarly communication. Stuart Lawson, who is compiling these datasets, has made the files available on Figshare.

A couple of caveats. First, this dataset only contains APCs that were paid centrally; many other APCs paid by the University of Cambridge and its staff are not included in this dataset.

Second, we ended up listing the publications that were submitted to our system in 2014, because that was our starting point, rather than taking the payments made in 2014 and working back. This might be an issue for the analysis – it will depend on how people have interpreted '2014'. I should note that 74 (12.13%) of the invoices listed in this data were actually paid in January 2015.

Headline numbers

  • 610 funded articles were submitted in 2014 to our system for publication
  • 495 have been invoiced and paid as at March 2015
  • The amount spent on APCs (including VAT) for these invoices was £936,224.86
  • This gives an average cost per APC paid (including VAT if charged) of £1,891.36 – see the quick check below
  • APCs ranged from £94.61, for an article published by Magnolia Press, to £3,869.72 for an article published by Wiley
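
For anyone who wants to reproduce the average APC figure, here is a minimal sketch (the totals are simply those quoted above, not recalculated from the underlying Figshare dataset):

```python
# Quick check of the headline average above, using the totals quoted in this post.
total_paid_gbp = 936_224.86   # total spent on APCs for these invoices, including VAT
invoices_paid = 495           # invoices paid as at March 2015

average_apc = total_paid_gbp / invoices_paid
print(f"Average cost per APC paid: £{average_apc:,.2f}")  # ≈ £1,891.36
```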

What does this mean?

It means we are spending a lot of (RCUK) money on APCs. We have also supported payment of page and colour charges out of the RCUK fund, and have paid for researchers to join memberships that offer discounted APCs; neither of those categories of expenditure is captured in this dataset.

The University is participating in the various Jisc Collections offsetting programs with publishers, and we are discussing other ways of managing this expenditure. However, we really need to consider whether this is the way of the future.

Issues with reporting

Pulling the information together for this list revealed a few issues. First, while we agree with collecting data to allow aggregation across the sector, pulling the required information together was challenging for us because we do not record the information in this way.

However, there are some indications that this type of detail will be requested as standard for reporting. Certainly Jisc is suggesting this as a way forward. In their 'APC data collection' blog post they state:

HEIs will be able to benchmark their APC data. Using a standard template will help to produce comparable data between institutions which can be more easily aggregated. The data fields to be completed have been chosen from careful analysis of HEI needs. This means that the spreadsheet can be used for both internal reporting and also external reporting including to the Wellcome Trust for compliance monitoring of the Charity Open Access Fund, and potentially RCUK.

We therefore need to take this information into account when designing new systems.

Issues with invoices

We have a considerable number of Purchase Orders that have not been invoiced. While there will always be some delay because of the time between acceptance and publication, some of these are very old.

The issue of items not being invoiced can partially be explained by the cancellation of Purchase Orders. In some cases the team has contacted the author and found that the email is bouncing because the author has moved to a different institution. In other cases the author decided not to go ahead with open access publication, so we have raised a Purchase Order against something that no longer exists.

Long-standing Purchase Orders (over 14 months old) are potentially a problem because they represent money held as committed funds. We are now adding a check of older un-invoiced Purchase Orders to the ever-growing list of tasks in the compliance workflow.

Published 26 March 2015
Written by Dr Danny Kingsley
Creative Commons License

Evolution of Library Ethnography Studies – notes from talk

Today Susan Gibbons, the University Librarian & Deputy Provost at Yale University, came to speak to the Cambridge Library community about her work over more than a decade on the ethnography of library users.

The premise of the work Susan has done in collaboration with anthropologist Nancy Fried Foster is that libraries should get to know their users better rather than assume that 'we know what they need'. They recognised that we often base our assumptions about what students experience and need on our own experiences as students, which are clearly dated.

The reality of a rapidly changing world is that we need to adapt to the changing needs of the students. It is easier for the library to change than to expect our users to change.

I should note that I have followed Susan and Nancy Fried Foster’s work for many years. I based the design of my empirical work for my PhD “The effect of scholarly communication practices on open access: an Australian study of three disciplines” on their early studies. I am a bit of a fan.

Ethnographic research

Their work began in 2002 and 2003 with analyses of 'faculty' (academics') work practices, before moving on to undergraduates and graduate students, analysing how people search for information, and asking what the point of science library buildings is in the electronic age, before most recently returning to the undergraduate student body.

There is a circularity in this type of ethnographic research:

  • Start with a basic question – such as 'what are the barriers to completion of a PhD?' or 'why do people still use science library buildings when the content is all online?'
  • Methods – these can vary widely, from photos and observations to analysis of academics' diaries
  • Data gathering – these first three steps constitute the ethnography
  • Findings – this is where user design comes in: analyse the data and start looking for patterns
  • Change – you can only do this process once, maybe twice, and if you do not make a change that is visible to the patrons, they won't engage. There is a huge investment of time and trust in the process, and making the changes needs commitment from the top down, not just the bottom up.

At the end of the process, there needs to be an analysis of whether the change has worked and, if not, you start again. Susan noted that they did not presume to know what the answers would be when beginning the process. The area is so unknown that they would not even be able to create a survey asking A, B or C.

We need to embrace experimentation. Libraries have to have a tolerance for failure and change – otherwise the whole process will not work. We need R&D thinking: try it and see if it works. We should not worry about covering every contingency, because by then we will have lost the opportunity to someone who has stepped in and started something. Don't worry about scale until you find the method is appropriate.

Changing role of the library

One of the big messages from the talk was that the literature and the technology are increasingly intertwined. As librarians it is not good enough just to know the literature. If you are supporting the social sciences, you need to know statistical packages and data visualisation; if the librarian can't help with that, there is a break in the service. Library staff need fluency with the digital tools as well. Libraries are defined by their services, not by their collections; if we don't have the services to support those collections, we may as well not have them.

This observation raised a question from the floor: do we employ specialists, or employ librarians and train them up? Susan responded that Yale University is increasingly employing people who have a PhD in the specific area. That way there is no expectation that everyone is an expert in every tool, but there is a go-to person.

However, there has also been a push to train people up in the library as 'librarians as teachers'. There is an emerging program to teach librarians how to teach, strongly focused on professional development.

Increasingly the university is asking librarians to have outreach as part of their role. Professional assessments ask whether they have created any subject guides and how many classes they have taught; outreach is valued in the evaluation process. For some existing staff this was not comfortable – they wanted to be curators, and felt that 'they changed the rules on me' – so the university helps them make the transition. Some have come along the outreach path; others have moved somewhere else, and the university helps them with that move.

Studies at Rochester and Yale Universities

Susan described two examples – one from her current institution, Yale University, and one from her former institution, the University of Rochester. She prefaced these descriptions with the disclaimer that all institutions are different and these findings are not directly translatable; that is why ethnography is so important – it is very locally focused. The factors that can affect the outcomes include whether students live on or off campus, whether they are full or part time, and whether it is an engineering-based institution. In addition, weather and climate affect behaviour.

The question behind the undergraduate student study was: “How can we improve the experience for the students?” That behind the graduate student study was “What are the barriers to the completion of a dissertation?”

Although her findings are fascinating, I won't go into detail here, partly because of their specificity to her institutions. Susan did note that this was not a talk where we should take her group's solutions and adopt them; rather, we need to undertake our own research to determine what we need to do. However, for those who are interested, there is a considerable body of publications about the various projects.

However the methods are instructive and some of what emerged from the work is universal.

Findings

With the undergraduate students, they undertook a 'retrospective study' of students who were responsible for writing a big paper. They 'followed' the students by contacting them once a week or fortnight and asking basic questions about how they were going. Then, on the day the students handed the paper in, they interviewed them. This produced two outputs: a transcript of the interview, and a drawing of the process that the students were asked to produce during the interview as they explained their experience.

The graduate students were asked to do an in-situ interview in the place where they do most of their work – sometimes a lab, a coffee shop, or where they lived. They were then asked all sorts of questions: when they decided to print out an article and how they kept track of their printouts; about the books on their shelves and how they decide whether to buy a book or take it out of the library; and how things are filed on their computer. Other questions covered the tricks of the trade that got them through the writing process, what software they were using, and the time of day when they work.

Findings that spanned both studies included what Susan described as 'the magical summer'. This is the time between students finishing their bachelor degree and starting graduate school. The researchers asked professors what they thought the abilities of graduating bachelor students were, and what their expectations of new graduate students were – which were much higher. Hence the magic of the three-month summer: how do students meet this raised expectation? They knew they could not change the professors' expectations, but the library could help the students reach them and support them through the process. This was a good time for the library to say 'we are safe; work with us and we can help you'.

Another question was: what is the best time to introduce research tools? People working on a thesis will need some sort of research tools, but when do you introduce them into the toolkit? The group found that once a student's dissertation prospectus was approved, their toolbox was locked – they were not able to 'take a risk' at that stage. So tools have to be introduced early in the process, and the library had to ensure that professors were encouraging students to use library services while they were writing their shorter papers.

They also found that the human network is important – learning from other students, following other researchers, people texting information, or someone at a conference seeing a book that is perfect for a colleague, taking a picture and sending it to their friend. So the question is: how do you get the relevant librarian into that social network, so that students see the librarian as another resource?

Summary

Susan summarised the talk by saying that there are many things that drive students away, including physical and other barriers. Barriers can include the use of acronyms, the scattering of collections, and not knowing how to get access to them. The Library needs to ask itself: is this barrier still necessary? If not, change it. Some barriers are necessary, but if we can't change them we should explain them.

Libraries continue to be important as physical spaces: when students want a place to go to get their work done, they go to the library because it is a place of intellectual gravity. There are also important symbolic aspects of libraries. The researchers had asked students to circle the places on a map of the university where they don't feel welcome; students answered, for example, 'I'm not an athlete and that's a gym'. But the library is a place where we all feel welcome – there is a neutrality there.

Published 18 March 2015
Written by Dr Danny Kingsley
Creative Commons License

FORCE2015 observations & notes

This blog post first appeared on the FORCE2015 website on 14 January 2015.

First, a disclaimer. This post is not an attempt to summarise everything that happened at FORCE2015 – I'll leave that to others. The Twitter feed using #FORCE2015 contains an interesting side discussion, and the event was livestreamed, with recordings of the individual sessions due to go live in two weeks here – so you can always check bits out for yourself.

So this is a post about the things that I, as a researcher in scholarly communication working in university administration (with a nod to my previous life as a science communicator), found interesting. It is a small selection of the whole.

This was my first FORCE event; these have been held annually since the first event, FORCE11, which happened in August 2011 after a 'Beyond the pdf' workshop in January that year. It was nice to have such a broad group of attendees: researchers and innovators (often people were both), research funders, publishers, geeks and administrators, all sharing their ideas. Interestingly, there were only a few librarians – this in itself makes the conference stand out. Sarah Thomas, Vice President of the Harvard Library, observed this, noting that she is shocked that there are usually only librarians at the table at these sorts of events.

To give an idea of the group: when the question was asked of who had received a grant from the various funding bodies, I was in a small minority in not putting up my hand. These are actively engaged researchers.

I am going to explore some of the themes of the conference here, including:

  • Library issues
  • The data challenge
  • New types of publishing
  • Wider scholarly communication issues, and
  • The impenetrability of scientific literature

Bigger questions about effecting change

Responsibility

Whose responsibility is it to effect change in the scholarly communication space? Funders say they are looking to the community for direction. Publishers say they are looking to authors and editors for direction. Authors say they are looking to find out what they are supposed to do. We are all responsible: funding is not the domain of the funders alone; it is interdependent.

What is old is still old

The Journal Incubator team asked the editorial board members of the new journal "Culture of Community" to identify what they thought would attract people to their journal. None mentioned the modern and open technology behind its publishing practices. All the points they identified were traditional: peer review, high indexing, PDF formatting and so on. Take-home message: authors are not interested in the back-end technology of a journal; they just want the thing to work. This underlines the need to ENABLE, not ENGAGE.

The way forward

The way forward is threefold, incorporating community, policy and infrastructure. Moving forward, we will require initiatives focused on sustainability, collaboration and training.

Library issues

Future library

Sarah Thomas, the Vice President of the Harvard Library, spoke about "Libraries at Scale or Dinosaurs Disrupted". She had some very interesting things to say about the role of the library in the future:

  • Traditional libraries are not sustainable: acquisition, cataloguing and storage of publications do not scale.
  • We need to operate at scale and focus on this century's challenges, not the last century's, by developing new priorities, reallocating resources to them and using approaches that dramatically increase outputs.
  • There is very little outreach from libraries into the community – we are not engaging broadly, beyond an attitude of 'we are the experts, you come to us and we will tell you what to do'.
  • We must let go of our outdated systems – such as insufficiently automated practices, redundant effort and 'just in case' coverage.
  • We must let go of our outdated practices – a competitive, proprietary approach – and engage collaborators to advance our goals.
  • Open up hidden collections and maximise access to what we have.
  • Start doing research into what we hold and illuminate the contents in ways we never could in a manual world, using visualization and digital tools.

Future library workforce

There was also some discussion about the skills a future library workforce needs to have:

  • We need an agile workforce – with skills training in data science, social media and so on – to help promote quality knowledge work, and this should be put into performance goals.
  • We need to invest in 21st-century skillsets. The workforce we should be hiring includes:
    • Metadata librarian
    • Online learning librarians
    • Bioinformatics librarians
    • GIS specialist
    • Visualization librarian
    • Copyright advisor
    • Senior data research specialist
    • Data curation experts
    • Scholarly communications librarian
    • Quantitative data specialist
    • Faculty technology specialist
    • Subject specialist

Possible solution?

The Council on Library and Information Resources (CLIR) offers postdoctoral fellowships: CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies and current research. The program offers recent PhD graduates the chance to help develop research tools, resources and services while exploring new career opportunities.

Possible opportunity to observe change?

In summing up the conference, Phil Bourne said there is a major opportunity point coming up: both the European Bioinformatics Institute in the EU and the National Library of Medicine in the US will soon have new leadership, and both are receiving recommendations on what the institution of the future should look like.

The library has a tradition of supporting the community, providing an infrastructure to maintain knowledge and, in the case of the National Library of Medicine, setting policy. If they are going to reinvent this institution, we need to watch what it will look like in the future.

The future library (or whatever it will be called) should curate, catalog, preserve and disseminate the complete digital research lifecycle. This is something we need to move towards. The fact that there is an institution that might move towards this is very exciting.

The data challenge

Data was discussed at many points during the conference, with some data solutions/innovations showcased:

  • Harvard has the Harvard Dataverse Network – a repository for sharing data – and 'Data Management at Harvard' guidelines and policies, which are cranking up investment in managing data [LINK]
  • The Resource Identification Initiative is designed to help researchers sufficiently cite the key resources used to produce the scientific findings reported in the biomedical literature.
  • bioCADDIE is trying to do for data what PubMed Central has done for publications, using a Data Discovery Index. The goal of this project is to engage a broad community of stakeholders in the development of a biomedical and healthCAre Data Discovery and Indexing Ecosystem (bioCADDIE).

The National Science Foundation data policy

Amy Friedlander spoke about 'The Long View'. She posed some questions:

  • Must managing the data be co-located with storing the data?
  • Who (and what) gets access to what, and when?
  • Who and what can I trust?
  • What do we store it in? Where do we put things, and where do they need to be?

The NSF don’t make a policy for each institution, they make one NSF Data Sharing Policy that works more or less well across all disciplines. There is a diversity of sciences with heterogeneous research results. Range of institutions, professional societies, stewardship institutions and publishers, and multiple funding streams.

There are two contact points – when the grant is awarded, and when the grantees report. If we focus on publications, we can develop the architecture and then extend it to other kinds of research products, integrating the internal systems within the enterprise architecture to minimise the burden on investigators and program staff.

Take-home message: the great future utopia (my word) is that we upload once and use many times – an environment in which all publications are linked to the underlying evidence (data), analytical tools and software.

New types of publishing

There were several publishing innovations showcased.

Journal Incubator

The University of Lethbridge has a 'journal incubator', developed with the goal of sustaining scholarly communication and open, digital access. The incubator trains graduate students in the tasks of journal editorship, which allows the journal to be provided completely free of charge.

F1000 Research Ltd – ‘living figures’

Research is an ongoing activity, but you wouldn't think so from the way we publish – publishing is still very much built around the static print object. F1000 [LINK] has the idea of embedding data in the article, and has developed a tool that allows you to see the data that sits behind the article.

Many figures don’t need to exist – you need the underlying data. Living figures in the paper. Research labs can submit data directly on top of the figure – to see if it was reproducible or not. This provides interesting potential opportunities –bringing static reseach figures to life – a “Living collection” Can have articles in different labs around that data. The tools and technologies are out there already.

Collabra – giving back to the community

Collabra, the new University of California Press open access journal, will share a proportion of its APC with researchers and reviewers. Of the $875 APC, $250 goes into a fund. Editors and reviewers earn money into the fund, and there is a payout to the research community, who can decide what to do with it. The choices are to:

  • Receive it electronically
  • Pay it forward to cover APCs in future
  • Pay it forward to their institution's OA fund.

This is a journal where reviewers get paid – or can elect to pay the money forward. The aim is to see whether everyone can benefit from the journal. There is no lock-in – the benefit comes through partnerships.

Future publishing – a single XML file

Rather than replicating old publishing processes electronically, the dream is to have one single XML file in the cloud. There is role-based access to modify the work (by editors, reviewers and so on), and at the end that version is the one that gets published. Everything is in the XML, with automatic conversion at the end. References become completely structured XML at the click of a button, with coloured tags, and changes can be tracked. The journal editor gets a deep link pointing to something to look at, and can accept or reject it. The XML can be converted to a PDF with high-quality typography, fully automatically.
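
To make the single-source idea concrete, here is a toy sketch in Python. Nothing in it corresponds to the actual tool demonstrated at FORCE2015; the element names and rendering functions are invented purely to illustrate one structured file being converted automatically into more than one output.

```python
# A toy illustration of the "single XML file" idea: one structured source,
# multiple automatically generated outputs. This is a hypothetical sketch,
# not the system demonstrated at the conference.
import xml.etree.ElementTree as ET

ARTICLE_XML = """
<article>
  <title>A minimal single-source article</title>
  <para>Everything lives in one structured file.</para>
  <ref authors="Smith, J." year="2014" title="Example reference"/>
</article>
"""

def to_text(root):
    """Render the article as plain text (e.g. a reviewer's working view)."""
    lines = [root.findtext("title").upper()]
    lines += [p.text for p in root.findall("para")]
    lines += [f"[{r.get('authors')} ({r.get('year')}) {r.get('title')}]"
              for r in root.findall("ref")]
    return "\n".join(lines)

def to_html(root):
    """Render the same source as HTML (e.g. the published version)."""
    body = [f"<h1>{root.findtext('title')}</h1>"]
    body += [f"<p>{p.text}</p>" for p in root.findall("para")]
    body += [f"<li>{r.get('authors')} ({r.get('year')}). {r.get('title')}.</li>"
             for r in root.findall("ref")]
    return "<html><body>" + "".join(body) + "</body></html>"

root = ET.fromstring(ARTICLE_XML)
print(to_text(root))   # one source file...
print(to_html(root))   # ...two automatically derived outputs
```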

Wider scholarly communication issues

This year is the 350th anniversary of the first scientific journal*, the Philosophical Transactions of the Royal Society. Oxford holds a couple of copies of this journal, and there was an excursion for those interested in seeing it.

It is a good time to look at the future.

Does reproducibility matter?

Something that was widely discussed was whether research should be reproducible, which raised the following points:

  • The idea of a single, well-defined scientific method, and thus an incremental, cumulative scientific process, is debatable.
  • Reproducibility and robustness are slightly different. Robustness of the data may be key.
  • There are no standards for computational results that can ensure we have comparable experiments.

Possible solution?

Later in the conference, a new service that tries to address the diversity of existing lab software was showcased: Riffyn. It is a cloud-based software platform – a kind of CAD for experiments. The researcher has a unified experimental view of all their processes and their data, and researchers can update it themselves, without relying on IT staff.

Credit where credit is due

I found the reproducibility discussion very interesting, as was the discussion about authorship and attribution, which posed the following points:

  • If it is an acknowledgement system, everyone should be on it
  • Authorship is a proxy for scientific responsibility. We are using the wrong word.
  • When crediting research we don't make distinctions between contributions. Citation is not the problem; contribution is.
  • Which building blocks of a research project do we not give credit for? And which ones only get indirect credit? How many skills would we expect one person to have?
  • The problem with software credit is that we are not acknowledging the contributors, so we are breaking the reward mechanism.
  • Of researchers in research-intensive universities, 92% are using software, and of those, 69% say their work would be impossible without it. Yet 71% of researchers have no formal software development training. We need standard research computing training.

Possible solutions

  • The Open Science Framework provides documentation for the whole research process, which in turn helps determine how credit should be apportioned.
  • Project CRediT has come up with a taxonomy of contribution terms and proposes taking advantage of an infrastructure that already exists, Mozilla Open Badges – if you hear or see the word 'badges', think 'digital credit'.

The impenetrability of scientific literature

Astrophysicist Chris Lintott discussed citizen science, specifically the phenomenally successful program Galaxy Zoo, which taps into a massive group of interested amateur astronomers to help classify galaxies by their shape – something that humans do better than machines.

What was interesting was the challenge that Chris identified: amateur astronomers become 'expert' amateurs quickly, and the system has built ways for them to communicate with each other and with scientists. The problem is that the astronomical literature is simply impenetrable to these (informed) citizens.

The scientific literature is the ‘threshold fear’ for the public. This raises some interesting questions about access – and the need for some form of moderator. One suggestion is some form of lay summary of the research paper – PLOS Medicine have an editor’s summary for papers. (Nature do this for some papers, and BMJ are pretty strong on this too).

Take-home message: by designing a set of scholarly communication tools for citizen scientists, we improve communication for all scientists. This is an interesting way to think about how we, as researchers ourselves, want to browse scholarly papers.

*Yes, I know that the French Journal des scavans was published before this, but it was broader in focus, hence the qualifier 'first scientific journal'.
Published 18 March 2015
Written by Dr Danny Kingsley
Creative Commons License