The value of embracing unknown unknowns

This blog accompanies a talk Danny Kingsley gave to the RLUK Conference held at the British Library on 9-11 March 2016. The slides are available and the Twitter hashtag from the event was #rluk16.

The talk centred around a debate piece written with my long-standing collaborator, Dr Mary Anne Kennan, published in August 2015: Open Access: The Whipping Boy for Problems in Scholarly Publishing. This original 10,000-word article was the starting point for a debate where five people provided rebuttals to our position and we were then given the opportunity to write a rejoinder to these. All the articles were published together.

I have included a précis of the article below as Annex 1, but that is not what the talk was about – what I wanted to discuss was the unexpected progression of the piece and what that revealed to us as authors working in Scholarly Communication.

After we submitted the original piece we sent through several suggestions (including names and contact details) to the Editor for people who might want to contribute. These primarily included practitioners in the Open Access space:

  • Funders
  • Library staff
  • Research managers
  • Editors
  • Publishers
  • Policy makers

There was considerable difficulty in locating people who were prepared to contribute. We are still unsure why this was the case – it may have been a time issue, the fact that this was an academic publication and we were asking administrative professionals, or that it was potentially politically sensitive. On the Editor’s suggestion we sent some personal requests to contacts to ask them to participate. However, in the end four of the five people who wrote rebuttals were researchers in the Information Systems field.

This process made the whole production very protracted. There was a two-year period between the first approach from the journal and publication. The production process from the start of the writing period was 18 months – the actual dates are listed as Annex 2 below.

Same old, same old – the responses

Reading the rebuttals from the four Information Systems researchers, two things become obvious. First, none of them actually addressed the posits we had presented in our original debate piece – which, after all, was the point of the exercise.

Second, a theme began to emerge, demonstrated by these snippets:

  • “Before discussing that in detail we need to know what the current situation is regarding OA publishing in IS”
  • “We now discuss four fundamental points regarding scholarly communication. We begin by asking what constitutes the main building blocks of the scholarly communication system”
  • “Before examining the current state of scholarly publication, let us set some parameters for this discussion”
  • “I think the argument would benefit from more systematically analyzing the current system of scholarly publishing…”

In each case the authors chose to undertake their own analysis of scholarly publishing – sometimes apparently unaware that this is a long-established area of research.

So what does this tell us?

Lesson 1 – ‘Engagement’ is not working

One thing that was striking about this process was that each contributor came to their own conclusion that Open Access is something we should aim towards. While this is a ‘good thing’ for Open Access advocacy, it is not scalable. If we wait for every researcher to come to their own personal epiphany about Open Access we will never have high levels of uptake.

There has been a long-standing belief and practice in Open Access that if the research community were only more aware of the issues in scholarly publishing then they would come on board with Open Access. I am entirely guilty of this myself. However, after a decade of trying, it is fairly safe to say that engagement has not worked.

One conclusion to take away from this experience is that we must enable the academic community to disseminate their work openly. It must happen around them.

Lesson 2 – The research area of scholarly communication is not well recognised

The concept of an academic discipline is fairly slippery, but it is reasonably safe to say that two things define a discipline – the scholarly literature and language.

Academic ‘communities’ manifest in the form of journals or learned societies. But Scholarly Communication research is traditionally either discussed in a discipline-specific way in a disciplinary journal (such as in part of an editorial), or published in journals in the sociology of science, communication, librarianship or information sciences disciplines.

There are two journals that do specifically look at Scholarly Communication – the Journal of Librarianship and Scholarly Communication and Scholarly and Research Communication. I should note that Publications also looks at many issues in this area.

There are now Offices of Scholarly Communication in universities, especially in the US and increasingly in the UK – the Office of Scholarly Communication at Cambridge being a classic example. However, there are no Faculties or Departments or Professorial Chairs of Scholarly Communication in existence – that I can find. I am happy to hear about them if they do exist.

And yet people do undertake research in this area. They publish articles, peer review each other’s work, present at conferences. This is academic work.

It might well be a problem of language. Michael Billig’s book ‘Learn to Write Badly: How to Succeed in the Social Sciences’ makes the argument that creating a language that is impenetrable to others is a way of boundary stamping a discipline.

But in the area of Scholarly Communication, many of the words are vernacular – with common meanings that might be different to their specific meaning in the context of the research. A classic example is ‘publish’ which simply means ‘make public’, but within the academic context means that there has been a process of review and revision, branding and attribution. Words like ‘repository’ and ‘mandate’ have caused me some professional grief.

And we are having some trouble with terminology in the Open Access space with publishers. For example the conflation of ‘deposit’ with ‘make available’ – Wiley instructs authors that they cannot deposit until after the embargo. This is wrong. Authors can deposit whenever they like, as long as they don’t make it available until after the embargo. Green Open Access – which means making a copy of the work freely available – has been rather bizarrely interpreted by Elsevier in their Open Access pages as providing a link to the (subscription) article.

The reason there can be such a high level of inaccuracy around language is that it is not ‘officially’ defined anywhere. I should note that the Consortia Advancing Standards in Research Administration Information (CASRAI) may be doing some work in this area.

Problem 1 – Practice versus study

We concluded in our rebuttal that the practice of scholarly communication (as distinct from the study of it) is shared among all academic fields, librarians, publishers, and administrators. Each of these brings their own levels of understanding, perspectives, and involvement in the scholarly communication system.

This can create a problem because practitioners often think they have a good understanding of the issues surrounding the publication process. But according to a 2012 article in the Journal of Librarianship and Scholarly Communication, researchers are generally held to have a low awareness of publishing issues and open access opportunities and are confused over copyright issues.

This is a case of the ‘Unknown Unknowns’ – a term popularised (to much ridicule) by Donald Rumsfeld in 2002.

Regardless of where individuals sit, however, in all instances there needs to be a base level of competence in this area. Yes, I know, I have just said we should not try to engage academics to convert them to Open Access. However, what we should be doing is ensuring they have at least a basic understanding of this area for their own professional wellbeing.

One of the conclusions of my 2008 PhD, The effect of scholarly communication practices on engagement with open access: An Australian study of three disciplines – where I undertook in-depth interviews with 43 researchers about their publication and communication practices – was that the Master/Apprentice system is broken (see pp. 177–188). We are not equipping our researchers with the information they need to navigate the publication process successfully. This need for education was echoed in a 2014 paper about open access journal quality indicators (itself published in the Journal of Librarianship and Scholarly Communication – notice a pattern?).

Problem 2 – library community also needs to know

But this is not just an issue for the research community. Librarians in the academic space also need to know about these issues. Last year the Association of College and Research Libraries (ACRL) released their (excellent) Scholarly Communication Toolkit. The introductory pages note that the “ACRL sees a need to vigorously re-orient all facets of library services and operations to the evolving technologies and models that are affecting the scholarly communication process.” The reason, they say, is that for academic libraries to continue to succeed we need to integrate our work into all aspects of the full cycle of scholarly communication.

The toolkit also notes that there is ‘wide variance’ in the levels of understanding of these issues within our community. If we consider the ‘four stages of competence’ as a rough tool:

  1. Unconsciously unskilled – we don’t know that we don’t have this skill, or that we need to learn it.
  2. Consciously unskilled – we know that we don’t have this skill.
  3. Consciously skilled – we know that we have this skill.
  4. Unconsciously skilled – we don’t know that we have this skill (it just seems easy).

It would be an ideal situation to have our academic library community sitting at stages three and four. In reality many are at stage two and even at stage one.

But bringing everyone up to speed is a huge challenge. Our experiences in Australia have demonstrated it is extremely difficult to get issues related to scholarly communication into curricula for library training. Many of the skills in this area are learnt ‘on the job’.

There are almost no courses on repository management, as demonstrated in this 2012 study published in the (here it is again) Journal of Librarianship and Scholarly Communication. There is a now slightly out-of-date list of courses in scholarly communication here. Professor Stephen Pinfield did point out after my talk that he is incorporating open access into his library courses. Discussions about open access are also included at Charles Sturt University in subjects where it is related, such as Foundations for Information Studies, Collections and Research Data Management, but there has been difficulty in securing a subject explicitly on Open Access or even more broadly on scholarly communication.

Even professional training is limited – CILIP offers ‘Institutional repositories and metadata’ and ‘Digital copyright’ but nothing on publishing or open access. One of the positive outcomes of the conference has been an offer to discuss some of these needs with CILIP.

Solution?

So what is the solution? We must shift from managing the academic literature to participating in the generation of it. Librarians can begin by engaging with the academic literature in their area. Suggestions include:

  • Reading research that is being published (in your area of librarianship)
  • Writing an academic article
  • Presenting work at conferences
  • Offering your services as a peer reviewer
  • Serving on an editorial board
  • Collaborating with your academic community on a project and writing about it

When I suggested this at the conference there was some push-back from the audience, defending the benefits of learning on the job. Afterwards, I was approached by a participant who said she had recently published a paper and found the process incredibly instructive. Interestingly, the same thing happened when a speaker urged colleagues to publish an academic paper at LIBER last year. There was again push-back from the audience until one participant said they seconded her statement. He said he thought he knew all about journals because he worked with them but when he published something he realised ‘I didn’t really know anything about it’.

We might have some way to go.

Annex 1 – The original debate piece

In the original debate piece we provided a background to OA’s development and current state – we did not go into great detail because we were limited by the 10,000 word count and we had made some assumptions about prior knowledge.

The piece examined some of the accusations levelled against OA and described why they were false and indeed indicative of a wider set of problems with scholarly communication:

  • that OA publishers are predatory,
  • that OA is too expensive,
  • that self-depositing papers in OA repositories will bring about the end of scholarly publishing.

We then proposed discussions we considered we should be having about scholarly publishing to take advantage of social and technological innovations and move it into the 21st century. These were the monograph issue, management of APCs, improving institutional repositories, needing to make scholarly publishing inclusive and the reward system.

Annex 2 – The times involved in publication

Here are the dates involved in getting the full debate piece to ‘print’:

  • First approach from the journal – September 2013
  • Agreed to write the piece and first discussion – 10 February 2014
  • Submitted the first argument – 26 May 2014
  • Submitted amendment based on editor’s comments – 29 May 2014
  • Rebuttals sent to us – 18 November 2014
  • Deadline for rejoinder – 19 December 2014
  • Rejoinder sent (!) – 16 February 2015
  • “Publication is with the production editor and will be out ‘anytime’” email – 6 May 2015
  • Copy editor’s questions sent to us – 4 June 2015
  • Corrected pieces (original & rejoinder) sent to editors – 26 June 2015
  • Date of acceptance – 4 July 2015
  • Date of publication – 17 August 2015
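
As a rough check on the durations quoted in the post (a two-year wait from first approach to publication, and 18 months from the start of writing), the elapsed time can be worked out from the dates listed above. This is only a sketch: the first approach is recorded as ‘September 2013’ with no day, so the 1st is assumed here, and the helper name whole_months_between is made up for illustration.

```python
from datetime import date

# Dates taken from the list above. The first approach is given only as
# "September 2013", so the day (the 1st) is an assumption for illustration.
first_approach = date(2013, 9, 1)
agreed_to_write = date(2014, 2, 10)
published = date(2015, 8, 17)


def whole_months_between(start: date, end: date) -> int:
    """Count the complete calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If the day of the month has not been reached yet, the last month
    # is not complete and must not be counted.
    if end.day < start.day:
        months -= 1
    return months


print(whole_months_between(first_approach, published))   # approach to publication
print(whole_months_between(agreed_to_write, published))  # writing to publication
```

Under that assumption the first figure comes out at 23 months, which is the ‘two-year period’, and the second at 18 months.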

Published 11 March 2016
Written by Dr Danny Kingsley
Creative Commons License

Forget compliance. Consider the bigger RDM picture

The Office of Scholarly Communication sent Dr Marta Teperek, our Research Data Facility Manager, to the International Digital Curation Conference held in Amsterdam on 22-25 February 2016. This is her report from the event.

Fantastic! This was my first IDCC meeting and already I can’t wait for next year. There was not only amazing content in high quality workshops and conference papers, but also a great opportunity to network with data professionals from across the globe. And it was so refreshing to set aside our UK problem of compliance with data sharing policies, to instead really focus on the bigger picture: why it is so important to manage and share research data and how to do it best.

Three useful workshops

The first day started really intensely – the plan was for one full day or two half-day workshops, but I managed to squeeze in three workshops in one day.

Context is key when it comes to data sharing

The morning workshop was entitled “A Context-driven Approach to Data Curation for Reuse” by Ixchel Faniel (OCLC), Elizabeth Yakel (University of Michigan), Kathleen Fear (University of Rochester) and Eric Kansa (Open Context). We were split into small groups and asked to decide what was the most important information about datasets from the re-user’s point of view. Would the re-user care about the objects themselves? Would s/he want to get hints about how to use the data?

We all had difficulties in arranging the necessary information in order of usefulness. Subsequently, we were asked to re-order the information according to its importance from the point of view of repository managers. And the take-home message was that for all of the groups the information about datasets required by the re-user was not the same as that required by the repository.

In addition, the presenters provided discipline-specific context based on interviews with researchers – depending on the research discipline, different information about datasets was considered the most important. For example, for zoologists, the information about specimens was very important, but it was of negligible importance to social scientists. So context is crucial for the collection of appropriate metadata. Without sufficient contextual information, data cannot be usefully reused.

So what can institutional repositories do to address these issues? If research carried out within a given institution only covers certain disciplines, then institutional repositories could relatively easily contextualise metadata information being collected and presented for discovery. However, repositories hosting research from many different disciplines will find this much more difficult to address. For example, the Cambridge repository has to host research spanning particle physics, engineering, economics, archaeology, zoology, clinical medicine and many, many others. This makes it much more difficult (if not impossible) to contextualise the metadata.

It is not surprising that the information most important from the repository’s point of view is different from the information most important to data re-users. In order to ensure that research data can be effectively shared and preserved in the long term, repositories need to collect a certain amount of administrative metadata: who deposited the data, what are the file formats, what are the data access conditions etc. However, repositories should collect as much administrative metadata as possible in an automated way. For example, if the user logs in to deposit data, all the relevant information about the user should be automatically harvested by feeds from human resources systems.
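
To make the idea of automated collection a little more concrete, here is a minimal sketch of how a deposit form might fill in administrative metadata without asking the depositor to type it. Everything in it is an assumption made for illustration: HR_FEED is a stand-in dictionary rather than a real human resources feed, and the field names and the build_admin_metadata helper do not come from any actual repository system.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical stand-in for a human resources feed, keyed by login ID.
HR_FEED = {
    "abc123": {"name": "A. Researcher", "department": "Zoology"},
}


@dataclass
class AdminMetadata:
    depositor_name: str
    department: str
    file_formats: list
    access_conditions: str


def build_admin_metadata(user_id, filenames, access_conditions="open"):
    """Assemble administrative metadata from the login and the uploaded files."""
    person = HR_FEED[user_id]  # who deposited: looked up, not typed in
    # File formats: derived from the file extensions, lower-cased and de-duplicated.
    formats = sorted({PurePath(name).suffix.lower() or "(none)" for name in filenames})
    return AdminMetadata(
        depositor_name=person["name"],
        department=person["department"],
        file_formats=formats,
        access_conditions=access_conditions,
    )


record = build_admin_metadata("abc123", ["counts.CSV", "notes.txt", "README"])
print(record)
```

The only thing the depositor supplies here is the access condition; the rest is derived from the login and the files themselves.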

EUDAT – Pan-European infrastructure for research data

The next workshop was about EUDAT – the collaborative pan-European infrastructure providing research data services, training and consultancy for researchers. EUDAT is an impressive project funded by a Horizon 2020 grant and it offers five different types of services to researchers:

  • B2DROP – a secure and trusted data exchange service to keep research data synchronized, up-to-date and easy to exchange with other researchers;
  • B2SHARE – service for storing and sharing small-scale research data from diverse contexts;
  • B2SAFE – service to safely store research data by replicating it and depositing at multiple trusted repositories (additional data backups);
  • B2STAGE – service to transfer datasets between EUDAT storage resources and high-performance computing (HPC) workspaces;
  • B2FIND – discovery service harvesting metadata from research data collections from EUDAT data centres and other repositories.

The project has a wide range of services on offer and is currently looking for institutions to pilot these services with. I personally think these are services which (if successfully implemented) would be of great value to the pan-European research community.

However, I have two reservations about the project:

  • Researchers are being encouraged to use EUDAT’s platforms to collaborate on their research projects and to share their research data. However, the funding for the project runs out in 2018. The EUDAT team is now investigating options to ensure the sustainability and future funding for the project, but what will happen to researchers’ data if the funding is not secured?
  • Perhaps if the funding is limited it would be more useful to focus the offering on the most useful services, which are not provided elsewhere. For example, another EC-funded project, Zenodo, already offers a user-friendly repository for research data; Open Science Framework offers a platform for collaboration and easy exchange of research data. EUDAT could initially focus on developing services that fill the remaining gaps. For example, a pan-European service harvesting metadata from various data repositories and enabling data discovery is clearly much needed and would be extremely useful to have.

Jisc Shared RDM Services for UK institutions

I then attended the second half of the Jisc workshop on shared Research Data Management services for UK institutions. The University of York and the University of Cambridge are two of the 13 institutions participating in the pilot. Jenny Mitcham from York and I gave presentations on our institutional perspectives on the pilot project: where we are at the moment and what our key expectations from the pilot are. Jenny gave an overview of the impressive work she and her colleagues have done on addressing data preservation gaps at the University of York. Data preservation is one of the areas in which Cambridge hopes to get help from the Jisc RDM shared services project. Additionally, as we described before, Cambridge would greatly benefit from solutions for big data and for personal/sensitive data. My presentation from the session is available here.

Presentations were followed by breakout group discussions. Participants were asked to identify the priority areas for the Jisc RDM pilot. The top priority identified by all the groups seemed to be solutions for personal/sensitive data and for effective data access management. This was very interesting to me as at similar workshops held by Jisc in the UK, breakout groups prioritised interoperability with their existing institutional systems and cost-effectiveness. This could be one of the unforeseen effects of strict funders’ research data policies in the UK, which required institutions to provide local repositories to share research data.

As a result of these policies, many institutions were tasked with creating institutional data repositories from scratch in a very short time. Most of the UK universities now have institutional repositories which allow research data to be uploaded and shared. However, very few universities have their repositories well integrated with other institutional systems. Not having the policy pressure in non-UK countries perhaps allowed institutions to think more strategically about developing their RDM service provisions and ensure that developed services are well embedded within the existing institutional infrastructure.

Conference papers and posters

The two following days were full of excellent talks. My main problem was which sessions to attend: talking with other attendees I am aware that the papers presented at parallel sessions were also extremely useful. If the budget allows, I certainly think that it would be useful for more participants from each institution to attend the meeting to cover more parallel sessions.

Below are my main reflections from keynote talks.

Barend Mons – Open Science as a Social Machine

This was a truly inspirational talk, raising a lot of thought-provoking discussions. Barend started from a reflection that more and more brilliant brains, with more and more powerful computers and with billions of smartphones, created a single, interconnected social super-machine. This machine generates data – vast amounts of data – which is difficult to comprehend and work with, unless proper tools are used.

Barend mentioned that with the current speed of new knowledge being generated and papers being published, it is simply impossible for human brains to assimilate the constantly expanding amount of new knowledge. Brilliant brains need powerful computers to process the growing amount of information. But in order for science to be accessible to computers, we need to move away from PDFs. Our research needs to be machine-readable. And perhaps if publishers do not want to support machine-readability, we need to move away from the current publishing model.

Barend also stressed that if data is to be useful and correctly interpretable, it needs to be accessible not only to machines, but also to humans, and that effort is needed to make data well described. Barend said that research data without proper metadata description is useless (if not harmful). And how to make research data meaningful? Barend proposed a very compelling solution: no more research grants should be awarded without 5% of money dedicated for data stewardship.

I could not agree more with everything that Barend said. I hope that research funders will also support Barend’s statement.

Andrew Sallans – nudging people to improve their RDM practice

Andrew started his talk from a reflection that in order to improve our researchers’ RDM practice we need to do better than talking about compliance and about making data open. How is a researcher supposed to make data accessible if the data was not properly managed in the first place? The Open Science Framework has been created with three mission statements:

  • Technology to enable change;
  • Training to enact change;
  • Incentives to embrace change.

So what is the Open Science Framework (OSF)? It is an open source platform to support researchers during the entire research lifecycle: from the start of the project, through data creation, editing and sharing with collaborators and concluding with data publication. What I find the most compelling about the OSF is that it allows one to easily connect various storage platforms and places where researchers collaborate on their data in one place: researchers can easily plug in their resources stored on Dropbox, Google Drive, GitHub and many others.

To incentivise behavioural change among researchers, the OSF team came up with two other initiatives:

Personally, I couldn’t agree more with Andrew that enabling good data management practice should be the starting point. We can’t expect researchers to share their research data if we have not helped them with providing tools and support for good data management. However, I am not so sure about the idea of cash rewards.

In the end researchers become researchers because they want to share the outcomes of their research with the community. This is the principle behind academic research – the only way of moving ideas forward is to exchange findings with colleagues. Do researchers need to be paid extra to do the right thing? I personally do not think so and I believe that whoever decides to pursue an academic career is prepared to share. And it is our task to make data management and sharing as easy as possible, and the use of OSF will certainly be of great help to the community.

Susan Halford – the challenge of big data and social research

The last keynote was from Susan Halford. Susan’s talk was again very inspirational and thought-provoking. She talked about the growing excitement around big data and how trendy it has become; almost being perceived as a solution to every problem. However, Susan also pointed out the problems with big data. Simply increasing the computational power and not fully comprehending the questions and the methodology used can lead to serious misinterpretations of results. Susan concluded that when doing big data research one has to be extremely careful about choosing proper methodology for data analysis, reflecting on both the type of data being collected, as well as (inter)disciplinary norms.

Again – I could not agree more. Asking the right question and choosing the right methodology are key to drawing the right conclusions. But are these problems new to big data research? I personally think that we are all quite familiar with these challenges. Questions about the right experimental design and the right methodology have been known to humankind for as long as the scientific method has been used.

Researchers have always needed to design studies carefully before commencing experiments: what will be the methodology, what are the necessary controls, what should be the sample size, what needs to happen for the study to be conclusive? To me this is not a problem of big data. It is a problem that needs to be addressed by every researcher from the very start of the project, regardless of the amount of data the project generates or analyses.

Birds of a Feather discussions

I had not experienced Birds of a Feather Discussions (BoF) before at a conference and I am absolutely amazed by the idea. Before the conference started the attendees were invited to propose ideas for discussions keeping in mind that BoF sessions might have the following scope:

  • Bringing together a niche community of interest;
  • Exploring an idea for a project, a standard, a piece of software, a book, an event or anything similar.

I proposed a session about sharing of personal/sensitive data. Luckily, the topic was selected for a discussion and I co-chaired the discussion together with Fiona Nielsen from Repositive. We both thought that the discussion was great and our blog post from the session is available here.

And again, I was very sorry to be the only attendee from Cambridge at the conference. There were four parallel discussions and since I was chairing one of them, I was unable to take part in the others. I would have liked to be able to participate in discussions on ‘Data visualisation’ and ‘Metadata Schemas’ as well.

Workshops: Appraisal, Quality Assurance and Risk Assessment

The last day was again devoted to workshops. I attended an excellent workshop from the Pericles project on the appraisal, quality assurance and risk assessment in research data management. The project was about how an institutional repository should conduct data audits when accepting data deposits and also how to measure the risks of datasets becoming obsolete.

These are extremely difficult questions and, due to their complexity, very difficult to address. Still, the project leaders realised the importance of addressing them systematically and ideally in a (semi-)automated way by using specialised software to help repository managers make the right preservation decisions.

In a way I felt sorry for the presenters – their project progress and ambitions were so high that probably none of us, attendees, were able to critically contribute to the project – we were all deeply impressed by the high level of questions asked, but our own experience with data preservation and policy automation was nowhere near the level demonstrated by the workshop leaders.

My take-home message from the workshop is that proper audit of ingested data is of crucial importance. Even if there is no automation of risk assessment possible, repository managers should at least collect information about files being deposited to be able to assess the likelihood of their obsolescence in the future. Or at least to be able to identify key file formats/software types as selected preservation targets to ensure that the key datasets do not become obsolete. For me the workshop was a real highlight of the conference.
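
Even the most basic version of this, simply recording which file formats arrive with each deposit, can be done with a few lines of code. The sketch below is not the Pericles software or anything shown at the workshop; it is an illustration under stated assumptions. The function names are invented, and the watchlist is a placeholder that a repository manager would have to choose, not a statement about which formats are actually at risk.

```python
from collections import Counter
from pathlib import PurePath


def format_inventory(filenames):
    """Count deposited files by (lower-cased) file extension."""
    return Counter(PurePath(name).suffix.lower() or "(none)" for name in filenames)


def flag_for_review(inventory, watchlist):
    """Return the formats in the inventory that appear on a preservation watchlist."""
    return {ext: count for ext, count in inventory.items() if ext in watchlist}


# Hypothetical deposit and a hypothetical watchlist, for illustration only.
deposit = ["survey.csv", "survey_2.CSV", "model.xyz", "README"]
inventory = format_inventory(deposit)
to_review = flag_for_review(inventory, watchlist={".xyz"})

print(inventory)
print(to_review)
```

An inventory like this does not assess risk by itself, but it gives a repository manager the raw information needed to pick preservation targets later.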

Networking and the positive energy

Lots of useful workshops, plenty of thought-provoking talks. But for me one of the most important parts of the conference was meeting with great colleagues and having fascinating discussions about data management practices. I never thought I could spend an evening (night?) with people who would be willing to talk about research data without the slightest signs of boredom. And the most joyful and refreshing part of the conference was that, because we were from across the globe, our discussions moved away from the compliance aspect of data policies. Free from policy, we were able to address issues of how to best support research data management: how to best help researchers, what are our priority needs, what data managers should do first with our limited resources.

I am looking forward to catching up next year with all the colleagues I met in Amsterdam and to seeing what progress we will all have made with our projects and what our collective next moves should be.

Summarising, I came back with lots of new ideas and full of energy and good attitude – ready to advocate for the bigger picture and the greater good. I came back exhausted, but I cannot imagine spending four days any more productively and fruitfully than at IDCC.

Thanks so much to the organisers and to all the participants!

Published 8 March 2016
Written by Dr Marta Teperek

Creative Commons License

Promoting Open Access in a department – what works

At Cambridge University, the Open Access team offers a centralised service to help our researchers make their work open access and comply with their funder requirements. But getting researchers to visit www.openaccess.cam.ac.uk and engage with the service is proving to be a challenge. We estimate that only around a third of the University’s journal articles are currently being uploaded within the three-month window allowed by HEFCE.

We’re working hard to publicise the message at our end, but centralised services can’t reach all academics in the same way as their departments and colleges can. If we’re to ensure that as much of the University’s output as possible is available Open Access and eligible for the next REF, some of that work has to happen in departments.

Success story

One of the most successful departments in the University is the MRC Epidemiology Unit, which currently submits more than 80% of its manuscripts on time. We went to talk to Signe Wulund, the administrator there who looks after open access, about what she does and the systems she uses.

Workflows

Click on the thumbnail below to open a high resolution version of the ‘MRC Epidemiology & CEDAR Open Access Process’.

MRC Epidemiology poster 1a

At the heart of her workflow is a detailed knowledge of what the department’s 120-130 researchers are publishing. Authors are encouraged to inform her of any articles accepted for publication and to send her their manuscripts. Frequent reminders in the form of posters, newsletter items and emails make sure they don’t forget.

Papers can be uploaded to www.openaccess.cam.ac.uk either by the academics themselves or by an administrator on their behalf. Since 2013 MRC Epidemiology has had a great deal of success with either Signe or her colleague Karen handling manuscript uploads rather than the authors themselves. The expertise they have developed in the policies and workflows makes the process run extremely smoothly. They also check that the version of the article they’ve been sent is the correct one and that funders have been correctly acknowledged. This all means that by the time we receive the manuscript, it’s exactly what we need and we can get back to them with advice and information on any payments as quickly as possible.

Click on the thumbnail below to see a high resolution version of ‘Open Access Process Flowchart – who does what?’

MRC Epidemiology poster 2a

Added benefits

The most valuable aspect to this approach, however, is that it allows Signe to keep centralised records of the department’s publishing output. She maintains a spreadsheet that tracks all the Unit’s known papers, including where they are in the publication process and their open access status. This includes both papers authors have directly notified her about and those which she has found later through other sources like Symplectic.

This has uses well beyond Open Access. It enables Signe to maintain an organised overview of the department’s output and to chase up any issues that might arise; it also allows the department to follow up with journals and post manuscripts eligible for green Open Access to Europe PubMed Central.

Open Access is strongly backed by the department’s leadership and is a regular part of research group leader meetings, where papers on open access performance are presented and discussed. This maintains high awareness among researchers and allows group leaders to remind or inform colleagues who are not taking the appropriate action.

This is the key advantage that departmental administrators have over a centralised service – the fact that they are a regular part of department life and can reach researchers more directly and more often than the Office of Scholarly Communication can, however many events or presentations we hold.

There are, of course, resource implications. We know that many administrative staff within the University are overstretched. However, the time demands of the work Signe does on open access are not extravagant, and well worth the modest investment.

Take home messages

So the key things that the MRC Epidemiology Unit do that other departments could try to improve their open access rates are:

  • Consistent administrators with responsibility for open access, working on it regularly and so able to develop expertise.
  • Engage with researchers to keep track of departmental publications.
  • Administrators upload articles to Open Access website to increase efficiency.
  • Strong support from departmental leadership.
  • Frequent reminders and publicity about open access, using a variety of means.
  • Open access made a regular part of PI meetings, which can be used to increase engagement with open access.

The impact such measures can have speaks for itself. The MRC Epidemiology Unit’s submission and compliance rates are more than double the University average. But the key thing to note is that such work also needn’t be especially burdensome from a time or resource standpoint. Of course, different departments have different organisational structures, publishing patterns and needs, but many of these approaches are common sense and applicable anywhere.

If you’d like more detailed advice or suggestions for how to promote open access in your own department, please get in touch with us at info@openaccess.cam.ac.uk.

Published 7 March 2016
Written by Dr Philip Boyes

Creative Commons License

 

Is CC-BY really a problem or are we boxing shadows?

Comments from researchers and colleagues have indicated some disquiet about the Creative Commons (CC-BY) licence in some areas of the academic community. However, in conversation with some legal people and contemporaries at other institutions (some of these exchanges are replicated at the end of the blog) one of the observations was that academics are generally not cognizant of what the licences offer or indeed of what protections are available under regular copyright.

To try and determine whether this was an education and advocacy problem or if there are real issues we had a roundtable discussion on 29 February at Cambridge University attended by about 35 people who were a mixture of academics, administrators, publishers and legal practitioners. The discussion centred on some of the objections raised in the information circulated before the meeting (which is summarised at the end of this blog). For ease of description each objection is addressed in turn.

Background

Creative Commons provides a series of licences that people who create work can attach to it to tell users what they can or cannot do with it. The licences range from no restrictions at all (CC-0) to the fairly restrictive CC-BY-NC-ND-SA*, under which the user must attribute the author, must not amend the work, cannot make any financial gain from it and must put the same licence on anything they produce using this work.

There are increasing requirements from funders such as the Wellcome Trust and RCUK in the UK that any work published open access must have a Creative Commons Attribution (CC-BY) licence attached to it. The rationale behind this is that research needs to be available for other researchers to both read and reuse, but also to text and data mine without fear of copyright breaches. Work that is available under a CC-BY licence can be easily incorporated into course reading lists without copyright complications.

* Note added 8 March – a comment has been sent through that the CC-BY-NC-ND-SA licence is impossible to apply because the share-alike and no-derivatives clauses are mutually exclusive and cannot be applied together. See this explanation.

Summary of the discussion

The general feeling in the discussions was that academics do want to share their work but they don’t want things to be used incorrectly. The outcome of the discussion was that while there are some confusions in this area, and we could do some work on advocacy and educational materials there are also some specific cases where CC-BY has the potential to cause issues.  In a small number of cases issues have actually occurred.

Is CC-BY a problem? For whom?

We should note here that CC-BY only affects a proportion of research published in the UK. While all research is potentially affected by the HEFCE requirement to make work available, the route preferred is through placing a copy in a repository. So this discussion affects only those researchers who have a specific grant from the Charities Open Access Fund (Wellcome Trust) or the RCUK. Humanities researchers tend not to hold grants, and for those that do, it is their articles, not their monographs that are affected by this requirement.

While there are some actual concrete examples of issues for researchers in the Arts and Humanities, many of the problems discussed here are what could happen. There was a comment from a scientific publisher that the sciences also had some concerns about CC-BY when it was first introduced, but none of the concerns have actually come to fruition. Another person noted there have been hundreds of thousands of pieces of content published under CC-BY licences, with very few known problem cases or harm. This is telling. The question was raised: Are we just repeating myths?

On the other hand, just because issues haven’t happened yet does not mean that they would not be a serious problem should they occur. One of the questions at the end of the discussion was: “Are the ethical norms of society strong enough to stop these concerns happening?” It would appear that to date they have been in the sciences.

Moral rights

CC-BY is an attribution licence. This means the moral right for the originator of the work to be identified is retained. However the moral right for the integrity of the research is not protected. The discussion centred around this.

If someone uses work under a CC-BY licence and makes alterations to it, they do need to indicate they have changed a work but not how they have altered it. The concern in the group was that the work could be altered so the meaning is entirely changed and it would still be attributed to the original author.

Authors can object to the derogatory treatment of their work. The recourse of being able to ask to have the originator’s name taken off the work was not seen as satisfactory because then the person who has adapted the work is potentially able to publish the work, which is based substantially on someone else’s work, as their own.

That said, one comment was that academic works are always open to interpretation, whether quoted or not and whether available under a CC-BY licence or not.

Translation

The area of translations does appear to have some concrete examples of problems caused by CC-BY for Humanities & Social Science authors. One of the issues is it is very difficult to check a translation unless the original author can read the language into which their work has been translated.

Plagiarism

Of all of the areas of discussion, plagiarism raised the most opinions. The accusation that CC-BY somehow ‘encourages’ plagiarism is often levelled. Some argue that making work available under a Creative Commons licence protects authors against plagiarism rather than encourages it. Works available in the public domain are far more easily identified as the original work than something published on paper and held on a library shelf, for example.

There was a debate about what actually constitutes plagiarism. One opinion was that ‘It’s plagiarism unless it’s in quotes’. However while the use of quote marks would protect the integrity of the work, there is nothing legally wrong with a derivative use of a work that is available under CC-BY – legally this is not plagiarism.

Nothing about the CC-BY licence overrides UK law about fair dealing. One of the lawyers present noted that academics don’t understand the details of copyright. Academics want full protection but also full sharing. In the world of the internet there’s a free-for-all – people copy-and-paste from wherever they want. No-one respects licences, so an academic work is not necessarily protected under current rules.

It was noted that plagiarism occurs all the time, even when articles are all rights reserved and under traditional copyright. And while Open Access publishing does make plagiarism easier (regardless of the licence), it doesn’t change the underlying principle that it’s unethical. Ethical behaviour in academia sits separately from copyright law.

Sensitive information

The area of sensitive information seems to have the strongest case for not using a CC-BY licence. Researchers working in areas that might contain sensitive information – such as medical or criminal areas – spend a great deal of time ensuring that their findings are presented sensitively and that their distribution is appropriate. The concern with CC-BY licences is that these findings can be misconstrued, which would be damaging to the researcher and could go back to the participants and affect them. If presented in the wrong way, altered research outputs could affect not just the research but also the participants.

There is an issue about the dialogue between the people that are being studied and if they have any moral rights about how the information is being used.

An example that was given was in anthropology, working with a community of Native Americans in northern California, who released sensitive data and stories from their cultural past which they want to be accessed. However because they have been exploited in the past they wanted some form of restriction on how these things can be reused. This is an example where a CC-BY licence would not be appropriate.

An oral historian discussed the type of work they do with subjects talking about traumatic periods of their life. In these cases the researcher enters into a covenant with them about how their work can be used. This could not be dealt with ethically under a CC-BY licence. The issue is about subsequent control over reuse of research, with concern about it being co-opted and used in another context.

The question about ethical use of material was raised again, with someone noting that no matter what licence it is available under you can’t control what people do with your work if they disagree with you.

Items containing third party copyright

Being required to publish work under a CC-BY licence does cause problems for people whose work contains a large amount of third party material. This is because the burden on the author to obtain permissions for all of the works would be both time consuming and expensive. Many researchers have raised questions about whether they can even do their work if they’re required to publish under CC-BY.

That said, if researchers are themselves using CC-BY works this issue is mitigated because they automatically have permission to use the material. This raises the question: does CC-BY make it more difficult or easier?

Commercialisation

There were some examples raised where a series of works that were freely available had been packaged up and sold. This raised the question: Who is being harmed in commercial exploitation of academic works?

Academics do not publish in journals for money, so the originator of a work that is subsequently sold on is not personally losing a revenue stream. There was a distinction between the academic and non-academic publishing environment. It was agreed that the people buying these works are being scammed. The concern is that people are being exploited by being made to pay for things that should be freely available.

The discussion moved to whether a Non Commercial licence would solve this problem. The issue here is the confusion over the definition of ‘commercial’ in this context. An institution that has a revenue stream from student fees could be seen to be commercial and therefore unable to include CC-BY-NC items on their reading lists.

It was noted that CC-BY-NC-ND is extremely restrictive about ways works can be used.

Academic freedom

The discussion touched several times on the broader issue of the government placing an increasing number of requirements on researchers. The questions raised were: “Does someone who is fronting up with the money have the rights to enforce a particular licence? What about the subjects of a study?”

There is supposed to be an arm’s-length relationship between funders and universities, but a concern is that funding bodies want to have more power to tell academics what to work on.

Next steps

In summary, the discussion indicated that CC-BY licences do not encourage plagiarism, or issues with commercialism within academia (although there is a broader ethical issue). However in some cases CC-BY licences could pose problems for the moral integrity of the work and cause issues with translations. CC-BY licences do create challenges for works containing sensitive information and for works containing third party copyright.

There is an expectation amongst the academic community that people behave ethically and within cultural norms.

As agreed with the group we have published this blog post which summarises the discussions held this week. In discussions about the Open Access Policy Framework for the University it would be helpful to include a statement that there is concern about CC-BY licences for some disciplines and types of research.

Background information sent to participants prior to the discussion

Commentary on CC-BY in published reports

The issue of the CC-BY licenses was a recurrent theme in A review of the RCUK review of implementation of its OA policy (March 2015). Many arts, humanities and social science disciplines hold ‘principled and practical objections to the use of CC-BY licences’ (p18). This is partly because work under a CC-BY license ‘could be both used commercially in ways of which the author does not approve and also might not be properly acknowledged as their work’ (pp19-20).

The Royal Historical Society evidence to the RCUK review noted that humanities scholars have particular objections to certain kinds of ‘derivative use’ that amount to the encouragement of plagiarism. Because the ‘attribution’ requirement in CC BY is very loose, it is possible for a reuser of a humanities article to alter it and reissue it under their own name, specifying only that it is an adaptation of the original, but without specifying how it has been adapted. In this way reusers may adopt the style, argument and ‘personality’ of the original work under their own name (and even copyright it). This represents a violation of the specific moral right of the author to the integrity of the work, and the only recourse offered to the author by CC BY is to have their name removed from the attribution (which makes the violation worse). This kind of re-use is as likely to degrade as to enhance the public benefit of the research.

The British Academy’s response to the Commons Select Committee (2013) noted that many articles in HSS subjects are the product of single-author scholarship, where there is more of a claim on ‘moral rights’ that are not adequately protected under an unrestricted CC-BY licence. There were also concerns about commercial reuse of work that contains third party copyright, involving complicated permissions. The response suggests that it should be possible to vary Creative Commons licences according to the usages and requirements of different subject areas – and that an ‘Attribution-NonCommercial-NoDerivs’ licence (CC-BY-NC-ND) may very often be more appropriate.

Notes on an April 2013 Royal Historical Society position changing workshop on CC-BY and Humanities (chaired by Peter Mandler) noted that the editors of a number of history journals have suggested that the CC-BY licence facilitates and promotes commercial re-use and uses akin to plagiarism; that the licence therefore amounts to an infringement of authors’ moral and intellectual property rights; and that it is likely to damage the quality of education.

The HistoryUK Submission to the 2013 Business, Innovation and Skills Committee Enquiry on Open Access Publishing raised issues about the loss of protection of intellectual property, the dangers associated with allowing derivative works in sensitive areas of research, and the possible increased costs or embargoes that publishers may feel are needed to compensate for the transfer of a commercial asset to a third party.

Comments from researchers and administrators

In preparation for the round table, Danny Kingsley asked her community across the sector what kinds of objections different people in an administrative or library role had heard from researchers. These are summarised below.

English researcher at Cambridge – “I would prefer not to make my work, produced with the benefit of public funding, available in a form that would allow others to exploit it commercially, as the simple CC-BY licence does. My preference would be for the CC BY-NC-SA licence.”

Research Information Specialist – One question to ask here is whether traditional publishing models – such as signing over copyright itself – are really more beneficial to authors, and of course to weigh the risk of a negative CC experience against the benefits of positive ones.

Concerns raised in discussion with academics in the Humanities (reflected in two responses)

  1. A belief that CC BY encourages plagiarism
  2. That content licensed under CC BY is not monitored for copyright and other infringement to the same extent as more restrictive licences (a misguided belief that publishers actively monitor use and reuse of content I think)
  3. I have also heard the more vague concern about ideas being manipulated or twisted in some way and then re-published under the author’s name
  4. That encouraging reuse, especially derivatives, means the author has no control over what people do with the information (and therefore are associated with something that they would rather not be)

Advice provided on Creative Commons and licensing

Published 3 March 2016
Written by Dr Danny Kingsley, with thanks to Dr Philip Boyes and Dr Joyce Heckman for their notes.

Creative Commons License

 

Sharing personal/sensitive research data

Sharing research data comes with many ethical and legal issues. Since these issues are often complex and can rarely be solved with one-size-fits-all solutions, they tend not to be addressed as topics of conferences and workshops. We therefore thought that the gathering of data curation professionals at IDCC 16 would be an excellent opportunity to start these discussions.

This blog post is our informal report from a Birds of a Feather discussion on sharing of personal/sensitive research data which took place at the International Digital Curation Conference in Amsterdam “Visible data, invisible infrastructure” on 23 February 2016.

The need for good models for sharing personal/sensitive data

Many funders and experts in data curation agree that sharing personal and sensitive data needs to be planned from the start of a research project in order to be successful. Whenever it is possible to anonymise research data, this is the advised procedure to be followed before data is shared. For data which cannot be anonymised, governance procedures for data access need to be established.

We were interested to find out what practical solutions for sharing personal/sensitive data were offered by the data curators and data managers who came to the meeting. To our surprise, only two data curators admitted to providing solutions for hosting personal/sensitive data. Of these two, one repository accepted only anonymised data. The rest were not currently making personal/sensitive data available via their repositories.

Why is sharing personal/sensitive data so difficult to manage? Three main issues were discussed: anonymisation difficulty, problems with providing managed access to research data and technical issues.

Anonymisation difficulty

There was a lot of discussion about data anonymisation. When anonymising data one has to consider both direct and indirect identifiers. One of the data curators present at the meeting explained that their repository would accept anonymised data provided that they had no direct identifiers and a maximum of three indirect identifiers. But sometimes even a small number of indirect identifiers can make participants identifiable, especially in combination with information available in the public domain.

So perhaps instead of talking about data anonymisation one should rather focus on estimating the risk of re-identification of participants. It would be useful for the community if tools to perform risk assessment of participant re-identification in anonymised datasets were available to provide data curators with means to objectively assess and evaluate these risks.
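To make the idea of estimating re-identification risk more concrete, here is a minimal sketch of one common starting point, a k-anonymity style check that counts how many records share each combination of indirect identifiers. This is our own illustration rather than a tool discussed at the meeting, and the field names and sample records are invented. It also looks only at the dataset itself, so it says nothing about linkage with information already in the public domain.

```python
from collections import Counter


def group_sizes(records, indirect_identifiers):
    """Count records sharing each combination of indirect identifier values."""
    return Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )


def smallest_group_size(records, indirect_identifiers):
    """Size of the rarest combination (the 'k' in k-anonymity)."""
    if not records:
        raise ValueError("no records to assess")
    return min(group_sizes(records, indirect_identifiers).values())


def unique_fraction(records, indirect_identifiers):
    """Share of records whose combination of identifiers is unique."""
    if not records:
        raise ValueError("no records to assess")
    sizes = group_sizes(records, indirect_identifiers)
    unique = sum(1 for count in sizes.values() if count == 1)
    return unique / len(records)


# Invented sample data with three indirect identifiers.
records = [
    {"age_band": "30-39", "postcode_area": "CB1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "CB1", "occupation": "nurse"},
    {"age_band": "40-49", "postcode_area": "CB2", "occupation": "teacher"},
]
fields = ["age_band", "postcode_area", "occupation"]
k = smallest_group_size(records, fields)
share_unique = unique_fraction(records, fields)
```

In this invented sample the third participant is the only one with their combination of values, which shows how even three indirect identifiers can single someone out.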

Problems with managed access to research data

If repositories accept sensitive/personal research data they need to have robust workflows for managing access requests. The Expert Advisory Group on Data Access (EAGDA) has produced a comprehensive guidance document on governance of data access. However, there are difficulties in putting this guidance into practice.

If a request for data access is received by a repository, the request will be forwarded to a person nominated by the research team to handle data requests. However, research data are usually expected to be preserved long-term (5 years plus) and such long term periods are often longer than the time researchers spend at their institutions. This creates a problem: who will be there to respond to data access requests? One of the institutions accepting sensitive/personal data has a workflow in which the initial request is forwarded to the nominated person. If the nominated person is no longer available, the request is then directed to the faculty’s head. However, this also creates problems:

  • Contact details for the nominated person need to be kept up to date and researchers leaving the post might not remember to notify the repository managers.
  • The faculty’s head might be too busy to respond to requests and might have insufficient knowledge about the data to be able to manage access requests effectively.
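The fallback workflow described above can be sketched in a few lines. This is only an illustration under our own assumptions: the `Contact` class, the `still_in_post` flag and the function name are hypothetical, and the sketch does nothing to solve the underlying problem of keeping contact details up to date.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    email: str
    still_in_post: bool = True  # hypothetical flag; must be kept current


def route_access_request(
    nominated: Optional[Contact], faculty_head: Optional[Contact]
) -> Optional[Contact]:
    """Return the first available contact: nominated person, then faculty head."""
    for contact in (nominated, faculty_head):
        if contact is not None and contact.still_in_post:
            return contact
    return None  # nobody available; the request needs manual escalation


researcher = Contact("Nominated Researcher", "researcher@example.org", still_in_post=False)
head = Contact("Head of Faculty", "head@example.org")
recipient = route_access_request(researcher, head)
```

Returning `None` when nobody is available makes the gap explicit, so that a repository can decide in advance who handles requests in that case.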

Technical issues and workflows if things go wrong

There are also technical issues associated with sharing of personal/sensitive research data. One of the institutions reported that due to a technical fault in the repository system, restricted research data was released as open access data and downloaded by several users (who did not sign the data access agreement) before the fault had been noticed.

Follow-up discussions led to a reflection that a repository can never be 100% sure of the security of personal/sensitive data. Even assuming that technical faults will not happen, repositories can also be subject to hacking attacks. Therefore, when accepting personal/sensitive data for long term preservation, repository managers should also assess the risks of data being inappropriately released and decide on a suitable risk mitigation strategy. Additionally, institutions should have workflows in place with procedures to be followed should things go wrong and restricted data be inappropriately released.

Other issues

Apart from the topics mentioned above we discussed other issues related to sharing personal/sensitive research data. For example:

  • What workflows do organisations have in place to check that data depositors have the rights to share confidential research data or data generated in collaboration with other third parties (external collaborators, external funding bodies, commercial partners)?
  • How do we properly balance the amount of checks required to validate that the data depositor has the rights to share and not discourage data depositors from sharing their research via a repository?
  • Or, if research data cannot be safely shared via a repository, do organisations offer the possibility of creating metadata-only records to facilitate data discoverability?
  • What are the implications for DOI creation?

Actions

Our discussions revealed that there are clearly more questions than answers available on how to effectively share personal/sensitive data. Therefore it is important that we, as the community of practitioners, start developing workflows and procedures to address these problems.

SciDataCon 2016 (11-13 September 2016) is organising a call for session proposals (deadline: 7 March) and we would like to propose a session on sharing of personal/sensitive data. If you have any practice papers that you would like to propose for this session please fill in a google form here. Please note that the google form is to submit your proposals for the session to us (it is not an official submission form for the conference). We will use your proposed practice papers to form a session proposal for the conference.

Possible topics for practice papers for the session:

  • What are the workflows for sharing commercial and sensitive data via repositories?
  • How is your organisation trying to balance between protection of confidential data and encouragement for sharing?
  • What safety mechanisms are there in place at your organisation to safeguard confidential data shared via your repository?
  • What are the workflows and procedures in place in case confidential/restricted/embargoed data is accidentally released?
  • What procedures are followed to ensure that data depositors have the rights to share confidential research data or data generated in collaboration with other third parties (external collaborators, external funding bodies, commercial partners)?
  • How do organisations balance the amount of checks required to validate that the data depositor has the rights to share and not to discourage data depositors from sharing their research via a repository?
  • Other case studies/practice papers on the subject

Resources:

Published 29 February 2016
Written by Fiona Nielsen, CEO at DNAdigest and Repositive and Marta Teperek, Research Data Facility Manager at the University of Cambridge
Creative Commons License

 

‘It is all a bit of a mess’ – observations from Researcher to Reader conference

“It is all a bit of a mess. It used to be simple. Now it is complicated.” This was the conclusion of Mark Carden, the coordinator of the Researcher to Reader conference, after two days of discussion, debate and workshops about scholarly publication.

The conference bills itself as: ‘The premier forum for discussion of the international scholarly content supply chain – bringing knowledge from the Researcher to the Reader.’ It was unusual because it mixed ‘tribes’ who usually go to separate conferences. Publishers made up 47% of the group, Libraries were next with 17%, Technology 14%, Distributors were 9% and there were a small number of academics and others.

In addition to talks and panel discussions there were workshop groups that used the format of smaller groups that met three times and were asked to come up with proposals. In order to keep this blog to a manageable length it does not include the discussions from the workshops.

The talks were filmed and will be available. There was also a very active Twitter discussion at #R2RConf.  This blog is my attempt to summarise the points that emerged from the conference.

Suggestions, ideas and salient points that came up

  • Journals are dead – the publishing future is the platform
  • Journals are not dead – but we don’t need issues any more as they are entirely redundant in an online environment
  • Publishing in a journal benefits the author not the reader
  • Dissemination is no longer the value added offered by publishers. Anyone can have a blog. The value-add is branding
  • The drivers for choosing research areas are what has been recently published, not what is needed by society
  • All research is generated from what was published the year before – and we can prove it
  • Why don’t we disaggregate the APC model and charge for sections of the service separately?
  • You need to provide good service to the free users if you want to build a premium product
  • The most valuable commodity as an editor is your reviewer time
  • Peer review is inconsistent and systematically biased
  • The greater the novelty of the work, the more likely it is to receive a negative review
  • Poor academic writing is rewarded

Life After the Death of Science Journals – How the article is the future of scholarly communication

Vitek Tracz, the Chairman of the Science Navigation Group, which produces the F1000Research series of publishing platforms, was the keynote speaker. He argued that we are coming to the end of journals. One of the issues with journals is that their essence is selection. The referee system is secret – the editors won’t usually tell the author who the referee is because the referee is working for the editor, not the author. The main task of peer review is to accept or reject the work, although there may be some suggestions for improving the paper. But that decision is not taken by the referees; it is taken by the editor, who has the Impact Factor to consider.

This system allows for information to be published that should not be published – eventually all publications will find somewhere to publish. Even in high level journals many papers cannot be replicated. A survey by PubMed found there was no correlation between impact factor and likelihood of an abstract being looked at on PubMed.

Readers can now get the papers they want by themselves and create their own collections that interest them. But authors need journals because the Impact Factor (IF) is so deeply embedded. Placement in a prestigious journal doesn’t increase readership, but it does increase the likelihood of getting tenure. So authors need journals, readers don’t.

Vitek noted F1000Research “are not publishers – because we do not own any titles and don’t want to”. Instead they offer tools and services. It is not publishing in the traditional sense because there is no decision to publish or not publish something – that process is completely driven by authors. He predicted that the future of science publishing will shift from journals to services (there will be more tools and more publishing directly on funder platforms).

In response to a question about impact factor and changing author motivation, Vitek said “the only way of stopping impact factors as a thing is to bring the end of journals”. This aligns with the conclusions in a paper I co-authored some years ago, ‘The publishing imperative: the pervasive influence of publication metrics’.

Author Behaviours

Vicky Williams, the CEO of research communications company Research Media, discussed “Maximising the visibility and impact of research” and talked about the need to translate complex ideas in research into understandable language.

She noted that the public does want to engage with research. A large percentage of the public wants to know about research while it is happening. However, they see communication about research as poor. There is low trust in science journalism.

Vicky noted the different funding drivers – funding is now very heavily distributed. Research institutions have to look at alternative funding options. Now we have students as consumers – they are mobile and create demand. Traditional content formats are being challenged.

As a result institutions are needing to compete for talent. They need to build relationships with industry – and promotion is a way of achieving that. Most universities have a strong emphasis on outreach and engagement.

This means we need a different language, different tone and a different medium. However academic outputs are written for other academics. Most research is impenetrable for other audiences. This has long been a bugbear of mine (see ‘Express yourself scientists, speaking plainly isn’t beneath you’).

Vicky outlined some steps to showcase research: have a communications plan, network with colleagues, create a lay summary, use visual aids and engage. She argued that this acts as a research CV.

Rick Anderson, the Associate Dean of the University of Utah, talked about the Deeply Weird Ecosystem of publishing. Rick noted that publication is deeply weird, with many different players – authors (send papers out), publishers (send out publications), readers (demand subscriptions), libraries (subscribe or cancel). All players send signals out into the scholarly communications ecosystem, and when we send signals out we get partial and distorted signals back.

An example is that publishers set prices without knowing the value of the content. The content they control is unique – there are no substitutable products.

He also noted there is a growing prevalence of funding with strings attached. Funders are now imposing conditions on how you publish, covering not just the narrative of the research but also the underlying data. In addition, the institution you work for might have rules requiring you to publish in particular ways.

Rick urged authors to answer the question ‘what is my main reason for publishing’ – not for writing. In reality it is primarily to have high impact publishing. By choosing to publish in a particular journal an author is casting a vote for their future. ‘Who has power over my future – do they care about where I publish? I should take notice of that’. He said that ‘If I publish with Elsevier I turn control over to them, publishing in PLOS turns control over to the world’.

Rick mentioned some journal selection tools. JANE is a system (oriented to biological sciences) where authors can paste an abstract into a search box; it analyses the language and comes up with a suggested list of journals. The Committee on Publication Ethics (COPE) member list provides a ‘white list’ of publishers. Journal Guide helps researchers select an appropriate journal for publication.

A tweet noted that “Librarians and researchers are overwhelmed by the range of tools available – we need a curator to help pick out the best”.

Peer review

Alice Ellingham, who is Director of Editorial Office Ltd, which runs online journal editorial services for publishers and societies, discussed ‘Why peer review can never be free (even if your paper is perfect)’. Alice discussed the different processes associated with securing and chasing peer review.

She said the unseen cost of peer review is communication: the editorial office provides assistance to all participants. She estimated that it takes about 45-50 minutes per submission to manage the peer review.

Editorial Office tasks include looking for scope of a paper, the submission policy, checking ethics, checking declarations like competing interests and funding requests. Then they organise the review, assist the editors to make a decision, do the copy editing and technical editing.

Alice used an animal analogy – the cheetah representing the speed of peer review that authors would like to see, and the tortoise representing what they experience. This was very interesting given the Nature news piece that was published on 10 February, “Does it take too long to publish research?”

Will Frass is a Research Executive at Taylor & Francis and discussed the findings of a T&F study “Peer review in 2015 – A global view”. This is a substantial report and I won’t be able to do his talk justice here. There is some information about the report here, and a news report about it here.

One of the comments that struck me was that researchers in the sciences are generally more comfortable with single blind review than in the humanities. Will noted that because there are small niches in STM, double blind often becomes single blind anyway as they all know each other.

A question from the floor stated that reviewers spend eight hours on a paper and that their time is more important than publishers’, and asked what publishers can do to support peer review. While this was not really answered on the floor* it did cause a bit of a flurry on Twitter, with a discussion about whether the time spent is indeed five hours or eight hours – quoting different studies.

*As a general observation, given that half of the participants at the conference were publishers, they were very underrepresented in the comment and discussion. This included the numerous times when a query or challenge was put out to the publishers in the room. As someone who works collaboratively and openly, this was somewhat frustrating.

The Sociology of Research

Professor James Evans, who is a sociologist looking at the science of science at the University of Chicago, spoke about ‘How research scientists actually behave as individuals and in groups’.

His work focuses on the idea of using data from the publication process that tell rich stories about the process of science. James spoke about some recent research results relating to the reading and writing of science, including peer reviews and the publication of science, research and rewarding science.

James compared the effect of writing styles to see what is effective in terms of reward (citations). He pitted ‘clarity’ – using few words and sentences, the present tense, and maintaining the message on point against ‘promotion’ – where the author claims novelty, uses superlatives and active words.

The research found writing with clarity is associated with fewer citations and writing in a promotional style is associated with more citations. So redundancy, length of clauses and mixed metaphors end up enhancing a paper’s searchability. This harks back to the conversation about poor academic writing the day before – bad writing is rewarded.

Scientists write to influence reviewers and editors in the process. Scientists strategically understand the class of people who will review their work and know they will be flattered when they see their own research. They use strategic citation practices.

James noted that even though peer review is the gold standard for evaluating the scientific record, his research shows that, in terms of determining the importance or significance of scientific works, peer review is inconsistent and systematically biased. Greater reviewer distance results in more positive reviews. This is possibly because if a person is reviewing work close to their speciality, they can see all the potential criticisms. The greater the novelty of the work, the more likely it is to receive a negative review. It is possible to ‘game’ this by driving the peer review panels. James expressed his dislike of the institution of suggesting reviewers. Suggested reviewers provide more positive, more influential and worse reviews (according to the editors).

Scientists understand the novelty bias so they downplay the new elements relative to the old elements. James discussed Thomas Kuhn’s concept of the ‘essential tension’ between the classes of ‘career considerations’ – which result in job security, publication, tenure (following the crowd) and ‘fame’ – which results in Nature papers, and hopefully a Nobel Prize.

This is a challenge because the optimal question for science becomes a problem for the optimal question for a scientific career. We are sacrificing pursuing a diffuse range of research areas for hubs of research areas because of the career issue.

The centre of the research cycle is publication rather than the ‘problems in the world’ that need addressing. Publications bear the seeds of discovery and represent how science as a system thinks. Data from the publication process can be used to tune, critique and reimagine that process.

James demonstrated his research that clearly shows that research today is driven by last year’s publications. Literally. The work takes a given paper and extracts the authors, the diseases, the chemicals etc and then uses a ‘random walk’ program. The result ends up predicting 95% of the combinations of authors and diseases and chemicals in the following year.

However scientists think they are getting their ideas, the actual origin is traceable in the literature. This means that research directions are not driven by global or local health needs for example.

Panel: Show me the Money

I sat on this panel discussion about ‘The financial implications of open access for researchers, intermediaries and readers’ which made it challenging to take notes (!) but two things that struck me in the discussions were:

Rick Anderson suggested that when people talk about ‘percentages’ in terms of research budgets they don’t want you to think about the absolute number, noting that 1% of the Wellcome Trust research budget is $7 million and 1% of the NIH research budget is $350 million.

Toby Green, the Head of Publishing for the OECD, put out a challenge to the publishers in the audience. He noted that airlines have split up the cost of travel into different components (you pay for food or luggage etc, or can choose not to), and suggested that publishers split APCs to pay for different aspects of the service they offer and allow people to choose different elements. The OECD has moved to a Freemium model where the payment comes from a small number of premium users – that funds the free side.

As – rather depressingly – is common in these kinds of discussions, the general feeling was that open access is all about compliance and is too expensive. While I am on the record as saying that the way the UK is approaching open access is not financially sustainable, I do tire of the ‘open access is code for compliance’ conversation. This is one of the unexpected consequences of the current UK open access policy landscape. I was forced to yet again remind the group that open access is not about compliance, it is about providing public access to publicly funded research so people who are not in well resourced institutions can also see this research.

Research in Institutions

Graham Stone, the Information Resources Manager, University of Huddersfield talked about work he has done on the life cycle of open access for publishers, researchers and libraries. His slides are available.

Graham discussed how to get open access to work to our advantage, saying we need to get it embedded. OAWAL is trying to get librarians who have had nothing to do with OA into OA.

Graham talked the group through the UK Open Access Life Cycle, which maps the research lifecycle for librarians and repository managers, research managers, authors (who think magic happens) and publishers.

My talk was titled ‘Getting an Octopus into a String Bag’. This discussed the complexity of communicating with the research community across a higher education institution. The slides are available.

The talk discussed the complex policy landscape, the tribal nature of the academic community, the complexity of the structure in Cambridge and then looked at some of the ways we are trying to reach out to our community.

There was nothing really new from my perspective: it is well known in research management circles that communicating with the research community – as an independent and autonomous group – is challenging. This is of course further complicated by the structure of Cambridge. But in preliminary discussions about the conference, Mark Carden, the conference organiser, assured me that this would be news to the large number of publishers and others in the audience who are not in a higher education institution.

Summary: What does everybody want?

Mark Carden summarised the conference by talking about the different things that different stakeholders in the publishing game want.

Researchers/Authors – mostly they want to be left alone to get on with their research. They want to get promoted and get tenure. They don’t want to follow rules.

Readers – want content to be free or cheap (or really expensive as long as someone else is paying). Authors (who are readers) do care about a journal being cancelled if it is one they are published in. They want a nice clear easy interface because they are accessing research on different publishers’ webpages. They don’t think about ‘you get what you pay for.’

Institutions – don’t want to be in trouble with the regulators, want to look good in league tables, don’t want to get into arguments with faculty, don’t want to spend any money on this stuff.

Libraries – Hark back to the good old days. They wanted manageable journal subscriptions, wanted free stuff, expensive subscriptions that justified ERM. Now libraries are reaching out for new roles and asking should we be publishers, or taking over the Office of Research, or a repository or managing APCs?

Politicians – want free public access to publicly funded research. They love free stuff to give away (especially other people’s free stuff).

Funders – want to be confusing, want to be bossy or directive. They want to mandate the output medium and mandate copyright rules. They want possibly to become publishers. Mark noted there are some state controlled issues here.

Publishers – “want to give huge piles of cash to their shareholders and want to be evil” (a joke). Want to keep their business model – there is a conservatism in there. They like to be able to pay their staff. Publishers would like to realise their brand value, attract paying subscribers, and go on doing most of the things they do. They want to avoid Freemium. Publishers could be a platform or a mega journal. They should focus on articles and forget about issues and embrace continuous publishing. They need to manage versioning.

Reviewers – apparently want to do less copy editing, but this is a lot of what they do. Reviewers are conflicted. They want openness and anonymity, slick processes and flexibility, fast turnaround and lax timetables. Mark noted that while reviewers want credit or points or money or something, you would need to pay peer reviewers a lot for it to be worthwhile.

Conference organisers – want the debate to continue. They need publishers and suppliers to stay in business.

Published 18 February 2016
Written by Dr Danny Kingsley
Creative Commons License

In conversation with Wellcome Trust and CRUK

On Friday 22 January Cambridge University invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers. David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK came to the University to talk to our researchers.

The related blog ‘Charities’ perspective on research data management and sharing’ summarises the presentations Jamie and David gave. After this event, a group of researchers from the School of Biological Sciences and from the School of Clinical Medicine at the University of Cambridge were invited to ask questions about the Wellcome Trust data management and sharing policy and CRUK data sharing and preservation policy directly of David and Jamie.

This blog is a summary of the discussion, with questions thematically grouped. These questions will be added to the list of Frequently Asked Questions on the University’s Research Data Management Website.

In summary:

  • It is not recommended that researchers simply share a link and release the data when requested. Research data should be available, accessible and discoverable.
  • The first responsibility is to protect the study participants. The funders provide guidance documents on sharing of patient data. Ethics committees also provide advice and guidance on what data can be shared. In principle, patient data should be safeguarded, but this should not preclude sharing. There are models for managed access to data that allow personal/sensitive data to be shared for legitimate purposes in a safe and secure manner.
  • The funders do not want to prevent new collaborations. When sharing data they recommend data generators provide a statement in the description of the data that they are willing to collaborate
  • It is recognised that it is often appropriate for researchers to have a defined period of exclusive access to the data they generate, but this should be determined by disciplinary norms. Any exemptions or delays have to be justified on a case by case basis, ideally at the outset of the project.
  • The funders expect research data that supports publications to be made accessible and publications should have a clear statement explaining how to access the underlying research data.
  • However, researchers need to decide what is useful to be shared considering the effort of preparing the data for deposit and of sharing the data. If nobody is going to use the data, sharing is not a good use of researchers’ time.
  • Discipline-specific data repositories, where these exist, are recommended preferentially over general purpose or institutional repositories
  • Biosharing is an excellent resource with references to discipline-specific metadata schemas.
  • A staff member whose role is to manage data is an eligible cost on a grant
  • There are no funds for sharing data from old projects, although there are exceptions on a case by case basis
  • The funders are considering monitoring data management plans but their current primary goal is to encourage people to think about data management and sharing from the very start of the project

Access to research data

Q: Are funders benefiting from the expertise of organisations such as UK Data Service when providing advice on data access? UK Data Service has been managing controlled access to research data for a long time and it would be advantageous to benefit from their expertise.

A: Yes, we are in discussion with the UK Data Service. We are also working with the UK Data Service to consider whether it might be appropriate for hosting data from other disciplines beyond social science. We also believe there is significant scope to share lessons and best practices for data sharing between the social and biomedical sciences.

Q: Could we just share research data only when asked for it?

A: This is not a recommended solution: research data should be available, accessible and discoverable. Data access controls and criteria for what needs to happen for the access to be granted have to be made clear in metadata description.

Q: I have patient data which has to be stored in a secure space. I always say in my data management plan that I cannot share my data. I would like to get ethical guidance which will explain to me how to share these data. It is very easy to say that data cannot be shared. I would like to share my data, but I would like to do it properly. With patient data it is extremely difficult, especially with genomics data, where there is a risk that patients can be identified.

A: Sharing of clinical data is not easy. Both Wellcome Trust and Cancer Research UK are helping to drive a great deal of work which is considering access and governance models through which sensitive patient data can be made available for research in a safe, secure and trusted manner. They provide guidance documents on sharing of patient data. Safety of patients and patients’ data is important. Ethics committees also provide advice and guidance on what data can be shared.

Q: What about sharing of physical materials? I have received a request to share a culture derived from a patient material, but the Ethics Committee did not approve sharing of this material. What shall I do?

A (Peter Hedges, Head of Research Office): If your ethical approval says that you cannot share that material, you cannot share it. Your first responsibility is to protect your study participants.

Q: If I share my data via a repository and people can simply download my data, I can no longer collaborate with them to work on the data and I have lost the possibility of getting credit for my data.

A: Nobody wants to prevent new collaborations from happening. A solution might be to add a statement that you are willing to collaborate in the description of your data. Your data requestor might be interested in collaborating, simply because you know your data the best. Funders also expect that the data re-used by others is appropriately acknowledged/cited, and they want to ensure that due credit results from the secondary use of data.

Quality control of research data

Q: If researchers start sharing unpublished research data via data repositories there is a risk that these data will not be of good quality as they will not be peer-reviewed.

A: Authors of unpublished data can simply state in the data description that the item was not peer-reviewed. If applicable, funders also encourage reciprocal links between publications and supporting research data.

What data needs to be shared and when?

Q: If researchers start to share everything there will be a lot of useless data available in data repositories. How to prevent a flood of useless data on the internet?

A: We would like researchers to decide what data is useful to be shared. If nobody is likely to use the data, sharing is not a good use of researchers’ time. Repositories also need to make decisions over what is worth keeping over time.

Comment (Peter Hedges, Head of Research Office): Research Councils UK focuses on research data supporting publications and this is what we recommend to researchers: share research data which underpins publications.

Q: Are we expected to share large datasets resulting from bigger projects (databases, long-term datasets) or data supporting individual publications?

A: We expect research data that supports individual publications to be made available with a hyperlink to the data. We also want researchers to consider and plan more broadly how they can make data assets of value resulting from our funded research available to others in a timely and appropriate manner.

Q: What about images? Is it useful to share them? It involves a lot of time to organise images. Besides, a single confocal picture with multiple layers is 1GB. In theory it is possible to share all raw data and all raw images, but who would want to look at them? 10 figures of 10 images is already 100 GB of data. Where would I store all these images, who is going to use these data and how am I going to pay for this?

A: The effort of preparing the data for deposit and of sharing the data should be proportionate to the potential benefits of data sharing. Researchers need to decide what is useful to be shared, following disciplinary best practices and norms (recognising that disciplines are in very different places in terms of defining these).

Q: Is there a set amount of time for exclusive use of research data?

A: Researchers should adhere to disciplinary norms. For example, in genomics research data is frequently shared before publication (sometimes under a publication moratorium which protects the data generator’s right to first publication). Any exemptions or delays have to be justified on a case by case basis.

Comment (Peter Hedges, Head of Research Office): Research is competitive. Sometimes it might be useful for researchers to know who wants access to the data and what they need it for.

Cost of data sharing

Q: Can I ask in my grant for a staff member to help me with data management?

A: Yes, this is an eligible cost on grant applications: you can request a salary to support a research data manager for your research project, as long as it is justified.

Q: According to CRUK policy, costs for data sharing can be budgeted in grant applications only from August 2015. What about research data from older projects, when these costs were not eligible in grant applications? Is there any transition fund available to pay for this?

A: Unfortunately, there are no additional funds to pay for these costs. Researchers who have older datasets that might be of significant value to the community should contact CRUK – all requests for support will be considered on a case by case basis.

Q: Wellcome Trust encourages data sharing and data re-use, but does not allow for costs of long-term data preservation to be budgeted in grant applications. This does not make sense to me.

A: We are still reviewing our policy on costs of data management and sharing and we might be revisiting this issue – however, it is problematic for us to consider estimated costs for preservation that extend beyond the lifetime of the grant. Our understanding is that costs of long-term data preservation are often less significant than costs of initial data ingestion by the repository (and we will cover ingestion costs).

Q: Who is then going to pay for the long-term data storage?

A: Wellcome Trust funds some discipline-specific repositories, but this is done jointly with other funders. We support bigger undertakings and we are also working with partners to develop platforms for data sharing and discoverability in some priority areas (notably clinical trials). Cancer Research UK pays for some long-term storage options, if these are justified for particular needs of the project. These decisions are made on a case by case basis, depending on how the costs are justified and whether these are directly related to the scientific value of the project.

Metadata standards

Q: At the moment there are many general purpose and institutional repositories, which are not well structured. To support efficient re-use of data it is important to use structured data repositories and adhere to metadata standards. What are funders’ opinions about this?

A: Wherever possible, discipline-specific data repositories should be used preferentially over general purpose or institutional repositories. Adherence to discipline-specific metadata standards is also encouraged. It has to be acknowledged that development of well-structured data repositories is very resource-intensive and not all disciplines have good quality repositories to support them. For example, it took over 30 years to adopt unified metadata standards at Cambridge Crystallographic Data Centre. The time needed to properly solve problems should never be underestimated.

Q: Are funders planning to provide researchers with a list of recommended schemas for metadata?

A: Biosharing is an excellent resource with references to discipline-specific metadata schemas. It is a useful suggestion to include a reference to Biosharing on our website.

Policy implementation

Q: Are you planning to monitor researchers’ adherence to data management plans? For example, the BBSRC does not have the manpower to check all data management plans manually, but they are planning to create a system to check automatically whether data has been uploaded.

A: We are considering this. At the moment we require data management plans with the primary goal to encourage people to think about data management and sharing from the very start of the project.

Published 5 February 2016
Written by Dr Marta Teperek, verified by David Carr and Jamie Enoch
Creative Commons License

Charities’ perspective on research data management and sharing

In 2015 the Cambridge Research Data Team organised several discussions between funders and researchers. In May 2015 we hosted Ben Ryan from EPSRC, which was followed by a discussion with Michael Ball from BBSRC in August. Now we have invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers.

David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK (CRUK) met with our academics on Friday 22 January at the Gurdon Institute. The Gurdon Institute was founded jointly by the Wellcome Trust and CRUK to promote research in the areas of developmental biology and cancer biology, and to foster a collaborative environment for independent research groups with diverse but complementary interests.

This blog summarises the presentations and discusses the data sharing expectations from Wellcome Trust and CRUK. A second related blog ‘In conversation with Wellcome Trust and CRUK’ summarises the question and answer session that was held with a group of researchers on the same day.

Wellcome Trust’s requirements for data management and sharing

Sharing research data is key for Wellcome’s goal of improving health

David Carr started his presentation explaining that the Wellcome Trust’s mission is to support research with the goal of improving health. Therefore, the Trust is committed to ensuring research outputs (including research data) can be accessed and used in ways that will maximise health and societal benefits. David reminded the audience of benefits of data sharing. Data which is shared has the potential to:

  • Enable validity and reproducibility of research findings to be assessed
  • Increase the visibility and use of research findings
  • Enable research outputs to be used to answer new questions
  • Reduce duplication and waste
  • Enable access to data to other key communities – public, policymakers, healthcare professionals etc.

Data sharing goes mainstream

David gave an overview of data sharing expectations from various angles. He started by referring to the Royal Society’s report from 2012: Science as an open enterprise, which sets sharing as the standard for doing science. He then also mentioned other initiatives like the G8 Science Ministers’ statement, the joint report from the Academy of Medical Sciences, BBSRC, MRC and Wellcome Trust on reproducibility and reliability of biomedical research and the UK Concordat on Open Research Data, with a take-home message that sharing data and other research outputs is increasingly becoming a global expectation, and a core element of good research practice.

Wellcome Trust’s policy for open data

The next aspect of David’s presentation was Wellcome Trust’s policy on data management and sharing. The policy was first published almost a decade ago (2007) with subsequent modifications in 2010. The principle of the policy is simple: research data should be shared and preserved in a manner which maximises its value to advance research and improve health. Wellcome Trust also requires data management plans as a compulsory part of grant applications, where the proposed research is likely to generate a dataset that will have significant value to researchers and other users. This is to ensure that researchers understand the importance of data management and sharing and plan for it from the start of their projects.

Cost of data sharing

Planning for data management and sharing involves costing for these activities in the grant proposal. The Wellcome Trust’s FAQ guidance on data sharing policy says that: “The Trust considers that timely and appropriate data management and sharing should represent an integral component of the research process. Applicants may therefore include any costs associated with their proposed approach as part of their proposal.” David then outlined the types of costs that can be included in grant applications (including for dedicated staff, hardware and software, and data access costs). He noted that, in the current draft guidance on costing for data management, estimated costs for long-term preservation that extend beyond the lifetime of the grant are not eligible, although costs associated with the deposition of data in recognised data repositories can be requested.

Key priorities and emerging areas in data management and sharing

Infrastructure

The Wellcome Trust also identified key priorities and emerging areas where work needs to be done to better support data management and sharing. The first one was to provide resources and platforms for data sharing and access. David pointed out that wherever available, discipline-specific data repositories are the best home for research data, as they provide rich metadata standards, community curation and better discoverability of datasets.

However, the sustainability of discipline-specific repositories is sometimes uncertain. Discipline-specific resources are often perceived as ‘free’. However, research data submitted to ‘free’ data repositories has to be stored somewhere and the amount of data produced and shared is growing exponentially – someone has to pay for the cost of storage and long-term curation in discipline-specific data repositories. An additional point for consideration is that many disciplines do not have their own repositories and therefore need to heavily rely on institutional support.

Access

Wellcome Trust funds a large number of projects in clinical areas. Dealing with patient data requires careful ethical considerations and planning from the very start of the project to ensure that data can be successfully shared at the end of the project. To support researchers in dealing with patient data, the Expert Advisory Group on Data Access (a cross-funder advisory body established by MRC, ESRC, Cancer Research UK and the Wellcome Trust) has developed guidance documents and practice papers about handling of sensitive data: how to ask for informed consent, how to anonymise data and the procedures that need to be in place when granting access to data. David stressed that a balance needs to be struck between maximising the use of data and the need to safeguard research participants.

Incentives for sharing

Finally, if sharing is to become the normal thing to do, researchers need incentives to do so. Wellcome Trust is keen to work with others to ensure that researchers who generate and share datasets of value receive appropriate recognition for their efforts. A recent report from the Expert Advisory Group on Data Access proposed several recommendations to incentivise data sharing, with specific roles for funders, research leaders, institutions and publishers. Additionally, in order to promote data re-use, the Wellcome Trust joined forces with the National Institutes of Health and the Howard Hughes Medical Institute and launched the Open Science Prize competition to encourage prototyping and development of services, tools or platforms that enable open content.

Cancer Research UK’s views on data sharing

The next talk was by Jamie Enoch from Cancer Research UK. Jamie started by saying that because Cancer Research UK (CRUK) is a charity funded by the public, it needs to ensure it makes the most of its funded research: sharing research data is elemental to this. Making the most of the data generated through CRUK grants could help accelerate progress towards the charity’s aim in its research strategy, to see three quarters of people surviving cancer by 2034. Jamie explained that his post – Research Funding Manager (Data) – has been created as a reflection of data sharing being increasingly important for CRUK.

The policy

Jamie started talking about the key principles of the CRUK data sharing policy by presenting the main issues around research data sharing and explaining CRUK’s position in relation to them:

  • What needs to be shared? All research data, including unpublished data, source code, databases etc, if it is feasible and safe to do so. CRUK is especially keen to ensure that data underpinning publications is made available for sharing.
  • Metadata: Researchers should adhere to community standards/minimum information guidelines where these exist.
  • Discoverability: Groups should be proactive in communicating the contents of their datasets and showcasing the data available for sharing

Jamie explained that CRUK really wants to increase the discoverability of data. For example, clinical trials units should ideally provide information on their websites about the data they generate and clear information about how it can be accessed.

  • Modes of sharing: Via community or generalist repositories, under the auspices of the PI or a combination of methods

Jamie explained that not all data can be/should be made openly available. Due to ethical considerations sometimes access to data will have to be restricted. Jamie explained that as long as restrictions are justified, it is entirely appropriate to use them. However, if access to data is restricted, the conditions on which access will be granted should be considered at the project outset, and these conditions will have to be clearly outlined in metadata descriptions to ensure fair governance of access.

  • Timeframes: Limited period of exclusive use permitted where justified

Jamie suggested adhering to community standards when thinking about any periods of exclusive use of generated research data. In some communities research data is made accessible at the time of publication. Other communities will expect data release at the time of generation (especially in collaborative genomics projects). Jamie further explained that particularly in cases where new data can affect policy development, it is key that research data is released as soon as possible.

  • Preservation: Data to be retained for at least 5 years after grant end
  • Acknowledgement: Secondary users of data should credit original researcher and CRUK
  • Costs: Appropriately justified costs can be included in grant proposals

As of late 2015, financial support for data management and sharing can be requested as a running cost in grant applications. Jamie explained that there are no particular guidelines in place explaining eligible and non-eligible costs and that the most important aspect is whether the costs are well justified or not, and reasonable in the context of the research envisaged.

Jamie stressed that the key point of the CRUK policy is to facilitate data sharing and to engage with the research community, recognising the challenges of data sharing for different projects and the need to work through these collaboratively, rather than enforce the policy in a top-down fashion.

Policy implementation

Subsequently, the presentation discussed ways in which CRUK policy is implemented. Jamie explained that the main tool for the policy implementation is the new requirement for data management plans as a compulsory part of grant applications.

Two of the three main response mode committees, the Science Committee and the Clinical Research Committee, have a two-step process of writing a data management plan. During the grant application stage researchers need to write a short, free-form description of how they plan to adhere to CRUK’s policy on data sharing. Only if the grant is accepted will the beneficiary be asked to write a more detailed data management plan, in consultation with CRUK representatives.

This approach serves two purposes as it:

  • ensures that all applicants are aware of CRUK’s expectations on data sharing (they all need to write a short paragraph about data sharing)
  • saves researchers’ time: only those applicants who were successful will have to provide a detailed data management plan, and it allows the CRUK office to engage with successful applicants on data sharing challenges and opportunities

In contrast, applicants for the other main CRUK response mode committee, the Population Research Committee, all fill out a detailed data management and sharing plan at application stage because of the critical importance of sharing data from cohort and epidemiological studies.

Outlooks for the future

Similarly to the Wellcome Trust, CRUK realised that cultural change is needed for sharing to become the norm. CRUK have initiated many national and international partnerships to help reward data sharing.

One of them is a collaboration with the YODA (Yale Open Data Access) project aiming to develop metrics to monitor and evaluate data sharing. Other areas of collaborative work include collaboration with other funders on development of guidelines on ethics of data management and sharing, platforms for data preservation and discoverability, procedures for working with population and clinical data. Jamie stressed that the key thing for CRUK is to work closely with researchers and research managers – to understand the challenges and work through these collaboratively, and consider exciting new initiatives to move the data sharing field forwards.

Published 5 February 2016
Written by Dr Marta Teperek, verified by David Carr and Jamie Enoch
Creative Commons License

What does a researcher do all day?

Recently, Paul Jervis-Heath* came to speak to Cambridge Libraries staff about work he had done as part of the Cambridge Libraries user centred design programme during the previous academic year.

This project was trying to establish how Cambridge University administrative services would manage the RCUK block grant provided to the University to support the RCUK Open Access policy. The end goal of the project was to design products and services, so the team of six working on the programme needed to start by trying to understand what academics did and what services they needed.

Information gathering process

During the project the team worked with 56 academics including contextual interviews with 34 academics. Paul noted however that it was also important to see the environments they were working in to ‘get into the headspaces’ of who they were designing for.

To this end the team shadowed 10 academics over a 48-hour period. They followed them through their day, literally sitting next to them. They watched lectures, sat in supervisions and took notes. As researchers did tasks the team asked questions about how they felt about the task – whether it was worth their time, for example. The number was small because of the time intensity of this approach, but the process revealed good insights. Paul mentioned that they looked at the workarounds academics have for tasks and were able to determine how academics know what is succeeding and what they ought to be doing.

The information gathering phase also included 12 co-design sessions looking at research and publishing tools, where participants were invited to act as designers. These were one-on-one co-design sessions. The academics were asked to design the journal they would like to publish in. As part of the process the team took notes about how the participants talked about the publishing process.

This process is referred to as ‘bootstrapping’. The project was not pretending to have the full picture of what academic life is like. However, the findings are robust enough to form an idea of what academics are doing, to then create something and take it back to the participants to be refined based on feedback.

Wearing lots of hats

Academics have lots of roles and they get split both between the University and their College and between their teaching and research roles. Paul noted that being an academic is really three or four jobs – each person needs to decide what they will be very good at. He observed that academics have to discover things that are new to the world as well as all of their other administration and work.

Many of the academics observed had between six and eight, sometimes 10 different roles. Some of these come with a job title, and others are unofficial because the academic wants to be a good supervisor, tutor, or a good colleague. The longer someone is around, the more roles they collect. The team started trying to graph people’s job titles as part of the project but this proved challenging because academia is not like a company where people have a fixed job title. Paul described it as more like a series of badges where an academic gets new things ‘pinned on’.

Academics are both teachers and researchers. Paul noted it is always interesting to see which one the participants mentioned first, their teaching role or their research role.

Teaching

Teaching takes up most of the term time and there is no time for research other than, say, putting together reading lists. For most researchers, about 20 minutes is the length of time they have available for any one thing. This is how they carve up their day.

Everybody teaching at Cambridge is a University Teaching Officer – which has four levels. People start off as a Lecturer, then Senior Lecturer, then Reader, then Professor. There are additional roles like the Head of Department, which typically rotates as a two-year position. Then there are people who are Director of Studies both within a department and in the Colleges. Tutors look after the pastoral element of life in the College. And that’s just teaching roles.

Research

The other side of the coin is the research roles. People start as Research Associates where they are hired for a specific research project, which means there is nothing to move on to, so the person might have to move to a new university. Postdocs often don’t have anywhere to go, so they tend to use libraries and coffee shops or work from home. For many people the College is their office.

Gaining a Junior Research Fellowship is an important step because the University is funding the research in some way, however most positions are a fixed length. Having a JRF means the researcher knows where they are going to be. The next step is a Senior Research Fellow, then Principal Investigator. In science, research happens in groups and the Principal Investigator leads the project.

Many people likened running a research group to running a small company while remaining research active. The Principal Investigator is similar to the managing director of a small company. Some of these activities they don’t have any real training for. No-one has told them how to manage expenditure of a research project, or how to interview people. Several people noted that the hardest thing is recruitment, not least because often candidates are abroad and interviews happen over Skype and Google Hangout. There is a big element of doubt about who they have employed.

Often collaborations are across time zones so researchers are fitting in calls in the early morning and evening to allow for time zones.

Academic roles in detail

The academic roles tended to fall into the following areas:

  • College role – Supporting students, public relations, administration, research, consultancy, teaching
  • Personal administration – Travel arrangements, updating diary, updating CV and publication lists
  • College administration – committee meetings and reading papers, reviewing and interviewing candidates for the college, selecting the admissions.
  • Supporting students – both academic and pastorally, for example providing information about the college or problems with students not coping with work or taking students to hospital.
  • Teaching
    • Lectures (including preparation and planning curriculum, getting lecture rooms, sorting out timetables).
    • Putting slides and demos and reading list up in the course Moodle.
    • Writing the exam papers, preparing materials they will need.
    • Final issues like meeting the lab technicians, marking the exams.
  • Research
    • Applying for grant funding involves obtaining quotes from suppliers and partners to go into applications, creating budgets, meeting funders, writing applications, research project management.
    • Setting up experiments, and gathering data and analysing results.
    • A large amount of writing to tell people about it and get it published – it doesn’t count unless it is published in a good journal. Lots of work in formatting, editing and reviewing.
    • There is informal work – peer reviews. For journals, official peer review is usually preceded by informal peer review – people will review each other’s papers to increase chances of getting accepted.
    • Managing research groups – running meetings, setting goals, managing expenditure, writing job descriptions, recruitment, approving leave
    • Once published all the outreach – including listing the work in Symplectic, seminars, going to conferences and doing speaking engagements. Going to London to be interviewed.
  • Consultancy – meeting collaborators

Disciplinary differences in research

Disciplines differ immensely from one another but not necessarily in the ways traditionally thought of. Rather than there being a Science versus Humanities divide, a more accurate way of thinking about types of research relates to whether the work is being done in a group or by a solo researcher.

The size of the research group is partly determined by the expense of the equipment. Research such as that done by CERN is very expensive and requires grants. In AHSS there is less of a need for external funding (or possibly less funding available). Note that Junior or Senior Research Fellows tend to be funded by the University but Principal Investigators are often funded through grants.

The pace of the discipline changes how people publish – in fast disciplines there are shorter units of publication, and slower disciplines have longer ones. Physics is a very fast discipline so they upload pre-prints to arXiv.org. For example, the role of journals in physics is not as important as in biology.

Transparency changes across disciplines as well. For example physics is very open and biology is secretive – even colleagues often don’t know what others are working on. Transparency can be measured by the competitiveness of the discipline. It can affect the discipline of the research groups – some are open, others are secretive.

The structure of research groups

Research groups were a surprise to Paul. Members do not work together like you do on a project team. Research groups manifest as a set of researchers following their own interests but generally working in the same area. The researchers share methods and equipment but otherwise they are doing their own thing.

Some groups are supportive with mentoring but others are really competitive. Sometimes this comes from the research group and other times it comes from the people in the group. This appears to be led by the discipline culture of where they come from. It is worth noting that while anecdotally Cambridge people have more freedom, in Cambridge there is a cultural tendency not to show any weakness.

Day in the life graphics

Paul then took the group through the ‘day in the life’ diagrams created out of the shadowing done in Michaelmas Term 2013 (October to December). The graphics he discussed included:

The vertical axis reflects how happy the academic was over the day. High points tend to coincide with having contact with people and talking about their discipline, such as discussions with PhD students, or with a research group. However, lecturing is not a high point because there is no two-way communication – all the students sit at the back, and the lecturer only gets feedback at the end.

One of the greatest emotional lows for a researcher is having a paper rejected. They have often put all of their effort and knowledge into a journal paper. If it is rejected after peer review they are being told they have wasted two years of their life. Paul noted that some reviewing boards are brutal and the feedback given is, frankly, rude.

There is a similar low point if an application for grant funding is unsuccessful – it is similar to a rejection. Grant funding applications are worse than a paper as the researcher has to argue why the work is important and why the funder should fund it. Generally funding bodies are not as brutal but they are awarding funding to competitors – so it is a double blow.

Research and publishing experience map

Paul also talked the group through the Research and Publishing Experience Map. As part of the project the team was looking to see if the University was involved in the publishing process in terms of helping it. However the team found that there is no contact with the University during the process of research and publishing. There was no official checkpoint where academics had to tell the University about what they were doing. While there might be a discussion between the person and their supervisor, it is not recorded anywhere.

The research group will know where articles have been submitted, but the information is not captured anywhere – except in their inbox. But in research groups people move on so even a shared memory is lost. So there is no way to collect data, and no place to archive the administration for researchers. While the Research Office knows about the research grant, what a researcher does with the money is up to them. There are not many official touch points with the University.

The result of this work was a need to artificially engineer a touch point with the academics to ensure that they are able to meet their compliance requirements. The www.openaccess.cam.ac.uk upload system is the result.

* Paul now works for the consulting company Modern Human

Published 1 February 2016
Written by Dr Danny Kingsley
Creative Commons License