Tag Archives: reproducibility

The case for Open Research: reproducibility, retractions & retrospective hypotheses

This is the third instalment of ‘The case for Open Research’ series of blogs exploring the problems with Scholarly Communication caused by having a single value point in research – publication in a high impact journal. The first post explored the mis-measurement of researchers and the second looked at issues with authorship.

This blog will explore the accuracy of the research record, including the ability (or otherwise) to reproduce research that has been published, what happens if research is retracted, and a concerning trend towards altering hypotheses in light of the data that is produced.

Science is thought to progress by building knowledge through questioning, testing and checking work. The idea of ‘standing on the shoulders of giants’ summarises this – we discover truth by building on previous discoveries. But scientists are very rarely rewarded for being right; they are rewarded for publishing in certain journals and for getting grants. This can distort the science.

How does this manifest? The Nine Circles of Scientific Hell describes questionable research practices ranging from Overselling, Post-Hoc Storytelling, p-value Fishing and Creative Use of Outliers to Non- or Partial Publication of Data. We will explore some of these below. (Note this article appears in a special issue of Perspectives on Psychological Science on replicability in psychological science, which contains many other interesting articles.)

Much as we like to think of science as an objective activity, it is not. Scientists are supposed to be impartial observers, but in reality they need to get grants and publish papers to get promoted to more ‘glamorous institutions’. This was the observation of Professor Marcus Munafo in his presentation ‘Scientific Ecosystems and Research Reproducibility’ at the Research Libraries UK conference held earlier this year (the link will take you to videos of the presentations). Munafo observed that scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Munafo, a biological psychologist at Bristol University, noted that research, particularly in the biomedical sciences, ‘might not be as robust as we might have hoped’.

The reproducibility crisis

A recent survey of over 1500 scientists by Nature tried to answer the question “Is there a reproducibility crisis?” The answer is yes, but whether that matters appears to be debatable: “Although 52% of those surveyed agree that there is a significant ‘crisis’ of reproducibility, less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature.”

There are certainly plenty of examples of the inability to reproduce findings. Pharmaceutical research offers a case in point: research into potential drug targets found that in almost two-thirds of the projects examined there were inconsistencies between the published data and the data resulting from attempts to reproduce the findings.

There are implications for medical research as well. A study published last month looked at functional MRI (fMRI) analysis, noting that when data containing no real effect are analysed, the rate of false positives should in theory match the nominal significance threshold of 5% (a p-value of less than 0.05, which is conventionally described as statistically significant). However the authors found that “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”
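
To get a feel for how false-positive rates can climb so far above the nominal 5%, it helps to spell out the arithmetic of multiple comparisons. The sketch below is my own illustration, not a reconstruction of the fMRI paper’s cluster-inference analysis: it simply assumes a researcher runs k independent tests of true null hypotheses at an uncorrected threshold of p < 0.05, in which case the chance of at least one false positive is 1 − 0.95^k.

```python
# Illustration only: how the chance of at least one false positive grows as more
# independent analyses are run at an uncorrected threshold of p < 0.05.
alpha = 0.05

for k in (1, 5, 10, 20, 50):
    family_wise_rate = 1 - (1 - alpha) ** k
    print(f"{k:2d} independent tests -> {family_wise_rate:.0%} chance of at least one false positive")
```

With 20 uncorrected looks at the data the chance of a spurious ‘significant’ result is already about 64%. The fMRI problem is more subtle than this (it concerns spatially correlated data and cluster-level inference), but the direction of the effect is the same.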

A 2013 survey of cancer researchers found that approximately half of respondents had experienced at least one episode of the inability to reproduce published data. Of those people who followed this up with the original authors, most were unable to determine why the work was not reproducible. Some of those original authors were (politely) described as ‘less than “collegial”’.

So what factors are at play here? Partly it is due to the personal investment in a particular field. A 2012 study of authors of significant medical studies concluded that: “Researchers are influenced by their own investment in the field, when interpreting a meta-analysis that includes their own study. Authors who published significant results are more likely to believe that a strong association exists compared with methodologists.”

This was also a factor in the study ‘Why Most Published Research Findings Are False’, which considered the way research studies are constructed. This work found that “for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”
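
The core of that argument can be made concrete with a small calculation. In the paper’s simplest case (ignoring bias and multiple teams), the probability that a claimed, statistically significant finding is actually true – the positive predictive value (PPV) – depends on the prior odds R that a tested hypothesis is true, the study’s power (1 − β) and the significance threshold α: PPV = (1 − β)R / ((1 − β)R + α). Below is a minimal sketch of that formula with illustrative numbers of my own, not figures taken from the paper.

```python
# Minimal sketch of the positive predictive value (PPV) calculation from
# "Why Most Published Research Findings Are False" (simplest case, no bias).
def ppv(prior_odds, power, alpha=0.05):
    """Probability that a statistically significant finding is true.

    prior_odds: R, the ratio of true to false hypotheses tested in the field
    power:      1 - beta, the chance of detecting a genuine effect
    alpha:      significance threshold (false positive rate under the null)
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Illustrative scenarios (assumed values, not from the paper):
print(f"R = 0.1,  power = 0.5 -> PPV = {ppv(0.1, 0.5):.0%}")    # about 50%
print(f"R = 0.05, power = 0.2 -> PPV = {ppv(0.05, 0.2):.0%}")   # about 17%
```

In an exploratory field where only one in ten tested hypotheses is true and power is modest, a ‘significant’ result is no better than a coin toss; with lower prior odds and lower power, most claimed findings are indeed false.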

Psychology is a discipline with a strong emphasis on novelty, discovery and finding something with a p-value of less than 0.05. The reproducibility problem in psychology is serious enough that there are now large-scale efforts to replicate published studies and estimate the reproducibility of the field. The Association for Psychological Science has launched a new article type, Registered Replication Reports, consisting of “multi-lab, high-quality replications of important experiments in psychological science along with comments by the authors of the original studies”.

This is a good initiative, although there might be some resistance to this type of scrutiny. One interesting finding from the Nature survey on reproducibility concerned what happened when researchers attempted to publish a replication study. Only a minority of respondents had done this, possibly because incentives to publish positive replications are low and journals can be reluctant to publish negative findings. The survey found that “several respondents who had published a failed replication said that editors and reviewers demanded that they play down comparisons with the original study”.

What is causing this distortion of the research? It is the emphasis on publication of novel results in high impact journals. There is no reward for publishing null results or negative findings.

The HARKing problem

The p-value came up again in a discussion about HARKing at this year’s FORCE2016 conference (HARKing stands for Hypothesising After the Results are Known – a term coined in 1998).

In his presentation at FORCE2016, Eric Turner, Associate Professor at OHSU, spoke about HARKing (see this video from 37 minutes onward). The process is that the researcher conceives the study and writes the protocol up, for their eyes only, with a hypothesis – and then collects lots of other data besides, ‘the more the merrier’ according to Turner. The researcher then runs the study and analyses the data. If there is enough data, the researcher can try alternative methods and play with statistics: ‘You can torture the data and it will confess to anything’, noted Turner. At some point the p-value will come out below 0.05. Only then does the research get written up.
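
To see just how easy that ‘torture’ is, here is a toy simulation – my own illustration, not anything Turner presented, and it assumes numpy and scipy are available. Two groups are drawn from exactly the same distribution (so there is nothing real to find) and compared on 20 unrelated outcome measures; we then count how often at least one comparison comes out ‘significant’ at p < 0.05.

```python
# Toy simulation of "collect lots of outcomes, then hunt for p < 0.05".
# There is no true difference between the groups, so every hit is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations, n_outcomes, n_per_group = 1000, 20, 30

studies_with_a_hit = 0
for _ in range(n_simulations):
    group_a = rng.normal(size=(n_per_group, n_outcomes))
    group_b = rng.normal(size=(n_per_group, n_outcomes))  # same distribution as group_a
    p_values = stats.ttest_ind(group_a, group_b).pvalue   # one t-test per outcome
    if (p_values < 0.05).any():
        studies_with_a_hit += 1

print(f"At least one 'significant' outcome in {studies_with_a_hit / n_simulations:.0%} of null studies")
```

Roughly two-thirds of these entirely null ‘studies’ yield something that looks publishable, which is exactly why pre-specifying the hypothesis and the analysis matters.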

Turner noted that he was talking about the kind of research where the work is trying to confirm a hypothesis (like clinical trials). This is different to hypothesis-generating research.

In the US, clinical trials with human participants must be registered with the Food and Drug Administration (FDA), so it is possible to see the results of all trials. Turner talked about his 2008 study of antidepressant trials, where the journal versions of the results supported the general view that antidepressants always beat placebo. However, when they looked at the FDA versions of all of the studies of the same drugs, roughly half of the studies were positive and half were not. The published record does not reflect the reality.

The majority of the negative studies were simply not published, but 11 of the papers had been ‘spun’ from negative to positive. These papers had a median impact factor of 5 and median citations of 68 – these were highly influential articles. As Turner noted ‘HARKing is deceptively easy’.

This perspective is supported by the finding that a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. Indeed, Munafo noted that over 90% of the psychological literature finds what it set out to find. Either the research being undertaken is extraordinarily mundane, or something is wrong.

Increase in retractions

So what happens when it is discovered that something that has been published is incorrect? Journals do have a system for retracting papers, and this is a practice which has been increasing over the past few years. Research looking at why the number of retractions has increased found that it was partly due to lower barriers to the publication of flawed articles. In addition, papers are now being retracted for issues like plagiarism, and retractions are happening more quickly.

Retraction Watch is a service which tracks retractions ‘as a window into the scientific process’. It is enlightening reading with several stories published every day.

An analysis of correction rates in the chemical literature found that the correction rate averaged about 1.4 percent for the journals examined. There were numerous types of corrections, but errors in chemical structures, omission of relevant references and data errors were among the most frequent. Corrections are not the same as retractions, but they are significant.

There is some evidence that the higher the impact factor of the journal a work is published in, the higher the chance it will be retracted. A 2011 study showed a direct correlation between impact factor and the number of retractions, with the New England Journal of Medicine topping the list. This situation has led to claims that the top-ranking journals publish the least reliable science.

A study conducted earlier this year demonstrated that there are no commonly agreed definitions of academic integrity and malpractice. (I should note that, amongst other findings, the study found 17.9% (± 6.1%) of respondents reported having fabricated research data – almost 1 in 5 researchers. However there have been some strong criticisms of the methodology.)

There are questions about how retractions should be managed. In the print era it was not unheard of for library staff to put stickers into printed journals notifying readers of a retraction. But in the ‘electronic age’, when the record can simply be erased, is that the right thing to do? As one author asked in 2002, erasing an article entirely amounts to amending history. The Committee on Publication Ethics (COPE) has guidelines for managing retractions which suggest the retraction be linked to the retracted article wherever possible.

However, from a reader’s perspective, even if an article is retracted this might not be obvious. In 2003* a survey of 43 online journals found 17 had no links between the original articles and later corrections. Where hyperlinks between articles and errata were present, they showed patterns in presentation style but lacked consistency. There are some good examples – such as Science Citation Index – but there was a lack of indexing in INSPEC and a lack of retrieval with SciFinder Scholar.

[*Note this originally said 2013, amended 2 September 2016]

Conclusion

All of this paints a pretty bleak picture. In some disciplines the pressure to publish novel results in high impact journals results in the academic record being ‘selectively curated’ at best. At worst it results in deliberate manipulation of results. And if mistakes are picked up there is no guarantee that this will be made obvious to the reader.

This all stems from the need to publish novel results in high impact journals for career progression. And when those high impact journals can be shown to be publishing a significant amount of subsequently debunked work, then the value of them as a goal for publication comes into serious question.

The next instalment in this series will look at gatekeeping in research – peer review.

Published 14 July 2016
Written by Dr Danny Kingsley
Creative Commons License

Consider yourself disrupted – notes from RLUK2016

The 2016 Research Libraries UK conference was held at the British Library from 9-11 March on the theme of disruptive innovation. This blog pulls out some of my personal highlights from the conference:

  • If librarians are to be considered important, we as a community need a strong grasp of scholarly communication issues
  • We need to know the facts about our subscriptions to, usage of and contributions to scholarly publishing
  • We need high level support in institutions to back libraries in advocacy and negotiation with publishers
  • Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem
  • Society needs more open research to ensure reproducibility and robust research
  • The library of the future will have to be exponentially more customisable than the current offering
  • The information seeking behaviour of researchers is iterative and messy and does not match library search services
  • Libraries need to ‘create change to triumph’ – to be inventors rather than imitators
  • Management of open access issues needs to be shared across institutions, with positive outcomes when research offices and libraries collaborate.

I should note this is not a comprehensive overview of the conference, and I have blogged separately about my own contribution ‘The value of embracing unknown unknowns’. Some talks were looking at the broader picture, others specifically at library practice.

Stand your ground – tips for successful publisher negotiations

The opening keynote presentation was by Professor Gerard Meijer, President of Radboud University who conducted the recent Dutch negotiations with Elsevier.

The Dutch position has been articulated by Sander Dekker, the State Secretary of Education, who said that while the way forward was gold Open Access, the government would not provide any extra money. Meijer noted this was sensible because every extra cent going into the system goes into the pocket of publishers – something that has been amply demonstrated in the UK.

All universities in the Netherlands are in the top 200 universities in the world. This means all their research is good quality – so even though it is only 2% of the world output, the Netherlands has some clout.

Meijer gave some salient advice about these types of negotiations. He said this work needs to be undertaken at the highest level of the universities, for several reasons. He noted that 1.5 to 2 percent of a university’s budget goes to subscriptions – and this share is growing as budgets are being cut – so senior leadership in institutions should take an active position.

In addition, if you are not willing to completely opt out of licensing the publisher’s material then you can’t negotiate, and if you are going to opt out you will need the support of the researchers. To that end communication is crucial – during the negotiations, the team sent a regular newsletter to researchers letting them know how things were going.

Meijer also stressed the importance of knowing the facts, and of communicating these facts and numbers to researchers. He noted that most researchers don’t know how much subscriptions cost. They do, however, know about article processing charges – creating a misconception that Open Access is more expensive.

Institutions in the Netherlands spent €9.2 million on Elsevier publications in 2009, rising to €11 million* in 2014. Meijer noted that he was ‘not allowed’ to tell us this information due to confidentiality clauses. He drolly observed “It will be an interesting court case to be sued for telling the taxpayers how their money is being spent”. He also noted that because Elsevier is a public company its finances are available, and while its revenue goes up, its costs stay the same.

Apparently Wiley and Springer are willing to enter into agreements. Elsevier, however, is saying that a global business model doesn’t match a local business requirement. The Netherlands has not yet signed the contract with Elsevier as they are working out the detail.

Broadly the deal is for three years, from 2016 to 2018. The plan is to grow the Open Access output from nothing to 10% in 2016, 20% in 2017 and 30% in 2018, and to do that without having to pay APCs. To achieve this they have to identify the journals to make Open Access, by defining domains in which all journals will be made open access.

Meijer concluded this was a big struggle – he would have liked to have seen more – but what they have is good for science. Dutch research will be open in the fields where most Open Access is already happening and researchers are paying APCs. Researchers can look at the long list of journals that are OA and publish there.

*CORRECTION: Apologies for my mistyping. Thanks to @WvSchaik for pointing out this error on Twitter. The slide is captured in this tweet.

The future of the research library

Nancy Fried Foster from Ithaka S+R and Kornelia Tancheva from Cornell University Library spoke about research practices and the disruption of the research library. They started by noting that researchers work differently now, using different tools. The objective of their ‘A day in the life of a serious researcher’ work was to explore research practices to inform a vision of the library of the future and identify improvements that could be made now.

They developed a very fine-grained, participatory-design method for seeing what people really do in the workplace. Participants (who were mainly postgraduates) were asked to map or log their movements over a single day in which at least some of their time was spent on research. The team then sat with each person the following day and asked them to narrate their day – and to talk about seeking, finding and using information. There was no distinction between academic and non-academic activity.

The team looked at the things people were doing and at what the library could be and do in response. The analysis took a lot of time, organising the material into several big categories:

  • Seeking information
  • Academic activities
  • Library resources
  • Space and self management
  • Circum-academic activities – activities allied to the researcher’s academic line but not central to it.

They also coded for ‘obstacles’ and ‘brainwork’.

The participants described their information seeking as fluid and constant – ‘you can just assume I am kind of checking my email all the time’. They also distinguished between search and research. One quote was ‘I know the library science is very systematic and organised and human behaviour is not like that’.

Information seeking is an iterative process; it is constant and not systematic. The search process is highly idiosyncratic – the subjects had developed ways of searching for information that worked for them, regardless of whether they were efficient. They are self-conscious that it is messy: ‘I feel like the librarians must be like “this is the worst thing I have ever heard”’.

Information evaluation is multi-tiered – e.g. ‘If an article is talking about people I have heard of it is worth reading’. Researchers often use a mash-up of systems that will work for a given project. For example, email is used as an information management tool.

Connectivity is important to researchers, it means you can work anywhere and switch rapidly between tasks. It has a big impact on collaboration – working with others was continuously mentioned in the context of writing. However sometimes researchers need to eliminate technology to focus.

Libraries have traditionally focused too much on search and not enough on brain work – which is a potential role for libraries. References to the library occurred throughout the process. Libraries are often thought of as a place of refuge – especially for the much needed brain work. Researchers also need support for self management – helping them manage their time and prioritise the demands on their attention. Their strategies depended on a complicated relationship with technology.

The major themes emerging from the work are that search is idiosyncratic and not that important, research has no closure, experts rule, and research is collaboration. The implication is that the future library is a hub, not just a discovery system but a means of connecting people with knowledge and technologies.

If we were building a library from scratch today, what would it look like? There will need to be a huge amount of customisation to adjust tools to suit researchers’ personal preferences. The library of the future will have to be exponentially more customisable than the current offering. Libraries will have to make their resources available on customisable platforms, and shift from non-interoperable tools to customisation.

So if the future were here today, the library would be an academic hub (improving current library services) plus an application store. Libraries should take on even more of a social media aspect. Think of a virtual ‘app store’ on an open source platform that lets people suggest shortcuts, with developers employed to build these modules quickly. Libraries should take a leadership role in ensuring vendor platforms can be integrated, so that all library resources speak easily to the systems users are already using. We need to provide individualised services rather than one size fits all.

Scientific Ecosystems and Research Reproducibility

The scientific reward structure determines the behaviour of researchers, and this has spawned the reproducibility crisis, according to Marcus Munafo from the University of Bristol.

Marcus started by talking about the p-value, where the conventional threshold for statistical significance is p < 0.05 – that is, a result at least as extreme would be expected less than five times in 100 if there were no real effect. Generally, studies need to cross this threshold to get published, and there is evidence that original studies often suggest a large effect which, when replication is attempted, cannot be reproduced.

Scientists are supposed to be impartial observers, but in reality they need to get grants, and publish papers to get promoted to more ‘glamorous institutions’ (Marcus’ words). Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Marcus noted it is common to overstate your data, or to error-check your data only if the first analysis doesn’t tell you what you are looking for. This ‘flexible analysis’ is quite commonplace if we look at the literature as a whole. Often there is not enough detail in a paper to allow the work to be reproduced. In one sample there were nearly as many unique analysis pipelines as there were studies – this analytical flexibility gets leveraged to get the result you want.

There is also evidence that journal impact factors are a very poor indicator of quality – indeed a high impact factor is a stronger predictor of retraction than of quality. The idea is that science as a whole will self-correct, but science won’t sort itself out in a reasonable timeframe. If you look at the literature you see that replication is the exception rather than the norm.

One study showed that among 83 articles recommending effective interventions, 40 had not been replicated; of those that had, many showed stronger findings in the original paper than in the replication, and some were contradicted by the replication.

Your personal investment in the field shapes your position – an unconscious bias that affects all of us. If you come in as an early career scientist you get the impression that the field is more robust than it is in reality. There is a hidden literature that is not citable – only by looking at this do you get a balanced sense of how robust the literature is. Many studies make a claim in the abstract that is not supported by a more impartial reading; others are ‘optimistic’ in the abstract. Articles that describe bad news receive far fewer citations than would be expected – people don’t want to cite bad news. So is science self-correcting?

We can introduce measures to help science self-correct. In 2000 the requirement to register the outcomes of clinical trials began; once researchers had to pre-specify what the outcome would be, most of the findings were null. That is why it is a scientific ecosystem – the way we are incentivised has become distorted over the years.

Researchers are incentivised to produce a small number of eye-catching papers. It is understandable why you would want to focus on quality over quantity, but we can give more weight to confirmatory studies and try to move away from publishing only certain types of studies. We shouldn’t be putting all our effort into high risk, high return work.

What do we do about this? There can be top-down measures, but individual groups can also change the ways they work, for example by adopting open science practices. This is not trivial – we can’t, for instance, make data available without the consent of participants. Possible solutions include pre-registering study plans, setting up studies so the data can be made open, and ensuring publications are gold OA. These measures serve as quality control: because people know everything will be made available, everything gets checked. We come down hard on academics who make conscious mistakes – but we should be encouraging people to identify their own errors.

We need to build quality control methods into our daily practice, and open data is a very good step in that direction. There is evidence that researchers who know their data will be made open are more thorough in checking it. Maybe it is time for an update in the way we do science – we have statistical software that can run hundreds of analyses, and we can text and data mine large numbers of papers. We need to build new processes and systems that refine science, and think about new ways of rewarding it.

Marcus noted that these are not new problems, quoting from Reflections on the Decline of Science in England written by Babbage in 1830.

Marcus referred to many different studies and articles in his talk, some of which I have linked to throughout this post.

Creating change to triumph: A view from Australia

The idea of creating change to triumph was the message of Jill Benn, the Librarian at the University of Western Australia. She discussed Cambietics, the science of managing change. This was a theory developed in 1985 by Barrett, with three stages:

  • Coping with change to survive
  • Capitalising on change
  • Creating change to triumph.

This last is the true challenge – to be an inventor rather than an imitator. Jill gave the Australian context: the country is 32 times bigger than the UK but has a third of the population, with 40 universities around the country. She noted that one of the reasons libraries in Australia have collaborated is their isolation.

Research from Australia accounts for 4% of the world’s research output; it is the third largest export after energy and out-performs tourism. The political landscape really affects higher education – there has been a series of five prime ministers in five years.

Australia has invested heavily in research infrastructure – mostly telescopes and boats. The Australian National Data Service (ANDS) was created and has built the Research Data Australia interface – an amazing system full of data. The libraries have worked with researchers to populate the repository, and there has been a large amount of capacity building: ANDS worked with libraries to build capacity through the ‘23 things’ training programme. Participants self-register – by 1 March, 840 people had signed up for the programme.

The most recent element of the government’s agenda has been innovation. Prime Minister Turnbull has said he wants to end the ‘publish or perish’ culture of research and increase its impact on the community. There is a national innovation and science agenda, and the government would no longer take publications into account for research. It is likely the next ERA (Australia’s equivalent of the REF) will involve impact on the community. The latest call is that “innovation is the new black”.

There is financial pressure on the university sector – which pays for many resources in US dollars, which is a problem. The emphasis on efficiency means the libraries have to show value and impact to the research sector.

Many well-developed services exist in university libraries to support research. Australian institutional repositories now hold over 650K full text items, which are downloaded over 1 million times annually, and there are data librarians and scholarly communication librarians. One of the ways libraries have been asked to deliver capacity is through CAUL and its Research Advisory Committee, engaging with the government’s agenda. There are three pillars – capacity building, engagement and advocacy – to promote the work of libraries to bodies like Universities Australia.

Jill also mentioned the Australasian Open Access Strategy Group, which has taken a green rather than a gold approach. Australians are interested in open access, but it is not yet clear what the role of institutional repositories will be in the future, in an environment where the government wants research to be shared.

How can we benchmark the Australian context? It is difficult; one approach is to look to our associations and at what data we might be able to share. Jill quoted Ross Wilkinson: yes, there are strong individuals, but it is the collective way Australia has managed data that enables it to engage internationally. Even so, despite the investment in repositories in Australia, the UK outperforms Australia.

Australian libraries see themselves as genuine partners for research and we have a healthy self confidence (!). Libraries must demonstrate value and impact and provide leadership. Australian libraries have created change to triumph.

Open access mega-journals and the future of scholarly communication

This talk was given by Professor Stephen Pinfield from Sheffield University. He talked about the Open Access Mega Journals project he is working on, which looks at these potentially disruptive open access journals (the Twitter handle is @oamj_project).

He began where it all began – with PLOS ONE, which is now the biggest journal in the world. Stephen noted that mega-journals attract controversy, with comments ranging from descriptions of them as the future of academic publishing and a disruptive innovation to the best possible future system.

Critics, on the other hand, see them variously as a dumping ground, as career suicide for early career researchers who publish in them, and as a cynical money-making venture. However, Pinfield noted that despite considerable searching, acknowledging what ‘people say’ is different from being able to find attributed negative statements about mega-journals.

The open access, wide-scope nature of mega-journals reverses the trend of the past few years towards ever greater specialisation of journals. They are identifiable by their approach to quality control – an emphasis on scientific soundness only, rather than subjective assessments of novelty – and by their post-publication metrics.

Pinfield noted that there are economies of scale for mega-journals – a single set of processes and technologies. This enables a tiered scholarly publishing system. Mega-journals potentially allow highly selective journals to go open access (such journals often argue that they reject so much material they couldn’t afford to go OA). Pinfield hypothesised a business model in which a layer of highly selective titles sits above a layer of moderately selective mega-journals: the moderately selective journals provide the financial subsidy while the highly selective ones provide the reputational subsidy. PLOS is a good example of this symbiotic relationship.

The emphasis on ‘soundness’ in the quality control process reduces the subjectivity of judgements of novelty and importance and potentially shifts the role and the power of the gatekeepers. Traditionally the editors and editorial board members have been the arbiters of what is novel.

However this opens up some questions. If it is only a ‘soundness’ judgement, is power shifted for good or ill? Does the idea of ‘soundness’ translate to the humanities? There is also the problem of an over-reliance on metrics: are the citation rates of these journals driven by their credibility or by their visibility?

Pinfield emphasised the need for librarians to be informed and credible in their understanding of these topics. If librarians are to be considered important, we as a community need a strong grasp of these issues, and there is an ongoing need to keep up to date and remain credible.

Working together to encourage researcher engagement and support

There were several talks about how institutions have been engaging researchers, and many of them emphasised the need to federate the workload across the institution. Chris Awre from the University of Hull discussed work he has been doing with Valerie McCutcheon on the current interaction between the library and other parts of the institution in supporting OA, to understand how OA is and could be embedded.

The survey revealed a desire for the management of Open Access to be spread more widely across the institution in future. Libraries should be more involved in the management of the research information system and in managing the REF. Library involvement in getting Open Access into grant applications is lower – this is a research role, but it is worth asking how much it underpins subsequent activity.

As an aside Chris noted a way of demonstrating the value of something is to call it an ‘office’ – this is something the Americans do. (Indeed it is something Cambridge has done with the Office of Scholarly Communication).

Chris noted that if researchers don’t think about open access as part of the scholarly communications workflow then they won’t do it. Libraries play a key role in advocating and managing OA – so how can they work with other institutional stakeholders in supporting research?

Valerie later spoke about blurring and blending the borders between the Library and the Research Office. She noted that when she was working in Research and Enterprise (RSEO) she thought library people were nice, but she was not sure what they actually did. When she transferred to working in the Library, the perception in the other direction was much the same.

But the Research Office and the Library need to cooperate on shared strategic priorities. Both are looking out for changes in the policy landscape, so they need to share information and collaborate on policy development and dissemination. They need better data quality in the research process, and to find solutions that create agile systems to support researchers.

At Glasgow the Library and RSEO were a good match because they had similar end users and the same data. This began a close collaboration between the two offices, which worked together on the REF and used Enlighten. They also linked their systems (Enlighten and the research system) in 2010, so users can browse the repository by funder name. Glasgow has had a publications policy, rather than an open access policy, since 2008.

Valerie also noted that it was crucial to have high-level support and showed a video of Glasgow’s PVC-R singing the praises of the work the Library was doing.

The Glasgow Open Access model has been ‘act on acceptance’ since 2013 – a simple message with minimal bureaucracy, delivered as a centralised service with ‘no fancy meetings’. Valerie also noted that when they put events on they don’t badge them as Library events, and the sessions are subject-based, not department-based.

Torsten Reimer and Ruth Harrison discussed the support offered at Imperial College. Torsten said he was originally employed to develop the College’s OA mandate, but then the RCUK and HEFCE policies came into force and changed everything. At Imperial, scholarly communication is seen as a concern for the College as a whole rather than specifically a Library issue.

Torsten noted the Library already had a good relationship with the departments. The Research Office is seen by researchers as a distraction from their research, but the Library is seen as helping research. Because the two areas have been able to approach everything with a single aim, open access and scholarly support have happened across the institution and the library has been able to expand.

Imperial has one workflow and one system for open access, all managed through Symplectic (there had been separate systems before). There is a simple workflow and form to fill in, and a ticketing-style customer workflow system plugged into Symplectic to pull information out at the back end. This system has replaced four workflows, lots of spreadsheets and much cutting and pasting.

Sally Rumsey talked about how Oxford have successfully managed to engage their research community with their recently launched ‘Act on Acceptance’ communication programme.

Summary

This is a rundown of a few of the presentations that spoke to me. There were also excellent speed presentations; Lord David Willetts, the former Minister for Universities and Science, spoke; we split into workshops; and there was a panel of library organisations from around the world who discussed working together.

The personal outcomes from the conference include:

  • An invitation to give a talk at Cornell University
  • An invitation to collaborate with some people at CILIP about ensuring scholarly communication is included in some of the training offered
  • Discussion about forming some kind of learned society for Scholarly Communication
  • Discussion about setting up a couple of webinars – ‘how to start up an office of scholarly communication’ and ‘successful library training programmes’
  • Also lots of ideas about what to do next – the issue of language and the challenges we are facing in scholarly communication because of language deserves some investigation.

I look forward to next year.

Published 14 March 2016
Written by Dr Danny Kingsley
Creative Commons License

 

Software Licensing and Open Access

As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Wednesday is a piece by Dr Marta Teperek reporting on the Software Licensing Workshop held on 14 September 2015 at Cambridge.

Uncertainties about sharing and licensing of software

If the questions that the Research Data Service Team have been asked during data sharing information sessions with over 1000 researchers at the University of Cambridge are any indicator, then there is a great deal of confusion about sharing source code.

There have been a wide range of questions during the discussions in these sessions, and the Research Data Service Team has recorded these. We are systematically ensuring that the information we are providing to our research community is valid and accurate. To address the questions about source code we decided to call in expert help. Shoaib Sufi and Neil Chue Hong* from the Software Sustainability Institute agreed to lead a workshop on Software Licensing in September, at the Computer Lab in Cambridge. Shoaib’s slides are here, and Neil’s slides on Open Access policies and software sharing are here.

Malcolm Grimshaw and Chris Arnot from Cambridge Enterprise also came to the workshop to answer questions about Cambridge-specific guidance on software commercialisation.

We had over 50 researchers and several research data managers from other UK universities attending the Software Licensing workshop. The main questions we were trying to resolve were: are researchers expected to share the source code they used in their projects? And if so, under what conditions?

Is software considered as ‘research data’ and does it need to be shared?

The starting question in the discussion was whether software needed to be shared. Most public funders now require that research data underpinning publications is made available. What is the definition of research data? According to the EPSRC research data “is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings”. Therefore, if software is needed to validate findings described in a publication, researchers are expected to make it available as widely as possible. There are some exceptions to this rule. For example, if there is an intention to commercialise the software there might not be a need to share it, but the default assumption is that the software should be shared.

The importance of putting a licence on software

It is important that before any software is shared, the creator considers what they would like others to be able to do with it. The way to indicate the intended reuse of the software is to place a licence on it, which records the permissions granted to others over the source code by the copyright holder(s). A licence determines whether the person who wants to get hold of the software is allowed to use, copy, resell, change or distribute it. A licence should also determine who is liable if something goes wrong with the software.

Therefore, a licence not only protects the intellectual property, but also helps others to use the software effectively. If people who are potentially interested in a given piece of software do not know what they are allowed to do with it, it is possible they will look for alternative solutions. As a consequence, researchers could lose important collaborators or buyers, or simply lose the citations that could have been gained from people using and citing the software in their publications.

Who owns the copyright?

The most difficult question when it comes to software licensing is determining who owns the copyright – who is allowed to license the software used in research? If this is software created by a particular researcher then it is likely that s/he will be the copyright owner. At the University of Cambridge researchers are the primary owners of intellectual property. This is however a very generous right – typically employers do not allow their employees to retain copyright ownership. Therefore, the issue of copyright ownership might get very complicated for researchers involved in multi-institutional collaborations. Additionally, sometimes funders of research will retain copyright ownership of research outputs.

Consequences of licensing

An additional complication with licensing software is that most licences cannot be revoked. Once something has been licensed to someone under a certain licence, it is not possible to take it back and change the licence. Moreover, if there is one licence for a set of software, it might not be possible to license a patch to the software under a different licence. The issue of licence compatibility sparked a lot of questions during the workshop, with no easy answers available. The overall conclusion was that whenever possible, mixing of licences should be avoided. If use of various licences is necessary, researchers are recommended to get advice from the Legal Services Office.

Good practice for software management

So what are the key recommendations for good practice in software management? Before the start of a research project, researchers should think about who the collaborators and funders are, and what the employer’s expectations are with regards to intellectual property. This will help determine who will own the copyright over the software. Funders’ and institutional policies for research data sharing should be consulted for expectations about software sharing. With this information it is possible to prepare a data management plan for the grant application.

During the project researchers need to ensure that their software is hosted in an appropriate code repository – for example, GitHub or Bitbucket. It is important to create (and keep updating!) metadata describing any generated data and software.

Finally, when writing a paper, researchers need to deposit all releases of data/software relevant to the publication in a suitable repository. It is best to choose a repository which provides persistent links e.g. Zenodo (which has a GitHub integration), or the University of Cambridge data repository (Apollo). It is important to ensure that software is licensed under an appropriate licence – in line with what others should be allowed to do with the software, and in agreement with any obligations there might be with any other third parties (for example, funders of the research). If there is a need to restrict the access to the software, metadata description should give reasons for this restriction and conditions that need to be met for the access to be granted.

Valuable resources to help make right decisions

Both Neil and Shoaib agreed that proper management and licensing of software can sometimes be complicated, and they recommended various resources and tools that provide guidance for researchers.

The workshop was organised in collaboration with Stephen Eglen from the Department of Applied Mathematics and Theoretical Physics (University of Cambridge) who chaired the meeting, and with Andrea Kells from the Computer Lab (University of Cambridge) who hosted the workshop.

The Research Data Service is also providing various other opportunities for our research community to pose questions directly of the funding bodies. We invited Ben Ryan from the EPSRC to come to speak to a group of researchers in May and the resulting validated FAQs are now published on our research data management website. Similarly, researchers met with Michael Ball from the BBSRC in August.

These opportunities are being embraced by our research community.

*About the speakers

Shoaib Sufi – Community Lead at the Software Sustainability Institute

Shoaib leads the Institute’s community engagement activities and strategies. After graduating in Computer Science from the University of Manchester in 1997, he worked in the commercial sector as a systems programmer and then as a software developer, metadata architect and eventually a project manager at the Science and Technology Facilities Council (STFC).

Neil Chue Hong – Director at the Software Sustainability Institute

Neil is the founding Director of the Software Sustainability Institute. Graduating with an MPhys in Computational Physics from the University of Edinburgh, he began his career at EPCC, becoming Project Manager there in 2003. During this time he led the Data Access and Integration projects (OGSA-DAI and DAIT), and collaborated in many e-Science projects, including the EU FP6 NextGRID project.

Published 21 October 2015
Written by Dr Marta Teperek
Creative Commons License