
The case for Open Research: reproducibility, retractions & retrospective hypotheses

This is the third instalment in ‘The case for Open Research’ blog series exploring the problems in scholarly communication caused by having a single point of value in research: publication in a high impact journal. The first post explored the mis-measurement of researchers and the second looked at issues with authorship.

This blog will explore the accuracy of the research record, including the ability (or otherwise) to reproduce research that has been published, what happens if research is retracted, and a concerning trend towards altering hypotheses in light of the data that is produced.

Science is thought to progress by building knowledge through questioning, testing and checking work. The idea of ‘standing on the shoulders of giants’ summarises this: we discover truth by building on previous discoveries. But scientists are very rarely rewarded for being right; they are rewarded for publishing in certain journals and for getting grants. This can distort the science.

How does this manifest? The Nine Circles of Scientific Hell describes questionable research practices ranging from overselling, post-hoc storytelling, p-value fishing and creative use of outliers to non-publication or partial publication of data. We will explore some of these below. (Note that this article appears in a special issue of Perspectives on Psychological Science on replicability in psychological science, which contains many other interesting articles.)

Much as we like to think of science as an objective activity, it is not. Scientists are supposed to be impartial observers, but in reality they need to get grants and publish papers to get promoted to more ‘glamorous institutions’. This was the observation of Professor Marcus Munafo in his presentation ‘Scientific Ecosystems and Research Reproducibility’ at the Research Libraries UK conference held earlier this year (the link will take you to videos of the presentations). Munafo observed that scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Munafo, a biological psychologist at the University of Bristol, noted that research, particularly in the biomedical sciences, ‘might not be as robust as we might have hoped’.

The reproducibility crisis

A recent survey of over 1500 scientists by Nature tried to answer the question “Is there a reproducibility crisis?” The answer is yes, but whether that matters appears to be debatable: “Although 52% of those surveyed agree that there is a significant ‘crisis’ of reproducibility, less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature.”

There are certainly plenty of examples of the inability to reproduce findings. Pharmaceutical research is a case in point. Some research into potential drug targets found that in almost two-thirds of the projects examined there were inconsistencies between the published data and the data resulting from attempts to reproduce the findings.

There are implications for medical research as well. A study published last month looked at functional MRI (fMRI) analysis, noting that the statistical methods used should in theory produce a false-positive rate of 5% (corresponding to a p-value of less than 0.05, which is conventionally described as statistically significant). However, the authors found that “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”

A 2013 survey of cancer researchers found that approximately half of respondents had experienced at least one episode of being unable to reproduce published data. Of those who followed this up with the original authors, most were unable to determine why the work was not reproducible. Some of those original authors were (politely) described as ‘less than “collegial”’.

So what factors are at play here? Partly it is due to the personal investment in a particular field. A 2012 study of authors of significant medical studies concluded that: “Researchers are influenced by their own investment in the field, when interpreting a meta-analysis that includes their own study. Authors who published significant results are more likely to believe that a strong association exists compared with methodologists.”

This was also a factor in the study ‘Why Most Published Research Findings Are False’, which considered the way research studies are constructed. This work found that “for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”

Psychology is a discipline with a strong emphasis on novelty, discovery and finding something with a p-value of less than 0.05. Reproducibility is such an issue in psychology that there are now large-scale efforts to reproduce published studies and estimate the reproducibility of the research. The Association for Psychological Science has launched a new article type, Registered Replication Reports, which consists of “multi-lab, high-quality replications of important experiments in psychological science along with comments by the authors of the original studies”.

This is a good initiative, although there might be some resistance to this type of scrutiny. One interesting aspect of the Nature survey on reproducibility was the question of what happened when researchers attempted to publish a replication study. Note that only a few respondents had done this, possibly because incentives to publish positive replications are low and journals can be reluctant to publish negative findings. The survey found that “several respondents who had published a failed replication said that editors and reviewers demanded that they play down comparisons with the original study”.

What is causing this distortion of the research? It is the emphasis on publication of novel results in high impact journals. There is no reward for publishing null results or negative findings.

The HARKing problem

The p-value came up again in a discussion about HARKing at this year’s FORCE2016 conference (HARK stands for Hypothesising After the Results are Known – a term coined in 1998).

In his presentation at FORCE2016, Eric Turner, an Associate Professor at OHSU, spoke about HARKing (see this video from 37 minutes onward). The process runs as follows: the researcher conceives the study and writes the protocol up for their eyes only, with a hypothesis, then collects lots of other data besides – ‘the more the merrier’ according to Turner. The researcher then runs the study and analyses the data. If there is enough data, the researcher can try alternative methods and play with the statistics. ‘You can torture the data and it will confess to anything’, noted Turner. At some point the p-value will come out below 0.05. Only then does the research get written up.
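To see why ‘torturing the data’ so reliably yields a p-value below 0.05, consider the following minimal simulation. It is a hypothetical sketch rather than anything from Turner’s talk: every simulated study measures many outcomes with no real effect in any of them, yet selectively reporting whichever outcome ‘worked’ produces a significant-looking result most of the time.

```python
# Hypothetical sketch: many outcome measures, no true effect anywhere.
# Selectively reporting whichever outcome reaches p < 0.05 makes a
# chance finding look like a confirmed hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies = 2000     # simulated studies
n_outcomes = 20      # outcome measures collected per study ("the more the merrier")
n_per_group = 30     # participants per arm

studies_with_a_hit = 0
for _ in range(n_studies):
    # Treatment and control are drawn from the same distribution: no real effect.
    treatment = rng.normal(0.0, 1.0, size=(n_outcomes, n_per_group))
    control = rng.normal(0.0, 1.0, size=(n_outcomes, n_per_group))
    p_values = stats.ttest_ind(treatment, control, axis=1).pvalue
    if (p_values < 0.05).any():      # at least one outcome looks "significant"
        studies_with_a_hit += 1

# With 20 independent outcomes, roughly 1 - 0.95**20 (about 64%) of these
# null studies produce at least one publishable-looking p-value by chance.
print(f"Null studies with at least one p < 0.05: {studies_with_a_hit / n_studies:.0%}")
```

Pre-specifying the primary outcome before seeing the data, which is what registering a protocol for other people’s eyes achieves, is precisely what removes this freedom to pick the hypothesis after the fact.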

Turner noted that he was talking about the kind of research where the work is trying to confirm a hypothesis (like clinical trials). This is different to hypothesis-generating research.

In the US, clinical trials with human participants must be registered with the Food and Drug Administration (FDA), so it is possible to see the results of all trials. Turner talked about his 2008 study looking at antidepressant trials, where the journal versions of the results supported the general view that antidepressants always beat placebo. However, when they looked at the FDA versions of all of the studies of the same drugs, roughly half of the studies were positive and half were not. The published record does not reflect the reality.

The majority of the negative studies were simply not published, but 11 of the papers had been ‘spun’ from negative to positive. These papers had a median impact factor of 5 and median citations of 68 – these were highly influential articles. As Turner noted ‘HARKing is deceptively easy’.

This perspective is supported by the finding that a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. Indeed, Munafo noted that over 90% of the psychological literature finds what it set out to find. Either the research being undertaken is extraordinarily mundane, or something is wrong.

Increase in retractions

So what happens when it is discovered that something that has been published is incorrect? Journals do have a system which allows for the retraction of papers, and this is a practice which has been increasing over the past few years. Research looking at why the number of retractions has increased found that it was partly due to lower barriers to the publication of flawed articles. In addition, papers are now being retracted for issues like plagiarism, and retractions are happening more quickly.

Retraction Watch is a service which tracks retractions ‘as a window into the scientific process’. It makes for enlightening reading, with several stories published every day.

An analysis of correction rates in the chemical literature found that the correction rate averaged about 1.4 percent for the journals examined. While there were numerous types of correction, errors in chemical structures, omission of relevant references and data errors were among the most frequent. Corrections are not the same as retractions, but they are significant.

There is some evidence to show that the higher the impact factor of the journal a work is published in, the higher the chance it will be retracted. A 2011 study showed a direct correlation between impact factor and the number of retractions, with the New England Journal of Medicine topping the list. This situation has led to claims that the top-ranking journals publish the least reliable science.

A study conducted earlier this year demonstrated that there are no commonly agreed definitions of academic integrity and malpractice. (I should note that, amongst other things, the study found 17.9% (± 6.1%) of respondents reported having fabricated research data. This is almost 1 in 5 researchers. However, there have been some strong criticisms of the methodology.)

There are questions about how retractions should be managed. In the print era it was not unheard of for library staff to put stickers into printed journals notifying readers of a retraction. But in the ‘electronic age’, when the record can simply be erased, one author asked in 2002 whether this is the right thing to do, given that erasing the article entirely amounts to amending history. The Committee on Publication Ethics (COPE) does have guidelines for managing retractions, which suggest that the retraction be linked to the retracted article wherever possible.

However, from a reader’s perspective, even if an article is retracted this might not be obvious. In 2003* a survey of 43 online journals found that 17 had no links between the original articles and later corrections. Where links were present, hyperlinks between articles and errata showed patterns in presentation style but lacked consistency. There were some good examples, such as Science Citation Index, but there was a lack of indexing in INSPEC and a lack of retrieval with SciFinder Scholar.

[*Note this originally said 2013, amended 2 September 2016]

Conclusion

All of this paints a pretty bleak picture. In some disciplines the pressure to publish novel results in high impact journals means the academic record is ‘selectively curated’ at best. At worst it results in deliberate manipulation of results. And if mistakes are picked up, there is no guarantee that this will be made obvious to the reader.

This all stems from the need to publish novel results in high impact journals for career progression. And when those high impact journals can be shown to be publishing a significant amount of subsequently debunked work, their value as a publication goal comes into serious question.

The next instalment in this series will look at gatekeeping in research – peer review.

Published 14 July 2016
Written by Dr Danny Kingsley
Creative Commons License

The case for Open Research: the authorship problem

This is the second in a blog series about why we need to move towards Open Research. The first post about the mis-measurement problem considered issues with assessment. We now turn our attention to problems with authorship. Note that as before this is a topic of research in itself – and there is a rich vein of literature to be mined here for the interested observer.

Hyperauthorship

In May last year a high energy physics paper was published with over 5,000 authors. Of the 33 pages in this article, only nine were occupied by the research itself, with the remainder listing the authors. This paper caused something of a storm of protest about ‘hyperauthorship’ (a term coined in 2001 by Blaise Cronin).

Nature published a news story on it, which was followed a week later by similar stories decrying the problem. The Independent published a story with the angle that many people are just coasting along without contributing. The Conversation’s take on the story looked at the challenge of effectively rewarding researchers. The Times Higher Education was a bit slower off the mark, publishing a story in August questioning whether mass authorship was destroying the credibility of papers.

This paper was featured in a keynote talk at this year’s FORCE2016 conference. Associate Professor Cassidy Sugimoto from the School of Informatics and Computing, Indiana University Bloomington, spoke about ‘Structural Disruptions in the Reward System of Science’ (video here). She noted that authorship is the coin of the realm, the pivot point of the whole scientific system, and that this has resulted in growth in the number of authors listed on a paper.

Sugimoto asked: What does ‘authorship’ mean when there are more authors than words in a document? This type of mass authorship raises concerns about fraud and attribution. Who is responsible if something goes wrong?

The authorship ‘proxy for credit’ problem

Of course not all of those 5,000 people actually contributed to the writing of the article – the activity we would normally associate with the word ‘authorship’. Scientific authorship does not follow the logic of literary authorship because of the nature of what is being written about.

In 1998 Biagioli (who has literally written the book on Scientific Authorship, or at least edited it) said in a paper called ‘The Instability of Authorship: Credit and Responsibility in Contemporary Biomedicine’ that “the kind of credit held by a scientific author cannot be exchanged for money because nature (or claims about it) cannot be a form of private property, but belongs in the public domain”.

Facts cannot be copyrighted. The inability to write for direct financial remuneration in academia has implications for responsibility (addressed further down), but first let’s look at the issue of academic credit.

When we say ‘author’, what do we mean in this context? Often people are named as ‘authors’ on a paper because their inclusion will help to have the paper accepted, or as a token of thanks for providing the grant funding for the work. Such practices are referred to as ‘gift authorship’, where co-authorship is awarded to a person who has not contributed significantly to the study.

In an attempt to stop some of the more questionable practices above, the International Committee of Medical Journal Editors (ICMJE) has defined what it means to be an author, stating that authorship should be based on:

  • a substantial contribution
  • drafting the work
  • giving final approval and
  • agreeing to be accountable for the integrity of the work.

The problem, as we keep seeing, is that authorship on a publication is the only thing that counts for reward. This means that ‘authorship’ is used as a proxy for crediting people’s contribution to the study.

Identifying contributions

Listing all of the people who had something to do with a research project as ‘authors’ on the final publication fails to credit different aspects of the labour involved in the research. In an attempt to address this, PLOS asks for the different contributions by those named on a paper to be defined on articles, with their guidelines suggesting categories such as Data Curation, Methodology, Software, Formal Analysis and Supervision (amongst many).

Sugimoto has conducted some research to find what this reveals about what people are contributing to scientific labour. In an analysis of PLOS data on contributorship, her team showed that in most disciplines the labour was distributed. This means that often the person doing the experiment is not the person who is writing up the work. (I should note that I was rather taken aback by this when it arose in interviews I conducted for my PhD).

It is not particularly surprising that in the Arts, Humanities and Social Sciences the listed ‘author’ is most often the person who wrote the paper. However, in Clinical Medicine, Biomedicine or Biology very few authors are associated with the task of writing. (As an aside, the analysis found women are disproportionately likely to be doing the experimentation, and men are more likely to be authoring, conceiving experimentation or obtaining resources.)

So would it not be better if, rather than placing sole emphasis on authorship of journal articles in high impact journals, we were able to reward people for their different contributions to the research?

And while everyone takes credit, not all people take responsibility.

Authorship – taking responsibility

It is not just the issue of the inability to copyright ‘facts of nature’ that makes copyright unusual in academia. The academic reward system works on the ‘academic gift principle’ – academics provide the writing, the editing and the peer review for free and do not expect payment. The ‘reward’ is academic esteem.

This arrangement can seem very odd to an outsider who is used to the idea of work for hire. But there are broader implications than what is perceived to be ‘fair’ – and these relate to accountability. It is much more difficult to sue a researcher for making incorrect statements than it is to sue a person who writes for money (like a journalist).

Let us take a short meander into the world of academic fraud. Possibly the biggest, and certainly a highly contentious, case was that of Andrew Wakefield and the discredited (and retracted) claim that the MMR vaccine was associated with autism in children. This has been discussed at great length elsewhere; the latest study debunking the claim was published last year. Partly because of the way science is credited and copyright is handled, there were minimal repercussions for Wakefield. He is barred from practising medicine in the UK, but enjoys a career on the talkback circuit in the US. Recently a film about the MMR claims, directed by Wakefield, was briefly shown at the Tribeca film festival before protests saw it removed from the programme.

Another high profile case is Diederik Stapel, a Dutch social psychologist who entirely fabricated his data over many years. Despite several doctoral students’ work being based on this data and over 70 articles having to be retracted, no charges were laid. The only consequence he faced was being stripped of his professorship.

Sometimes the consequences of fraud are tragic. A Japanese stem cell researcher, Haruko Obokata, who fabricated her results, had her PhD stripped from her. No criminal charges were laid, but her supervisor committed suicide and the funding for the centre she was working in was cut. The work had been published in Nature, which then retracted it and published editorial comment about the situation.

The question of scientific accountability is so urgent that there was a call last year, in this paper, to criminalise scientific misconduct. Indeed, things do seem to be changing slowly and there have been some high profile cases where scientific fraud has resulted in criminal charges being laid. A former University of Queensland academic is currently facing fraud-related charges over his fabricated results from a study into Parkinson’s disease and multiple sclerosis. This time last year, Dong-Pyou Han, a former biomedical scientist at Iowa State University in Ames, was sentenced to 57 months in prison for fabricating and falsifying data in HIV vaccine trials. Han was also fined US$7.2 million. In both cases the issue is the misuse of grant funding rather than the publication of false results.

The combination of great ‘reward’ from publication in high profile journals and little repercussion (other than having that ‘esteem’ taken away) has proven to be too great a temptation for some.

Conclusion

The need to publish in high impact journals has caused serious authorship issues, resulting in huge numbers of authors on some papers because authorship is the only way to allocate credit. And there is very little in the way we reward researchers that allows us to call them to take responsibility when something goes wrong, which in some cases has resulted in serious fraud.

The next instalment in this series will look at ‘reproducibility, retractions and retrospective hypotheses’.

Published 12 July 2016
Written by Dr Danny Kingsley
Creative Commons License

The case for Open Research: the mis-measurement problem

Let’s face it. The biggest blockage we have to widespread Open Access is not researcher apathy, a lack of interoperable systems, or an unwillingness of publishers to engage (although these do each play some part) – it is the problem that the only thing that counts in academia is publication in a high impact journal.

This situation is causing multiple problems, from huge numbers of authors on papers, researchers cherry picking results and retrospectively applying hypotheses, to the reproducibility crisis and a surge in retractions.

This blog was intended to be an exploration of some solutions prefaced by a short overview of the issues. Rather depressingly, there was so much material the blog has had to be split up, with several parts describing the problem(s) before getting to the solutions.

Prepare yourself, this will be a bumpy ride. This first instalment looks at the reward system. The second instalment will consider authorship and credit. The third will look at reproducibility, retractions and retrospective hypotheses. The fourth asks if peer review is working. And the final blog will discuss some options for solving at least part of the problem.

I should note that this is not a comprehensive literature review. Every subheading of this blog series is a topic of considerable research on its own and there are many further examples available to the interested reader. I welcome debate, suggestions and links in the comments section of the blog(s).

Measurement for reward

The Journal Impact Factor

Let’s start with how researchers are measured. For decades academia has lived with the ‘Publish or Perish’ mantra which has spawned problems with poor publication practices. Today the pressure to be published in a high impact journal is stronger than ever.

A journal’s Impact Factor (JIF) for a given year is, in essence, the average number of citations received in that year by the items the journal published in the previous two years. It is calculated by taking the number of citations made in that year to anything published in the journal in the previous two years, and dividing by the number of ‘citable items’ (articles and reviews) published in those two years.
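As a worked sketch of the standard two-year calculation (the numbers here are hypothetical, purely for illustration):

```latex
\mathrm{JIF}_{2015}
  = \frac{\text{citations received in 2015 by items published in 2013 and 2014}}
         {\text{citable items published in 2013 and 2014}}
  = \frac{2400}{800} = 3.0
```

A JIF of 3.0 therefore says only that the journal’s recent output attracted, on average, three citations per citable item in that year; it says nothing about how those citations are spread across individual articles, which is where the criticisms discussed below begin.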

The JIF is compiled in the Journal Citation Reports, which is owned by the commercial company Thomson Reuters. The company announced the sale of this business for $3.5 billion today.

This blog will not dig in any depth into the issues with the way the JIF is calculated, although there are some serious ones (see a 2006 paper I coauthored on this topic). Neither will it explore the problem of how much the JIF is gamed – from self-citation to journals insisting that authors add a certain number of citations to publications within the same journal. Suffice to say that each year a number of journals are removed from the index due to this type of behaviour. The record to date was in 2013, a year which saw 66 journals struck from the list. By comparison, only 18 were suppressed in the most recent report.

There have been many, many criticisms of the Journal Impact Factor and its effects on scholarship. But the criticisms put forward a decade ago to the month by PLOS still ring true. One of the issues, PLOS argued, was that because Thomson Reuters does not make public the process for choosing ‘citable’ article types, “science is currently rated by a process that is itself unscientific, subjective, and secretive”.

Indeed, last week a news article in Science and a related news article in Nature put forward exactly the same criticism. The stories referred to a paper, “A simple proposal for the publication of journal citation distributions”, posted on bioRxiv. This described comparative research undertaken to see whether a reanalysis of the data would produce the same results as Thomson Reuters. It didn’t. The work found the citation distributions were “so skewed that up to 75% of the articles in any given journal had lower citation counts than the journal’s average number”. The authors likened using the JIF to determine the impact of a given article to ‘guesswork’.
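A minimal simulation (with hypothetical numbers, not the data from the bioRxiv paper) shows why a skewed distribution behaves this way: a handful of very highly cited papers drag the mean well above what a typical article in the journal actually receives.

```python
# Hypothetical sketch: per-article citation counts for one journal drawn from
# a long-tailed (log-normal) distribution - many lightly cited papers and a
# few blockbusters. The mean (what the JIF effectively reports) sits well
# above the citation count of most individual articles.
import numpy as np

rng = np.random.default_rng(0)
citations = rng.lognormal(mean=1.0, sigma=1.2, size=10_000).astype(int)

mean_citations = citations.mean()                       # journal-level average
share_below_mean = (citations < mean_citations).mean()  # articles under the average

print(f"Mean citations per article: {mean_citations:.1f}")
print(f"Articles cited less than the mean: {share_below_mean:.0%}")  # roughly 70-80%
```

This is the sense in which judging an individual article by its journal’s average is ‘guesswork’: the average describes almost none of the articles.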

Jon Tennant, in a 2015 blog post, stated that “The impact factor is one of the most mis-used metrics in the history of academia” and proposed an Open Letter template for researchers to “send to people in positions of power at different institutions, co-signed by as many academics as possible who believe in fairer and evidence-based assessment”. Tennant in turn references Stephen Curry’s 2012 blog post which opened with the statement “The impact factor might have started out as a good idea, but its time has come and gone”.

There are many more, but I am sure you get the idea.

This is recognised as such a big problem that in 2012 the San Francisco Declaration on Research Assessment (DORA) was conceived with the intent to ‘put science into the assessment of research’. Over 12,000 individuals and over 700 organisations have signed the declaration to date, supporting the call for a “need to assess research on its own merits rather than on the basis of the journal in which the research is published”.

If nothing else, there is clearly a problem with measuring the worth of something by considering the packaging and not the item itself. But the academy continues to use the JIF and criticisms continue to come thick and fast.

Clearly something is rotten in the state of Denmark.

Ditching the JIF

In Stephen Jay Gould’s seminal book The Mismeasure of Man, where he debunks the science behind biological determinism, he criticises “the myth that science itself is an objective enterprise, done properly only when scientists can shuck the constraints of their culture and view the world as it really is”. This observation is true of any metric we apply to the valuing of research outputs. Metrics are neither objective nor an accurate view, and any measurement tool causes its own problems.

An example of a non-JIF type of measurement is the increased emphasis on ‘excellence’ by funders and governments (the Research Excellence Framework in the UK and Excellence in Research for Australia being two examples). But ‘excellence rhetoric’ is counterproductive to good research, according to one argument which concludes that ‘excellence’ is a “pernicious and dangerous rhetoric that undermines the very foundations of good research and scholarship”.

The insistence on excellence, it can be argued, has spawned problems with reproducibility and fraud. In other words, the same problems that the JIF has caused.

There have been many other suggestions for ways to measure researchers, such as the h-index (which has its own set of issues) and the Eigenfactor Score – and these are only two of a myriad of options. But as the system changes, so does researcher behaviour. A clear example occurred in Australia when the funding mechanism moved to a simple count of research papers rather than any assessment of the value of those papers. This resulted in a marked increase in the number of papers being produced and a concurrent decrease in their overall quality, as described in ‘Modifying publication practices in response to funding formulas‘.

Clifford Lynch, the Executive Director of CNI, noted in his welcome talk at the JISC-CNI event held at Wadham College, Oxford last week that using alternative metrics means we start running into issues of vendor lock-in and data confidentiality.

While alternative metrics might solve the ‘valuing the article rather than the journal’ issue, they bring problems of their own. HEFCE’s 2015 report on the future use of metrics in assessment noted that some indicators can be misused or ‘gamed’, with journal impact factors, university rankings and citation counts put forward as three prominent examples. The report recommended that metrics should be updated in response to their potential effects. In deciding which metrics to use, it recommended using the best possible data in terms of accuracy and scope, and making the data collection and analytical processes open and transparent to allow verification. It also suggested using a range of indicators.

Financial implications

What does this emphasis on particular publication outlets have to do with Open Access? A great deal, as it happens. It is the big blocker to widespread change. As long as we continue with this emphasis we will not get any real traction with Open Access, because it locks us into an old print paradigm of academia.

Much ink has been spilt over the cost of publication and the added cost of open access (some of it mine) which includes not just the cost of the article processing charges but the burden of administering multiple micropayments.

As I have said on numerous occasions (see here and here), funders paying for hybrid open access is expensive and has not resulted in journals flipping to gold (as a transition to a fully Open Access environment), despite this being a stated aim of the process. It makes sense from a publisher’s perspective not to flip journals – why, when researchers are under pressure to publish in high impact journals and there is a new revenue stream associated with that publishing, would you kill the proverbial goose?

Indeed, a paper earlier this year argued that “Open Access has the potential to become unsustainable for research communities if high-cost options are allowed to continue to prevail in a widely unregulated scholarly publishing market.”

The problem, it can be argued, is that the infrastructure underpinning open access is ‘path dependent’, a concept proposed in 1985 which explains how the set of decisions available in the present is limited by the decisions made in the past, even though the contextual factors shaping those past decisions no longer apply. Scholarly publishing is path-dependent, some authors argue, “because it still heavily depends on a few players that occupy crucial nodes in the scientific information infrastructure. In the past, these players were scientific associations, but now these players are commercial publishing companies”.

As long as the current reward system remains, the crucial nodes will not change and we are stuck.

Conclusion

So that covers some of the problems with the way we measure our researchers, and some of the financial implications of this. The next blog in this series will cover some of the issues with authorship.

Published 11 July 2016
Written by Dr Danny Kingsley
Creative Commons License