
Beyond compliance – dialogue on barriers to data sharing

Welcome to International Data Week. The Office of Scholarly Communication is celebrating with a series of blog posts about data, starting with a summary of an event we held in July.

On 29 July 2016 the Cambridge Research Data Team joined forces with the Science and Engineering South Consortium to organise a one-day conference at Murray Edwards College, gathering researchers and practitioners to discuss the existing barriers to data sharing. The aim of the event was to move beyond compliance with funders’ policies. We hoped that the community was ready to shift the focus of data sharing discussions from whether data is worth sharing at all towards more mature discussions about the benefits and limitations of data sharing.

What are the barriers?

So what are the barriers to effective sharing of research data? Three main barriers were identified, all somewhat related to each other: poorly described data, insufficient data discoverability and difficulties with sharing personal/sensitive data. All of these problems arise from the fact that research data is not always shared in accordance with the FAIR principles: that data should be Findable, Accessible, Interoperable and Re-usable.

Poorly described data

The event started with an inspiring keynote talk from Dr Nicole Janz from the Department of Sociology at the University of Cambridge: “Transparency in Social Science Research & Teaching”. Nicole regularly runs replication workshops at Cambridge, where students select published research papers and work for several weeks to reproduce the published findings. The purpose of these workshops is to let students learn by experience what is important in making their own work transparent and reproducible to others.

Very often students fail to reproduce the results. Frequently the failures are due to an insufficiently described methodology, or simply the fact that key datasets were not made available. Students learn that in order to make research reproducible, one not only needs to make the raw data files available, but that the data needs to be shared together with the source code used to transform it and a written description of the methodology, ideally in a README file. While doing replication studies, students also learn about the five selfish benefits of good data management and sharing: data disasters are avoided, it is easier to write up papers from well-managed data, a transparent approach to sharing makes the work more convincing to reviewers, the continuity of research is possible, and researchers can build a reputation for being transparent. As a tip for researchers, Nicole suggested always asking a colleague to try to reproduce the findings before submitting a paper for peer review.

The problem of insufficient data description/availability was also discussed during the first case study talk by Dr Kai Ruggeri from the Department of Psychology, University of Cambridge. Kai reflected on his work on the assessment of happiness and wellbeing across many European countries, which was part of the ESRC Secondary Data Analysis Initiative. Kai reiterated that missing data makes analysis complicated and sometimes prevents effective policy recommendations from being made. He also stressed that the choice of baseline for data analysis can frequently affect the final results. Therefore, proper description of the methodology and approaches taken is key to making research reproducible.

Insufficient data discoverability

We also heard several speakers describe problems with data discoverability. Fiona Nielsen founded Repositive – a platform for finding human genomic data – out of frustration that genomic data was so difficult to find and access. The proliferation of data repositories has made it very hard for researchers to actually find what they need.

Fiona started by doing a quick poll among the audience: how do researchers look for data? It turned out that most researchers find data by doing a literature search or by googling for it. This is not surprising – there is no search engine that allows searching for information simultaneously across the multiple repositories where data is available. To make matters more complicated, Fiona reported that 80PB of human genomic data was generated in 2015, yet only 0.5PB of human genomic data was made available in a data repository.

So how can researchers find the other datasets, which are not made available in public repositories? Repositive is a platform that harvests metadata from several repositories hosting human genomic data and provides a search engine allowing researchers to look for datasets shared in all of them simultaneously. Additionally, researchers who cannot share their research data via a public repository (for example, due to lack of participants’ consent for sharing) can at least create a metadata record about the data – to let others know that the data exists and to provide information on the data access procedure.

The problem of data discoverability is, however, not only about people’s awareness that datasets exist. Sometimes, especially in the case of complex biological data with a vast number of variables, it can be difficult to find the right information inside the dataset. In an excellent lightning talk, Julie Sullivan from the University of Cambridge described InterMine – a platform for making biological data easily searchable (‘mineable’). Anyone can simply upload their data onto the platform to make it searchable and discoverable. One example of the platform’s use is FlyMine – a database where researchers looking for results of experiments conducted on the fruit fly can easily find and share information.

Difficulties with sharing personal/sensitive data

The last barrier to sharing that we discussed related to sharing personal/sensitive research data. This barrier is perhaps the most difficult one to overcome, but here again the conference participants came up with some excellent solutions. The first came from the keynote speech by Louise Corti, with a very uplifting title: “Personal not painful: Practical and Motivating Experiences in Data Sharing”.

Louise based her talk on the UK Data Service’s long experience of providing managed access to data containing some forms of confidential/restricted information. Apart from hosting datasets which can be made openly available, the UKDS can also provide two other types of access: safeguarded access, where data requestors need to register before downloading the data, and controlled access, where requests for data are considered on a case-by-case basis.

At the outset of a research project, researchers discuss their research proposals with the UKDS, including any potential limitations to data sharing. It is at this stage that the decision is made on the type of access that will be required for the data to be shared successfully. All processes of project management and data handling, such as data anonymisation and collection of informed consent forms from study participants, are then carried out in adherence to that decision. The UKDS also offers protocols clarifying what will happen to research data once it is deposited with the repository. The use of standard licences for sharing makes the governance of data access much more transparent and easier to understand, both from the perspective of data depositors and of data re-users.

Louise stressed that transparency and willingness to discuss problems are key to mutual respect and understanding between data producers, data re-users and data curators. Sometimes unnecessary misunderstandings make data sharing difficult when it does not need to be. Louise mentioned that researchers often confuse ‘sensitive topic’ with ‘sensitive data’ and referred to a success story where, by working directly with researchers, the UKDS managed to share a dataset about sedation at the end of life. The subject of the study was sensitive, but because the data was collected and managed with a view to sharing at the end of the project, the dataset itself was not sensitive and was suitable for sharing.

As Louise said, “data sharing relies on trust that data curators will treat it ethically and with respect”, and open communication is key to building and maintaining this trust.

So did it work?

The purpose of this event was to engage the community in discussions about the existing limitations to data sharing. Did we succeed? Did we manage to engage the community? Judging by the fact that we received twenty high-quality abstracts from researchers across various disciplines for only five available case study speaking slots (it was so difficult to shortlist the top five!), and that the venue was full – with around eighty attendees from Cambridge and other institutions – I think the objective was well met.

Additionally, the panel discussion was led by researchers and involved fifty-eight active users on the Sli.do platform asking questions of the panellists, with further questions asked outside of Sli.do. So overall I feel that the event was a great success, and it was truly fantastic to be part of it and to see the degree of participant involvement in data sharing.

Another observation is the great progress of the research community in Cambridge in the area of sharing: we have successfully moved away from discussing whether research data is worth sharing at all towards how to make data sharing more FAIR.

It seems that our intense advocacy, and the effort of speaking with over 1,800 academics from across the campus since January 2015, have paid off: we have indeed managed to build an engaged research data management community.


Published 12 September 2016
Written by Dr Marta Teperek
Creative Commons License

Could Open Research benefit Cambridge University researchers?

This blog is part of the recent series about Open Research and reports on a discussion with Cambridge researchers held on 8 June 2016 in the Department of Engineering. Extended notes from the meeting and slides are available at the Cambridge University Research Repository. This report is written by Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek (listed alphabetically by surname).

At the Office of Scholarly Communication we have been thinking for a while about Open Research ideas and about moving beyond mere compliance with funders’ policies on Open Access and research data sharing. We thought that the time had come to ask our researchers what they thought about opening up the research process and sharing more: not only publications and research data, but also protocols, methods, source code, theses and all the other elements of research. Would they consider this beneficial?

Working together with researchers – a democratic approach to problem-solving

To get an initial idea of the expectations of the research community in Cambridge, we organised an open discussion hosted at the Department of Engineering. Anyone registering was asked three questions:

  • What frustrates you about the research process as it is?
  • Could you propose a solution that could solve that problem?
  • Would you be willing to speak about your ideas publicly?

Interestingly, around fifty people registered to take part in the discussion and almost all of them contributed very thought-provoking problems and appealing solutions. To our surprise, half of them expressed their willingness to speak publicly about their ideas. This shaped our discussion on the day.

So what do researchers think about Open Research? Not surprisingly, we started with an animated discussion about unfair reward systems in academia.

Flawed metrics

A well-worn complaint: the only thing that counts in academia is publication in a high impact journal. As a result, early career researchers have no motivation to share their data and to publish their work in open access journals, which can sometimes have lower impact factors. Additionally, metrics based on the whole journal do not reflect the importance of the research described: what is needed is article-level impact measurement. But it is difficult to solve this systemic problem, because any new journal which wishes to introduce a new metrics system has no journal-level impact factor to start with, and therefore researchers do not want to publish in it.

Reproducibility crisis: where quantity, not quality, matters

Researchers also complained that the volume of research produced is ever higher and that science seems to have entered an ‘era of quantity’. They raised the concern that quantity matters more than the quality of research. Only fast and loud research gets rewarded (because it is published in high impact factor journals), while slow and careful work seems to be valued less. Additionally, researchers are under pressure to publish, and they often report what they want to see rather than what the data really shows. This approach has led to the reproducibility crisis and a lack of trust among researchers.

Funders should promote and reward reproducible research

The participants had some good ideas for how to solve these problems. One of the most compelling suggestions was that funding should go not only to novel research (as seems to be the case at the moment), but also to people who want to reproduce existing research. Additionally, reproducible research itself should be rewarded: funders could offer grant renewal schemes for researchers whose research is reproducible.

Institutions should hire academics committed to open research

Another suggestion was to incentivise reward systems other than journal impact factor metrics. Someone proposed that institutions should not only teach the next generation of researchers how to do reproducible research, but also embed reproducibility of research as an employment criterion. Commitment to Open Research could be an essential requirement in job descriptions, and applicants could be asked at the recruitment stage how they achieve the goals of Open Research. LMU Munich recently included such a statement in a job description for a professor of social psychology (see the original job description here and a commentary here).

Academia feeding money to exploitative publishers

Researchers were also frustrated by exploitative publishers. The big four publishers (Elsevier, Wiley, Springer and Informa) have a typical annual profit margin of 37%. Articles are donated to the publishers for free by academics and reviewed by other academics, also free of charge. Additionally, noted one of the participants, academics act as journal editors, which they also do for free.

[*A comment about this statement was made on 15 August 2017 noting that some editors do get paid. While the participant’s comment stands as a record of what was said, we acknowledge that this is not an entirely accurate statement.]

In addition to this, publishers take copyright away from the authors. As a possible solution to the latter, someone suggested that universities should adopt institutional licences on scholarly publishing (similar to the Harvard licence) which could protect the rights of their authors.

Pre-print services – the future of publishing?

Could Open Research help with the publishing crisis? Novel and more open ways of publishing can certainly add value to the process. The researchers discussed the benefits of sharing pre-print papers on platforms like arXiv and bioRxiv. These services allow people to share manuscripts before publication (or acceptance by a journal). In physics, maths and the computational sciences it is common to upload a manuscript even before submitting it to a journal, in order to get feedback from the community and the chance to improve it.

bioRxiv, the life sciences equivalent of arXiv, started relatively recently. One of our researchers mentioned that he was initially worried that uploading manuscripts to bioRxiv might jeopardise his career as a young researcher. However, he then saw a pre-print manuscript describing research similar to his published on bioRxiv, and was shocked at how much the community helped to change and improve that manuscript. He has since shared many of his own manuscripts on bioRxiv and, as his colleague pointed out, this has ‘never hurt him’. On the contrary, he suggested that using pre-print services promotes one’s research: it allows the author to get the work into the community very early and to get feedback. Peers will always value good quality research, and the recognition it earns among colleagues will eventually come back to the author.

Additionally, someone from the audience suggested that publishing work on pre-print services provides a time-stamp for researchers and helps to ensure that ideas will not be scooped – researchers are free to share their research whenever they wish and as fast as they wish.

Publishers should invest money in improving science – wishful thinking?

It was also proposed that instead of exploiting academics, publishers could play an important role in improving the research process. One participant proposed a number of simple mechanisms that publishers could implement to improve the quality of the research data shared:

  • Employing in-house data experts: bioinformaticians or data scientists who could judge whether supporting data is of good enough quality
  • Ensuring that there is at least one bioinformatician/data scientist on the reviewing panel for a paper
  • Asking for the data to be deposited in a public, discipline-specific repository, which would ensure quality control of the data and adherence to data standards
  • Asking for the source code and detailed methods to be made available as well.

Quick win: minimum requirements for making shared data useful

A requirement that, as a minimum, three key elements should be made available with publications – the raw data, the source code and the methods – seems to be a quick-win solution for making research data more re-usable. Raw data allows users to check whether the data is of good quality overall; the source code is needed to re-run the analysis; and the methods need to be detailed enough for other researchers to understand every step involved in processing the data. An excellent case study comes from Daniel MacArthur, who has described how to reproduce all the figures in his paper and has shared the supporting code as well.
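To make this concrete, here is a minimal, hypothetical sketch of what an analysis script tying the three elements together might look like: it reads the deposited raw data, applies the documented processing step, and regenerates a figure from the paper. The file names and the analysis itself are invented for illustration rather than taken from any of the studies mentioned above; the point is simply that anyone with the raw data, this script and the written methods can rerun the pipeline end to end.

```python
# reproduce_figure1.py -- hypothetical sketch of a reproducible analysis script.
# Assumes a deposited raw data file 'raw_measurements.csv' with columns
# 'group' and 'value'; the methods/README would document its provenance.

import pandas as pd
import matplotlib.pyplot as plt

RAW_DATA = "raw_measurements.csv"   # shared alongside the paper
FIGURE_OUT = "figure1.png"          # the figure readers should be able to regenerate

def main():
    # Step 1: load the raw data exactly as deposited, with no manual editing.
    data = pd.read_csv(RAW_DATA)

    # Step 2: apply the documented processing (here, a simple per-group mean).
    summary = data.groupby("group")["value"].mean()

    # Step 3: regenerate the published figure from the processed data.
    summary.plot(kind="bar")
    plt.ylabel("Mean value")
    plt.tight_layout()
    plt.savefig(FIGURE_OUT)
    print(f"Wrote {FIGURE_OUT} from {len(data)} raw records.")

if __name__ == "__main__":
    main()
```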

It was also suggested that the Office of Scholarly Communication could implement some simple quality control measures to ensure that research data supporting publications is shared. As a minimum the Office could check the following:

  • Is there a data statement in the publication?
  • If there is a statement – is there a link to the data?
  • Does the link work?

This is definitely a very useful suggestion from our research community and in fact we have already taken this feedback on board and started checking for data citations in Cambridge publications.
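A check along these lines lends itself to partial automation. The sketch below is a hypothetical illustration of the three questions listed above: it looks for a data availability statement in a publication’s text, extracts any links or DOIs, and tests whether the first one resolves. The keyword list and the plain HTTP check are assumptions for the sake of the example, not a description of the Office’s actual workflow.

```python
# check_data_statement.py -- hypothetical sketch of the three checks listed above.
import re
import urllib.request

STATEMENT_KEYWORDS = ("data availability", "supporting data", "data access statement")

def check_publication(text: str) -> dict:
    """Run the three checks against one publication's full text."""
    lower = text.lower()
    has_statement = any(keyword in lower for keyword in STATEMENT_KEYWORDS)

    # Pull out anything that looks like a URL or a DOI.
    links = re.findall(r"https?://\S+|10\.\d{4,9}/\S+", text)

    # Try to resolve the first link, if any (bare DOIs are turned into doi.org URLs).
    link_works = False
    if links:
        url = links[0] if links[0].startswith("http") else "https://doi.org/" + links[0]
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                link_works = response.getcode() < 400
        except Exception:
            link_works = False

    return {"has_statement": has_statement, "has_link": bool(links), "link_works": link_works}

if __name__ == "__main__":
    # Invented snippet of article text, purely for demonstration.
    sample = "Data availability: the underlying data are available at https://doi.org/10.1234/example"
    print(check_publication(sample))
```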

Shortage of skills: effective data sharing is not easy

The discussion about the importance of data sharing led to reflections that effective data sharing is not always easy. A bioinformatician complained that datasets she had tried to re-use satisfied neither the criteria of reproducibility nor those of re-usability. Most of the time there was not enough metadata available to successfully use the data. There is some data shared, there is the publication, but the description is insufficient to understand the whole research process: the miracle, or the big discovery, happens somewhere in the middle.

Open Research in practice: training required

Attendees agreed that it takes effort and skill to make research open, re-usable and discoverable by others. More training is needed to equip researchers with the skills to disseminate their research properly on the internet and to manage their research data effectively. It is clear that discipline-specific training and guidance on how to manage research data effectively and how to practise open research are desired by Cambridge researchers.

Nudging researchers towards better data management practice

Many researchers have heard, or experienced first-hand, horror stories of having to follow up on somebody else’s project where it was not possible to make any sense of the research data due to a lack of documentation and processes. This leads to a lot of wasted time in every research group. Research data needs to be properly documented and maintained to ensure research integrity and research continuity. One easy way to nudge researchers towards better research data management practice could be formalised data management requirements. Perhaps, as a minimum, every researcher should keep a lab book to document research procedures.

The time is now: stop hypocrisy

Finally, there was a suggestion that everyone should take the lead in encouraging Open Research. The simplest way to start is to stop being what has been described as a hypocrite and to submit articles to journals which are fully Open Access. This should be accompanied by making one’s reviews openly available whenever possible. All publications should be accompanied by supporting research data, and researchers should ensure that they evaluate individual research papers on their merits rather than letting their judgement be biased by the impact factor of the journal.

Need for greater awareness and interest in publishing

One of the Open Access advocates present at the meeting stated that most researchers are completely unaware of which publishers are exploitative and which are ethical, or of the differences between them. Researchers typically do not pay the exploitative publishers directly and are therefore not interested in looking at the bigger picture of the sustainability of scholarly publishing. This is clearly an area where more training and advocacy can help, and the Office of Scholarly Communication is actively involved in raising awareness of Open Access. However, while it is nice to preach to a room of converts, how do we get other researchers involved in Open Access? How do we reach those who can’t be bothered to come to a discussion like the one we had? This is where anyone who understands the benefits of Open Access has a job to do.

Next steps

We are extremely grateful to everyone who came to the event and shared their frustrations and ideas on how to solve some of these problems. We noted all the ideas on post-it notes – the number of notes at the end of the discussion was impressive, an indication of how creative the participants were in just 90 minutes. It was a very productive meeting and we wish to thank all the participants for their time and effort.


We think that by acting collaboratively and supporting good ideas we can achieve a lot. As an inspiration, McGill University’s Montreal Neurological Institute and Hospital (the Neuro) in Canada have recently adopted a policy on Open Research: over the next five years all results, publications and data will be free to access by everyone.

Follow up

If you would like to host similar discussions directly in your departments/institutes, please get in touch with us at info@osc.cam.ac.uk – we would be delighted to come over and hear from researchers in your discipline.

In the meantime, if you have any additional ideas that you wish to contribute, please send them to us. Anyone interested in being kept informed about progress is encouraged to sign up for the mailing distribution list here.

Extended notes from the meeting and slides are available at the Cambridge University Research Repository. We are particularly grateful to Avazeh Ghanbarian, Corina Logan, Ralitsa Madsen, Jenny Molloy, Ross Mounce and Alasdair Russell (listed alphabetically by surname) for agreeing to publicly speak at the event.

Published 3 August 2016
Written by Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek
Creative Commons License

The case for Open Research: reproducibility, retractions & retrospective hypotheses

This is the third instalment in ‘The case for Open Research’ series of blog posts exploring the problems with scholarly communication caused by having a single value point in research – publication in a high impact journal. The first post explored the mis-measurement of researchers and the second looked at issues with authorship.

This blog will explore the accuracy of the research record, including the ability (or otherwise) to reproduce research that has been published, what happens if research is retracted, and a concerning trend towards altering hypotheses in light of the data that is produced.

Science is thought to progress by building knowledge through questioning, testing and checking work. The idea of ‘standing on the shoulders of giants’ summarises this – we discover truth by building on previous discoveries. But scientists are very rarely rewarded for being right; they are rewarded for publishing in certain journals and for getting grants. This can result in distortion of the science.

How does this manifest? The Nine Circles of Scientific Hell describes questionable research practices that occur, ranging from Overselling, Post-Hoc Storytelling, p-value Fishing and Creative Use of Outliers to Non- or Partial Publication of Data. We will explore some of these below. (Note this article appears in a special issue of Perspectives on Psychological Science on Replicability in Psychological Science, which contains many other interesting articles.)

Much as we like to think of science as an objective activity, it is not. Scientists are supposed to be impartial observers, but in reality they need to get grants and publish papers to get promoted to more ‘glamorous institutions’. This was the observation of Professor Marcus Munafo in his presentation ‘Scientific Ecosystems and Research Reproducibility’ at the Research Libraries UK conference held earlier this year (the link will take you to videos of the presentations). Munafo observed that scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Munafo, a biological psychologist at Bristol University, noted that research, particularly in the biomedical sciences, ‘might not be as robust as we might have hoped‘.

The reproducibility crisis

A recent survey of over 1500 scientists by Nature tried to answer the question “Is there a reproducibility crisis?” The answer is yes, but whether that matters appears to be debatable: “Although 52% of those surveyed agree that there is a significant ‘crisis’ of reproducibility, less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature.”

There are certainly plenty of examples of the inability to reproduce findings. Pharmaceutical research can be particularly fraught: some research into potential drug targets found that in almost two-thirds of the projects examined there were inconsistencies between the published data and the data resulting from attempts to reproduce the findings.

There are implications for medical research as well. A study published last month looked at functional MRI (fMRI), noting that when analysing data under different experimental designs the false-positive rate should in theory be 5% (corresponding to a p-value of less than 0.05, which is conventionally described as statistically significant). However, they found that “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”

A 2013 survey of cancer researchers found that approximately half of respondents had experienced at least one episode of being unable to reproduce published data. Of those who followed this up with the original authors, most were unable to determine why the work was not reproducible. Some of the original authors were (politely) described as ‘less than “collegial”’.

So what factors are at play here? Partly it is due to the personal investment in a particular field. A 2012 study of authors of significant medical studies concluded that: “Researchers are influenced by their own investment in the field, when interpreting a meta-analysis that includes their own study. Authors who published significant results are more likely to believe that a strong association exists compared with methodologists.”

This was also a factor in the study ‘Why Most Published Research Findings Are False’, which considered the way research studies are constructed. This work found that “for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”

Psychology is a discipline with a strong emphasis on novelty, discovery and finding something with a p-value of less than 0.05. The reproducibility problem in psychology is serious enough that large-scale efforts are under way to reproduce published psychological studies and estimate the reproducibility of the field. The Association for Psychological Science has launched a new article type, Registered Replication Reports, which consists of “multi-lab, high-quality replications of important experiments in psychological science along with comments by the authors of the original studies”.

This is a good initiative, although there might be some resistance to this type of scrutiny. Something that was interesting in the Nature survey on reproducibility was the question of what happened when researchers attempted to publish a replication study. Note that only a few respondents had done this, possibly because incentives to publish positive replications are low and journals can be reluctant to publish negative findings. The survey found that “several respondents who had published a failed replication said that editors and reviewers demanded that they play down comparisons with the original study”.

What is causing this distortion of the research? It is the emphasis on publication of novel results in high impact journals. There is no reward for publishing null results or negative findings.

The HARKing problem

The p-value came up again in a discussion about HARKing at this year’s FORCE2016 conference (HARK stands for Hypothesising After the Results are Known – a term coined in 1998).

In his presentation at FORCE2016, Eric Turner, Associate Professor at OHSU, spoke about HARKing (see this video from 37 minutes onward). The process is that the researcher conceives the study and writes the protocol up for their eyes only, with a hypothesis, and then collects lots of other data – ‘the more the merrier’ according to Turner. The researcher then runs the study and analyses the data. If there is enough data, the researcher can try alternative methods and play with the statistics: ‘You can torture the data and it will confess to anything’, noted Turner. At some point the p-value will come out below 0.05. Only then does the research get written up.
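To see why ‘torturing the data’ so reliably produces something publishable, consider a minimal simulation (a sketch of the general statistical point, not taken from Turner’s talk): even when there is no real effect at all, testing enough outcome variables will usually turn up at least one p-value below 0.05.

```python
# harking_simulation.py -- sketch: with 20 unrelated outcomes and no true effect,
# the chance that at least one comparison reaches p < 0.05 is roughly
# 1 - 0.95**20, i.e. about 64%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations, n_outcomes, n_per_group = 1000, 20, 30

hits = 0
for _ in range(n_simulations):
    # Two groups drawn from the same distribution: the null hypothesis is true
    # for every one of the 20 outcome variables.
    group_a = rng.normal(size=(n_per_group, n_outcomes))
    group_b = rng.normal(size=(n_per_group, n_outcomes))
    _, p_values = stats.ttest_ind(group_a, group_b, axis=0)
    if (p_values < 0.05).any():
        hits += 1

print(f"At least one 'significant' result in {hits / n_simulations:.0%} of simulated studies")
```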

Turner noted that he was talking about the kind of research where the work is trying to confirm a hypothesis (like clinical trials). This is different to hypothesis-generating research.

In the US, clinical trials with human participants must be registered with the Food and Drug Administration (FDA), so it is possible to see the results of all trials. Turner talked about his 2008 study looking at antidepressant trials, where the journal versions of the results supported the general view that antidepressants always beat placebo. However, when they looked at the FDA versions of all of the studies of the same drugs, half of the studies were positive and half were not. The published record does not reflect the reality.

The majority of the negative studies were simply not published, but 11 of the papers had been ‘spun’ from negative to positive. These papers had a median impact factor of 5 and median citations of 68 – these were highly influential articles. As Turner noted ‘HARKing is deceptively easy’.

This perspective is supported by the finding that a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. Indeed, Munafo noted that over 90% of the psychological literature finds what it set out to find. Either the research being undertaken is extraordinarily mundane, or something is wrong.

Increase in retractions

So what happens when it is discovered that something that has been published is incorrect? Journals do have a system which allows for the retraction of papers, a practice which has been increasing over the past few years. Research looking at why the number of retractions has increased found that it was partly due to lower barriers to the publication of flawed articles. In addition, papers are now being retracted for issues like plagiarism, and retractions are happening more quickly.

Retraction Watch is a service which tracks retractions ‘as a window into the scientific process’. It is enlightening reading with several stories published every day.

An analysis of correction rates in the chemical literature found that the correction rate averaged about 1.4 percent for the journals examined. While there were numerous types of corrections, chemical structures, omission of relevant references, and data errors were some of the most frequent types of published corrections. Corrections are not the same as retractions, but they are significant.

There is some evidence to show that the higher the impact factor of the journal a work is published in, the higher the chance it will be retracted. A 2011 study showed a direct correlation between impact factor and the number of retractions, with the New England Journal of Medicine topping the list. This situation has led to claims that the top-ranking journals publish the least reliable science.

A study conducted earlier this year demonstrated that there are no commonly agreed definitions of academic integrity and malpractice. (I should note that amongst other findings the study found 17.9% (± 6.1%) of respondents reported having fabricated research data. This is almost 1 in 5 researchers. However there have been some strong criticisms of the methodology.)

There are questions about how retractions should be managed. In the print era it was not unheard of for library staff to put stickers into printed journals notifying readers of a retraction. But in the ‘electronic age’, asked one author in 2002, when the record can simply be erased, is that the right thing to do, given that erasing an article entirely amends history? The Committee on Publication Ethics (COPE) do have some guidelines for managing retractions, which suggest the retraction be linked to the retracted article wherever possible.

However, from a reader’s perspective, even if an article is retracted this might not be obvious. In 2003* a survey of 43 online journals found that 17 had no links between the original articles and later corrections. Where present, hyperlinks between articles and errata showed patterns in presentation style but lacked consistency. There are some good examples – such as Science Citation Index – but there was a lack of indexing in INSPEC and a lack of retrieval with SciFinder Scholar.

[*Note this originally said 2013, amended 2 September 2016]

Conclusion

All of this paints a pretty bleak picture. In some disciplines the pressure to publish novel results in high impact journals means that the academic record is ‘selectively curated’ at best. At worst it results in deliberate manipulation of results. And if mistakes are picked up, there is no guarantee that this will be made obvious to the reader.

This all stems from the need to publish novel results in high impact journals for career progression. And when those high impact journals can be shown to be publishing a significant amount of subsequently debunked work, then the value of them as a goal for publication comes into serious question.

The next instalment in this series will look at gatekeeping in research – peer review.

Published 14 July 2016
Written by Dr Danny Kingsley
Creative Commons License