
Data Diversity Podcast #3 – Dr Nick H. Wise (1/4)

In our third instalment of the Data Diversity Podcast, we are joined by Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. As is the theme of the podcast, we spoke to Nick about his experience as a researcher, but this is a special edition of the podcast. Besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, according to a 2022 article on the blog Retraction Watch, is responsible for more than 850 retractions, leading Times Higher Education to dub him a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and, in some cases, naming and shaming the charlatans. Nick was kind enough to share many great insights with us over a 90-minute conversation, and so we have decided to release a four-part series dedicated to the topic of research integrity.

In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today. In part one, we learn how Nick was introduced to the world of publication fraud and how that led him to investigate the industry behind it. Below are some excerpts from the conversation, which can be listened to in full here.


I have found evidence of a paper mill bribing some editors and there have been many, at least tens, if not hundreds, of editors that have been let go or told to stop being editors by journals in the last year because they have been found to be compromised. This could be because of bribery or some other way of being compromised. This is what I try to uncover. – Dr Nick H. Wise


Tortured Phrases and PubPeer: Nick’s beginnings as a Scientific Sleuth  

My background is in fluid dynamics, where I mostly think about fluid dynamics within buildings. For instance, I think about the air flows generated by different heating systems and things like pollutant transport, such as smells or COVID, which can travel with the air and interact with each other. That was my PhD and the post-doc in the Engineering department.

About three years ago, whilst trying to avoid writing my thesis, I saw a tweet from the great Elizabeth Bik, who is possibly the most famous research fraud investigator. She mostly looks at biomedical images and her great skill is that she is able to look through a paper and see photos of Western blots or microscopy slides and see if parts of an image are identical to other parts, or if the image overlaps with images from different papers. She has an incredible memory and ability to spot these images. She’s been doing this for over 10 years and has caused many retractions. I was aware of her work but there was no way for me to assist with that because it is not my area of research. I don’t have an appreciation of what these images should look like.

But about three years ago she shared on her Twitter account a preprint written by three computer scientists about a phenomenon they called ‘tortured phrases’. In doing their research and reading the literature, these computer scientists noticed that there were papers with very weird language in them. What they surmised was that to overcome plagiarism checks by software like Turnitin, people would run text through paraphrasing software. This software was very crude in that it would go word by word. For instance, it would look at a word and replace it with the first synonym it found in a thesaurus. It would do this word for word, which makes the text barely readable. However, the text is novel and so it will not be flagged by any plagiarism-checking software. Eventually, if you as a publisher have outsourced the plagiarism checks to some software, and neither your editor nor your peer reviewer reads the text to check if it makes sense, then this will get through the peer review process without any problem and the paper will get published.

For an example of tortured phrases: sometimes there’s not only one way to say something. Particularly if English is not someone’s first language, you don’t want to be too harsh on anyone who’s just chosen a word which just isn’t what a native speaker would pick. But there are some phrases where there’s only one right way to say it. For instance, artificial intelligence is the phrase for the phenomenon you want to talk about, and if instead you use “man-made consciousness”, that’s not the phrase you need to use, particularly if the original text said artificial intelligence brackets AI, and your text says “man-made consciousness” brackets AI. It’s going to be very clear what has happened.  

The three computer scientists highlighted this phenomenon of ‘tortured phrases’, but entirely from within the computer science field. I wondered if a similar phenomenon was happening in my own field of fluid dynamics. Samples of this paraphrasing software are freely available online as little widgets, so I took some standard phrases from fluid dynamics, the kind that would not make sense if you swapped the words around, and generated a few of these tortured phrases. I googled them and up popped hundreds of papers featuring these phrases. That was the beginning for me.

I started reporting papers with these phrases on a website called PubPeer, which is a website for post-publication peer review. I commented on these papers and got into conversation with the computer scientists who wrote the paper on ‘tortured phrases’, because they had built a tool to scrape the literature and automatically tabulate the papers featuring these phrases. They basically had a dictionary of phrases which they knew would be spat out by the software, because some of this paraphrasing software is so crude that if you put in “artificial intelligence”, you are always going to get out “man-made consciousness” or a handful of variants. It didn’t come up with a lot of different things. If you just search for “man-made consciousness” and it brings up many papers, you know what has been going on. I contributed a lot of new ‘fingerprints’, which is what they call the entries in the dictionary that they search the literature for. That is my origin story.
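To make the ‘fingerprint’ idea concrete, here is a minimal Python sketch of a dictionary search. It is only an illustration and not the tool the computer scientists built: the function name `find_tortured_phrases` and the one-entry dictionary are assumptions made for this example, with the single entry taken from the phrase pair Nick mentions.

```python
import re

# Hypothetical fingerprint dictionary: tortured phrase -> the standard phrase
# it probably replaced. Only this one pair comes from the interview; a real
# list would contain many more entries.
FINGERPRINTS = {
    "man-made consciousness": "artificial intelligence",
}


def find_tortured_phrases(text, fingerprints=FINGERPRINTS):
    """Return a list of (tortured phrase, expected phrase, count) tuples."""
    hits = []
    for tortured, expected in fingerprints.items():
        # Match case-insensitively and accept either hyphens or whitespace
        # between the words of the fingerprint.
        words = re.split(r"[\s-]+", tortured)
        pattern = r"\b" + r"[\s-]+".join(re.escape(w) for w in words) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits.append((tortured, expected, count))
    return hits


if __name__ == "__main__":
    sample = "Recent advances in man-made consciousness (AI) are reviewed."
    for tortured, expected, count in find_tortured_phrases(sample):
        print(f"{count} x '{tortured}' (expected '{expected}')")
```

A match is only a prompt for a human to read the paper: as Nick notes above, an unusual word choice on its own is not evidence of anything.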

On Paper Mills and the Sale of Authorships 

There is also the issue of meta-science, which has nothing to do with the text of the paper or with the data itself, but more to do with how someone may add a load of references through the paper which are not relevant, or they are all references to one person or a colleague. In that way you would be gaming the system to boost profiles, careers, and things like H-index. Because having more publications and more citations is so desirable, there is a market for this. It is easy to find online advertisements for authorship of scientific papers ranging from $100 to over $1000, depending on the impact factor of the journal, and the position of authorship you want: first authorship, seventh authorship, or whether you want to be the corresponding author, these sorts of factors. Likewise, you can buy citations.  

There are also organizations known as paper mills, which can act as brokers. For example, as an author I might have written the paper and want, or need, to make some money and so I go to this broker and say: I want to sell authorships, I’ll be author number six, but I can sell the first five authorships. Can you put me in touch with someone buying authorships? At the same time, there are people who go to them saying I want to buy an authorship, and they put two and two together, acting as a middleman. Also, some of these paper mills do not want to wait for someone to come to them with a paper – they will write papers to order. They have an in-house team of scientific writers who produce papers. This does not necessarily mean that the paper is bad. Depending on where they want the paper to be published, the paper might have to be good to get published. So, they will employ people with degrees, qualified people or PhD students who need to earn some money, and then they will sell the authorships and get the papers published. This is a big business.

There is a whole industry behind it, and something I have moved on to investigating quite a lot is where these papers are going. When I identify these papers, I try to find out where they are being published, how they’re being published, who is behind them, who is running these paper mills, who is collaborating with them. Something I found out which resulted in an article in Science was that paper mills want to guarantee acceptance as much as they can. If a paper is not accepted, it creates a lot of work for them and it means a longer time before their customers get what they paid for. For example, if a paper that they wrote and sold authorships for gets rejected, they’re going to have to resubmit it to another journal. So something paper mills will do is submit a paper to 10 journals at once and publish with whichever journal gives them the easiest time. But still, they want to try and guarantee acceptance and one way to do that is to simply bribe the editor. I have found evidence of a paper mill bribing some editors and there have been many, at least tens, if not hundreds, of editors that have been let go or told to stop being editors by journals in the last year because they have been found to be compromised. This could be because of bribery or some other way of being compromised. This is what I try to uncover.

Although I’m not fighting this alone, it can feel like that. Publishers are doing things to some extent and they’re doing things that they can’t tell you about as well. And then there are other people like me investigating this in their free time or as a side project. Not enough of us are doing it, because it is a multi-million-dollar industry that is generating these papers. More papers are being published than ever before, so it is a big fight.


Stay tuned as we release the rest of the conversation with Nick over the next month. In the next post, we get Nick’s take on the peer review process and fake research data, and I ask his opinion on where the fault lies in the publication of fraudulent research. 

Open Research in the Humanities: The Future of Scholarly Communication

Authors: Emma Gilby, Matthias Ammon, Rachel Leow and Sam Moore

This is the second of a series of blog posts, presenting the reflections of the Working Group on Open Research in the Humanities.  Read the opening post here. The working group aimed to reframe open research in a way that was more meaningful to humanities disciplines, and their work will inform the University of Cambridge approach to open research.  This post considers the future of scholarly communication from a humanities perspective. 

PILLAR ONE: THE FUTURE OF SCHOLARLY COMMUNICATION 

This first pillar deals with ‘open access’ narrowly understood: the future of the publication landscape, and the question of the sustainability and viability of different publication models in an open access world.  

Opportunities 

The open access initiative in general values a wide range of contributions to academic life. The arts and humanities thrive on long-term, multi-scale, conversational, collaborative, interdisciplinary projects; all cultural work can be so defined. Any move towards research diversity therefore works in the favour of the arts and humanities.  

Open Research aims first at opening out ‘traditional’ research content, such as that published in journals and monographs. Thus it aims also to demystify the existing publication process. In general, it prioritizes the wide dissemination of public-facing research. Further, it allows us to envisage new forms of publication, such as the use of dynamic images and data visualisation as already undertaken in investigative journalism.1 Other examples of new Open Access formats include semi-public peer-to-peer review and the opportunity for readers to highlight passages and contribute to a crowd-sourced index of terms.2

Support required 

In the immediate and short term, A&H colleagues require institutional support to understand and get to grips with the current routes to open access within academic publishing, which present various advantages and challenges. For more detail see Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper https://royalhistsoc.org/policy/publication-open-access/plan-s-and-history-journals/ 

Current routes to OA in scholarly publishing include:  

  1. Paying directly for article or book processing charges levied by publishers. This is easy if one’s research falls among the very small percentage of A&H research that is funded by the research councils, who allow for such fees, but otherwise challenging.
  2. Taking advantage of a ‘read and publish’ deal set up between a publisher and an institution. This is easy if one is at the right institution at the right time, but otherwise challenging. There is also confusion amongst colleagues about what happens when these time-limited, transitional deals expire: will publishers revert to simple processing charges (see above)? Or will all published material by then be fully OA (see below)?
  3. The self-deposit in an OA institutional repository of a manuscript that is accepted for publication and peer reviewed but that has not been edited or typeset by the publisher in any way. This is easy with the right systems in place, but problematic because it neglects the import of the editing process in A&H research. Without undergoing this process, ‘accepted manuscripts’ are very vulnerable to errors, especially in the case of the very many scholars who regularly work in languages that are not their first, or in the case of early career scholars who are less familiar with critical processes and how to evidence them, or in the case of colleagues with various kinds of disabilities such as dyslexia. Other issues also abound with the deposit of manuscripts in repositories. In cases where scholars receive an acceptance that is subject to improvement, the final ‘date of acceptance’ is ambiguous for legal purposes. And in cases where the work in question uses copyrighted material, further legal issues emerge about when and how it may be possible to circulate this. In all these senses, then, many A&H colleagues simply dislike the thought of their ‘accepted manuscript’ circulating. In the case of institutional repositories, there seems to be a direct and obvious tension between the goals of open research and quality control.
  4. Publishing with a fully OA journal or academic publisher that does not require a processing charge. This is obviously the most straightforward and therefore best route to OA, but raises the fundamental question of how such work is conducted and funded. The notion of the ‘scholar-led’ press, established and monitored by scholars themselves, presupposes that academics can somehow fit the work of the professional editor, copy editor, translator or typesetter etc. into their spare time. In addition, many OA journals rely on charitable donations. Fundraising is also a skilled business: will universities’ development directors and offices be diverted to do the work of seeking these charitable donations? Is it possible for existing publishing houses and presses to construct a sustainable business model that allows for free and open publishing, while overlaying their own professional services onto the scholarly work provided by academics? Can already successful enterprises such as Open Book Publishers in Cambridge3 be ‘scaled up’? The members of the working group have not seen any impact assessments or pilot studies considering which of the current forms of scholarly communication will simply die out in the absence of subscription and royalty income. We would like to see evidence-based impact assessments as a matter of priority. In general, it is unclear whether even the largest and most prestigious scholarly societies will survive the loss of income that will result from a move to OA. As one member of our group put it, ‘the research is not open if it is dead’.

Many questions remain, above and beyond those already evoked:  

  • The situation with respect to the goal of publishing all academic monographs freely and openly remains extremely fluid, and all the enquiries we were able to make in the working group confirmed that this is an area of great uncertainty. Academic books require considerable up-front investment by publishers, and it is vital that this labour and expertise is properly supported in an open access model. How can we ensure that open access books do not entail a race to the bottom in terms of editorial and production standards?
  • Researchers and publishers will also have to think carefully about content such as book reviews, notices, short discussion pieces, author interviews and so on: content that is useful to the discipline, but peripheral to the article form and that would not generally appear in a repository, for example.   
  • The place of UK debates in the global publishing industry is unclear. Like all scholarly publishing, A&H publishing is international in nature and most journals and presses will draw from as wide an international field as possible. How will the editor of a UK-based journal, responding to the OA requirements of UK decision-making bodies, deal with international authors who are not subject to the same requirements or set of priorities? How will an international editor deal with UK academics?4 These questions come up repeatedly in conversations with colleagues.
  • Scholarly societies in the arts and humanities do not charge a fortune for their journals, and also offer conferences, communities and support (financial and otherwise) for early-career scholars. To analyse the costs and benefits of access to their publications, it will be necessary to look across cost centres within any given institution. To offer a worked example of library costs from 2019, ‘the bundled UK cost for 2020 the RHS’s Transactions and its Camden book series is £205 (this is a maximum figure, excluding all discounts). In the financial year 1 July 2018-30 June 2019, RHS awarded (for example) £2,781.56 to support ECR researchers at York University and £3,177.16 to support ECR researchers at Oxford.’5 So it would be useful to see studies of the rate of institutional return on investment in publications by university libraries.
  • Concerns about licensing were already well documented and summarized by Peter Mandler in 2014: ‘For one thing, we do not have full ownership of our texts ourselves – we use others’ words and images, often by permission. For another, we have our own norms of how best to incorporate one work within another – e.g. by quotation – which derivative use denies. Most important is our moral right (long acknowledged in law and ethics) to protect the integrity of our work. By all means read and disseminate our work free of charge, but do not change it as you are doing so – write your own work.’6  
  • Concerns about distortions allowed by CC BY in the reuse of oral history interviews and other sensitive/polemical content are important for many A&H colleagues as they are for our colleagues in the social sciences. 
  • Evidence of predatory publishers simply reusing content from repositories is starting to emerge, seemingly justifying concerns about CC BY as opposed to CC BY-NC-ND or CC BY-ND.7

Footnotes

1 See for instance a project on the takeover of real estate by the Church of Scientology in Clearwater, Florida: https://projects.tampabay.com/projects/2019/investigations/scientology-clearwater-real-estate, or a series of investigative articles on the post-9/11 burgeoning of the US intelligence services collected here: https://www.washingtonpost.com/people/william-m-arkin/

2 Matthew Gold & Lauren Klein, eds. Debates in the Digital Humanities (2012), https://dhdebates.gc.cuny.edu

3 ‘We are a nonprofit independent publisher with no institutional backing. Open Book relies on sales and donations to continue publishing high-quality and free to read titles. We gratefully acknowledge the generous support of The Polonsky Foundation, the Thriplow Charitable Trust, the Jessica E. Smith and Kevin R. Brine Charitable Trust, The Progress Foundation and the Dutch Research Council (NWO).’ https://www.openbookpublishers.com

4 See the following testimony: ‘The bi-lingual, topic-specific journal I edit…draws articles from authors across the world and is published in Switzerland. Hence, specific OA requirements pertaining to UK-based authors will be considered in setting OA policy but will probably not be a determining factor. Hence, if strict requirements are introduced around OA in relation to UK funders, this may serve to reduce the possibility for UK-based authors to submit articles to my journal. This would obviously be an issue for the journal but would also be one for UK academics also, as it would result a more limited range of potential publication outlets.’ Margot Finn, Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, pp. 47-8. 

5 Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, p. 69, n. 110. 

6 Peter Mandler, ‘Open Access: a Perspective from the Humanities’, Insights 27 (2), 2014, http://doi.org/10.1629/2048-7754.89 

7 Guy Lavender, Jane Secker and Chris Morrison, ‘What happens when you find your open access PhD thesis for sale on Amazon?’, 8th July 2021, https://blogs.lse.ac.uk/impactofsocialsciences/2021/07/08/what-happens-when-you-find-your-open-access-phd-thesis-for-sale-on-amazon/

Half-life is half the story

This week the STM Frankfurt Conference was told that a shift away from gold Open Access towards green would mean some publishers would not be ‘viable’ according to a story in The Bookseller. The argument was that support for green OA in the US and China would mean some publishers will collapse and the community will ‘regret it’.

It is not surprising that the publishing industry is worried about a move away from gold OA policies. They have proved extraordinarily lucrative in the UK with Wiley and Elsevier each pocketing an extra £2 million thanks to the RCUK block grant funds to support the RCUK policy on Open Access.

But let’s get something straight. There is no evidence that permitting researchers to make a copy of their work available in a repository results in journal subscriptions being cancelled. None.

The September 2013 UK Business, Innovation and Skills Committee Fifth Report: Open Access stated “There is no available evidence base to indicate that short or even zero embargoes cause cancellation of subscriptions”. In 2012 the Committee for Economic Development Digital Connections Council, in The Future of Taxpayer-Funded Research: Who Will Control Access to the Results?, concluded that “No persuasive evidence exists that greater public access as provided by the NIH policy has substantially harmed subscription-supported STM publishers over the last four years or threatens the sustainability of their journals”.

I am the first to say that we should address questions about how the scholarly publishing landscape is shifting with systematic data gathering, analysis and discussion. We need to look at trends over time and establish what they mean for the ongoing stability of the scholarly literary corpus. But consistently evoking the ‘green open access equals cancellation so we should have longer embargoes’ argument is not the solution.

Let’s put this myth to bed once and for all.

The half life argument

Publishers have been trying to use the half-life argument for some time to justify extending their embargo periods on the author’s accepted manuscript. Embargoes are how long after publication before the manuscript (the author’s Word or LaTeX document, usually saved as a pdf) can be made available in the author’s institutional or a subject-based repository.

The half life of an article is the time it takes for articles to reach half their total number of downloads.
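As a rough illustration of that definition, the Python sketch below finds the first month in which cumulative downloads reach half of the total. The function name `usage_half_life` and the monthly figures are invented for this example, and it assumes the total is simply the sum over whatever period has been observed; it is not the method used in any of the studies discussed in this post.

```python
def usage_half_life(monthly_downloads):
    """Return the 1-based month in which cumulative downloads first reach
    half of the total observed downloads."""
    total = sum(monthly_downloads)
    if total <= 0:
        raise ValueError("need at least one download to compute a half-life")
    running = 0
    for month, downloads in enumerate(monthly_downloads, start=1):
        running += downloads
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return month


if __name__ == "__main__":
    # Invented monthly download counts, for illustration only.
    print(usage_half_life([10, 10, 20, 30, 30]))
```

Note that a figure calculated this way depends on the length of the observation window: the longer usage is tracked, the larger the total and the later the half-way point can fall.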

The argument goes along the lines of ‘if articles have a longer half life then they should be kept under embargo for longer’ because, according to a blog published at the beginning of this year by Alice Meadows, Open access at Elsevier 2014 in retrospect and a look at 2015: “If an embargo period falls too far below the period it takes for a journal to recoup its costs, then the journal’s survival will be jeopardized.”

The problem with this argument is that there has been, and continues to be, no evidence that permitting authors to make work available in a repository leads to journal cancellations. It is ironic that the consistent line on this issue from the publishers has been that the half-life argument is helping ‘set evidence-based policy settings of embargo periods’.

The half-life spectre was raised again at this week’s STM meeting by Philip Carpenter, executive vice president of research at Wiley, who, according to The Bookseller, noted that only 20% of Wiley journal usage occurred in the first 12 months after publication and referred to a 12-month embargo as offering only ‘limited protection’.

Evidence for the green = cancellation argument

The need for longer embargoes – 1

The way the ‘evidence’ for this argument has been presented is telling. There is a particular paragraph in Meadows’ blog that is worth republishing in full:

How long those embargo periods should be before manuscripts become publicly accessible is a key issue. To help set evidence-based policy settings of embargo periods, we have contributed to growing industry data. Findings of a recent usage study demonstrated that there is variation in usage half-lives both within and between disciplines. This finding aligned with a study by the British Academy, which also found variation in half-lives between disciplines – and half-lives longer than those previously suggested.

Despite looking like links to two separate items (which gives the impression of more ‘evidence’), the first two links in the section above, to ‘industry data’ and to a ‘recent usage study’, both lead to the SAME 25 November 2013 study by Phil Davis into journal usage half-lives that started the whole shebang off. The study looked at the usage patterns of over 2800 journals and found that only 3% of the journals had half-lives of 12 months or less. The proportion of journals with this short half-life was lowest in the Life Sciences (1%) and highest in Engineering (6%).

While in no way criticising the findings of that study, it should be pointed out that the author clearly states that the study was funded by the Professional & Scholarly Publishing (PSP) division of the Association of American Publishers (AAP). The work has not been peer reviewed or published in the literature.

The British Academy report Open Access Journals in the Humanities and Social Sciences does not appear to be available online any longer.

Now, there is no dispute that there are differences in usage patterns of articles between disciplines. This is a reflection of differing communication norms and behaviours. But there is a huge logic jump to then conclude that therefore we need to increase embargo periods. Peter Suber went into some detail on 11 January 2014 (yes, we have been swinging around on this one for a while now) explaining the logical flaw in the argument. At the time Kevin Smith also noted in a blog “Half-lives, policies and embargoes” that “we should not accept anything that is presented as evidence just because it looks like data; some connection to the topic at hand must be proved”.

The need for longer embargoes – 2

Meadows’ blog went on to say:

There are real-world examples where embargo periods have been set too low and the journal has become unviable. For example, as published in The Scholarly Kitchen, the Journal of Clinical Investigation lost about 40 percent of its institutional subscriptions after adopting a 0-month embargo period in 1996, so it was forced to return to a subscription model in 2009. Similar patterns have been seen with other journals.

The issue referred to here has nothing to do with the half life of research papers that are being made available open access through a repository. This refers to a journal that went to a GOLD Open Access model in 1996 (publishing open access and relying on non-subscription revenue sources), but eventually decided they needed to impose a subscription again in 2009. Not only is this example entirely unrelated to the embargo issue for green Open Access, it happened six years ago. Note the blog does not link to other ‘similar patterns’. They do not exist.

Green policies mean cancellations

The half-life argument has replaced previous, even less substantial ‘evidence’ provided by the publishing industry in a 2012 study. That study was cited as evidence for the argument that “short embargo periods are likely to lead to significant cancellations” by Wiley in a 2013 blog post, Open Access – Keeping it Real, and by Springer in an interview published as Open Access – Springer tightens rules on self archiving.

The study was conducted by the Association of Learned and Professional Society Publishers (ALPSP). However, the study, which was written up and published online, had some major methodological issues. It consisted of a single poorly worded question:

“If the (majority of) content of research journals was freely available within 6 months of publication, would you continue to subscribe? Please give a separate answer for a) Scientific, Technical and Medical journals and b) Humanities, Arts and Social Sciences Journals if your library has holdings in both of these categories.”

An analysis of the study highlighted methodological criticisms. The work was not peer reviewed. But there are deeper questions about the motivation behind the survey. The researcher was the Chair of the ALPSP Research Committee and was on the steering committee for the Publishers Research Coalition, raising questions about her (and the study’s) objectivity. There are several other issues relating to the validity of the research.

What is the real problem?

There is no doubt that open access policies are causing disruption to publishers’ funding models. That is hardly surprising and in some cases may well be the intent of the policy. But presenting spurious arguments to try and maintain the status quo is not moving this discussion forward.

The point is we do need evidence. If green OA is causing cancellations then let’s collect some numbers and talk about the issues:

  • How does this affect the scholarly communication system?
  • What are the implications?
  • Does this mean publishers will fold (unlikely in the short term)?
  • Will some journals close (possibly)?
  • Is that a problem?
  • Perhaps we need to consider issues relating to the reward system and what is valued?

But I will give the last word to the person who caused me to write this blog in the first place – Philip Carpenter, executive vice-president of research at Wiley who, according to The Bookseller said at the STM meeting: “We’ll need to think hard about what factors influence library purchasing decisions; we don’t know enough [about that]”.

Hear, hear.

Published 16 October 2015
Written by Dr Danny Kingsley
Creative Commons License