We are back again with our second blog post featuring Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. As is the theme of the Data Diversity podcast, we spoke to Nick about his experience as a researcher, but this is a special edition of the podcast. Besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, based on an article on the blog Retraction Watch which was written in 2022, is responsible for more than 850 retractions, leading Times Higher Education to dub him as a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans.
In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today. In this second part, we get Nick’s take on the peer review process and fake research data, and I ask his opinion on where the fault lies in the publication of fraudulent research. Below are some excerpts from the conversation, which can be listened to in full here.
There are indices like Scopus or Web of Science or SCI, all these different bodies who claim journals are trustworthy, but every journal is going to get attacked by fraud and some will slip through. It is what you do afterwards that matters.
On the peer review process
LO: As an Early Career Researcher, scientist, engineer, and researcher yourself, is your trust in the whole system still intact? Do you still see value in the peer review process?
NW: It has absolutely changed how I read a paper and how I view particular journals. When you see a problem happening in a journal that you have read in your research or a journal you have considered submitting to, it really gives you pause for thought. There is an entire ecosystem of journals, right from the from the very good down to the very bad, that are implicated. There are indices like Scopus or Web of Science or SCI, all these different bodies who claim journals are trustworthy, but every journal is going to get attacked by fraud and some will slip through. It is what you do afterwards that matters. Another phenomenon that particularly happens with publishers with a wide list of journals, is that the paper mill will legitimately buy the journal. They may even take it over in a hostile way: they will make a clone of the journal and the website, and they will even redirect the publisher’s link to a different website. They now control a journal that is officially on this trustworthy list. Now they have a short period of time before someone notices and in that time, they will try to publish as many papers as possible and charge everyone for publication. They will absolutely cram this journal with any content. It does not even have to be relevant to the topic because they’re fully in control of the whole process up until the publisher notices and removes the journal from the list. For an author who needs a journal in a paper published in a well-regarded journal, they have achieved what they needed but as soon as the journal is removed from the list, then it becomes worthless. But there is a large supply of these journals, and they will keep trying to take them over. This tends to happen with low tier journals, but there are also paper mills which are targeting journals with an impact factor of over five, over ten – the supposedly absolute top tier journals.
Between incompetence and conspiration
LO: These days, fraud is so convincing, scams are so rampant, and they always target your insecurities, the insecurity here being authors who want citations.
NW: I would say that it is not a scam or fraud for the researcher, in the normal sense. These people are selling citations, and the buyer gets citations as opposed to someone getting cheated for their money and getting nothing in return. They are scamming the publishers and scamming the scientific community, but they are not scamming an actual person paying the money. It is a business that is operating as it says it is.
LO: What does it say, though, that fraudulent papers are still getting through the peer review process. It’s still quite a long way from first draft to publication, and we have seen some cases where remnants of text from Chat GPT replies like “as a large language model…” gets through the review process. In your mind, what does it say about the industry? What’s happening here?
NW: I think that it is somewhere between incompetence, people in a rush, and peer reviewers being bypassed or being paid. They could also be colluding with authors or the paper mill. To be fair, there are dodgy things that get through a legitimate peer review in the first place. All the peer reviewers are independent but how many people read every single word right of a paper they peer review? Not everyone. People have different standards that they hold themselves to. There is no agreed standard of what you are supposed to do to peer review a paper. As I’m sure anyone who has received peer review reports would know, sometimes you receive a five-page PDF document with hundreds of bullet points, and sometimes you receive a paragraph which maybe took them half an hour to put together. Legitimate peer reviewers could just not do a good job. Then there are also people who pride themselves on doing a load of peer reviews, and in fact you can get certificates from the publisher about how many peer reviews you do. There are people who say they peer review nearly a paper a day – I doubt that they are doing a great job at it.
Even if someone is reading the text, how much is a peer reviewer supposed to be checking the data? Should someone be trying to run statistical analysis to see if they have been fudged? Should they be spotting that the image is manipulated? Is that something we should expect the peer reviewer to be doing? Or should a peer reviewer go into a review assuming the work is honest? It becomes a different process if you are also thinking about whether a piece of work is fraudulent or not. The easiest things to find are the people who are very lazy or very incompetent and there is just something that is so blatant that it is hard to miss. But if most people are trying to cover their tracks, then it comes down to just how well they have managed to do that. Again, if you are including remnants of Chat GPT like “as a large language model” in your text, you are either extremely lazy, or maybe you don’t read English. But if someone got rid of that bit, you would not notice from reading the abstract. You might think this is a bit bland, but people can write bland text; that is allowed.
Sometimes peer reviewers are definitely compromised, and I don’t know what the balance is. When you see a bad paper, say a paper with an obvious problem or with chat GPT remnants lying around: is that bad peer reviewing or have they been paid not to notice, or even not to do it? I don’t know what the balance is there. I suspect it is more on the bad peer reviewing side than the criminal or the fraudulent to be honest, but I don’t know. There are times when you think OK, well, maybe they were paying the peer reviewers but did the editor look through this? Did the copy editor? We might want to think that copy editors and type setters are going through and questioning these things like this. It really depends on the journal. I have had things come back where they have gone through and changed from a comma to a dash, so they are clearly going through everything character by character. And there are other journals where the typesetter is clearly just taking everything with no thought. Their job is just to transfer what they have been given into the journal paper and they don’t do any spell checking or checking for grammar or anything. But should that be their job? I don’t know. Then there are journals where the only priority appears to be publishing as many papers as quickly as possible. And if you have made that your priority, even if everyone is acting in good faith, you are going to let a lot more things through. If you are just trying to push everything out the door and do things as quickly as possible, you are not going to give the things as much scrutiny.
Fake research data
Even from doing my own research, I’ve realized that it would be very easy to fake some data. It would be very hard for anyone who wasn’t in the lab to know if data has been faked. There is no real way for someone to check. Even if you go open data; one experiment might need a few gigabytes of video footage to produce one data point. You can say what you have done to produce that data point, but for someone to go and check its validity, they would in theory need access to gigabytes and gigabytes of data that is not shared. But yes, there have been some things where it has been very easy to check. For instance, in material science, there are lots of experiments which result in the spectra diagram, basically producing a squiggly line on a graph. One thing that would always be true, and you don’t need any subject expertise to know this, is that the line should not double back on itself. Every X value should have one Y value. Well, if you are faking this by drawing it by hand with a mouse, it is quite hard to not double back and there are plenty of published Spectra which have bits where a peak bends over. And it is clearly because someone has drawn it by hand, and some of them are very bad. And that is again where you question what is happening with peer review because it is obvious that something is wrong. Sometimes they will even go outside the lines of the bounding box. I do see some of those because they are quite easy to spot.
Stay tuned as we release the third conversation with Nick next week. In the penultimate post, we learn from Nick about how researchers try to generate more citations from a single piece of research from a trick called ‘salami slicing’ and the blurred lines between illegality and desperately coping to meet with the unrealistic expectations of academia to the point of engaging with fraud.