

Towards enriched open scholarly information: integrating DSpace and OpenAlex

We are pleased to announce that, thanks to the support of the Vietsch Foundation, we will be developing an integration between DSpace repositories and OpenAlex. We are partnering with 4Science, a certified platinum DSpace provider, to deliver this project that will integrate two key systems within the global scholarly ecosystem, the DSpace repository (https://www.dspace.org/) and OpenAlex (https://openalex.org/), a free and open catalogue of the world’s scholarly research system.

Using OpenAlex’s open API (Application Programming Interface), this integration will allow for the quick import of relevant research and scholarly (meta)data into DSpace repositories, helping institutions to improve the quality and completeness of their records of research outputs and streamlining researcher publication and reporting workflows by providing accurate and relevant information in automated ways. This integration will also save time for researchers and librarians, who will then be able to dedicate more time to research-oriented tasks.
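As a rough illustration of what such an import could involve, the sketch below maps a record shaped like an OpenAlex work object onto Dublin Core-style metadata fields of the kind a DSpace item uses. It is not the integration being developed in this project. The function name, the sample record and the choice of target fields (in particular `dc.identifier.doi` and `dc.identifier.uri`) are assumptions made for illustration only.

```python
from typing import Any


def openalex_to_dublin_core(work: dict[str, Any]) -> dict[str, list[str]]:
    """Map an OpenAlex-style work record onto Dublin Core-style fields.

    Hypothetical helper: the target field names are assumptions, not a
    confirmed DSpace configuration.
    """
    # Collect author display names, skipping entries without one.
    authors = []
    for authorship in work.get("authorships") or []:
        name = (authorship.get("author") or {}).get("display_name")
        if name:
            authors.append(name)

    # OpenAlex reports DOIs as full URLs; keep only the bare identifier.
    doi = (work.get("doi") or "").removeprefix("https://doi.org/")
    oa_url = (work.get("open_access") or {}).get("oa_url")

    fields = {
        "dc.title": [work["title"]] if work.get("title") else [],
        "dc.contributor.author": authors,
        "dc.date.issued": [work["publication_date"]] if work.get("publication_date") else [],
        "dc.identifier.doi": [doi] if doi else [],
        "dc.identifier.uri": [oa_url] if oa_url else [],
    }
    # Drop fields for which the source record had no value.
    return {name: values for name, values in fields.items() if values}


# Made-up sample record for illustration; not real data.
sample_work = {
    "doi": "https://doi.org/10.1234/example",
    "title": "An Example Title",
    "publication_date": "2024-01-15",
    "authorships": [
        {"author": {"display_name": "A. Author"}},
        {"author": {"display_name": "B. Author"}},
    ],
    "open_access": {"is_oa": True, "oa_url": "https://example.org/paper.pdf"},
}

record = openalex_to_dublin_core(sample_work)
for field, values in record.items():
    print(field, values)
```

A real import would also need to fetch records from the API, handle duplicates already in the repository, and let a librarian review the result before it is saved.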

Reusing the data in OpenAlex will help institutions to improve the quality and completeness of data in institutional repositories and strengthen the wider open access network by increasing the number of versions and access points to content. Moreover, the availability of multiple (open access) copies of the materials can provide a more effective strategy for long-term preservation.

The research and scholarly publishing environment is changing rapidly and there is an increasing expectation that research findings will be shared, both among funders and policy makers, and the wider research and public community. We strongly believe that institutional research repositories and scholarly platforms play a critical role in supporting these open research practices by preserving and disseminating research findings and supporting materials produced by institutions. The solution developed in this project will greatly contribute to increasing and enhancing the availability of open and accurate information about research outputs in the wider scholarly ecosystem.

Project information at the Vietsch Foundation site: https://www.vietsch-foundation.org/projects/

Data Diversity Podcast #3 – Dr Nick H. Wise (3/4)

Welcome back to the penultimate post featuring Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. If you have been with us for the previous two posts, you will know that besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, according to a 2022 article on the blog Retraction Watch, is responsible for more than 850 retractions, leading Times Higher Education to dub him a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans. In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today.

In part three, we learn from Nick about how researchers try to generate more citations from a single piece of research through a trick called ‘salami slicing’, and about the blurred lines between illegality and desperately trying to meet the unrealistic expectations of academia (to the point of engaging in fraud). Below are some excerpts from the conversation, which can be listened to in full here



‘Salami slicing’ and the Game of Citations

LO: What do you think is better for science? A slower, more thoughtful process of publishing and everything in between? Or more information, more research, but then things like fraud slip through and occur more frequently?

NW: I don’t think there’s necessarily more research. Another phenomenon that paper mills take advantage of is salami slicing. Imagine you have completed a research project. Now you could write this up as one thirty-page paper or two twenty-page papers. You could write two comprehensive papers or try to put out multiple ten-page papers where you have some minor parameters changed. I see this happening in nanofluids research because it is an area of research close to mine. The nanofluid is simply a base liquid – it might be water, it might be ethanol – and into that you mix these very small nanoscale particles of some other material, such as gold, silver, or iron oxide. And in this sort of mixture of liquid and particles, you want to investigate its fluid flow and describe this with some differential equations. You can use computers to solve the differential equations and then plot some results about velocity profiles and heat transfer coefficients, etcetera. Now, you could write a paper for a given situation where you say, I’m not going to specify the liquid, but here is a general density and viscosity of this liquid. If you want to apply this to your own research, you plug in the density and viscosity of your liquid, and likewise the particles. I’m not going to specify which particles are used, because all that changes is their density and their heat transfer coefficient properties. So that’s one way you could do it.

Another way to do it is to say, “I’m going to write a paper about water and gold particles”; that’s one paper. Then you can write another paper which has water and silver particles, and then you can write one with ethanol and iron oxide, and there are so many varieties. You can also vary the geometry that this flow is going around, and you can add in an electric field and a magnetic field, etcetera. You can build up in this n-factorial way. There are thirty possible liquids multiplied by a hundred possible particles and multiplied by however many geometric configurations. You can see that this is what they are doing. Rather than writing a few quite general comprehensive papers, they are writing hundreds of very specific papers, which enables them to produce more papers and sell more authorships and put more citations in. But with this overwhelming number of papers being produced, there are still only so many peer reviewers and so many editors. And this phenomenon happens in lots of fields: they find something where there are just these variables they can change to keep writing almost the same paper. Yet, the paper is original. It has not been done before. It is incredibly derivative, but that is not necessarily a barrier to publication.
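To make the arithmetic concrete, here is a small sketch of how the number of possible ‘slices’ grows as the product of the options for each variable. The liquids and particles are the ones Nick mentions. The geometry names and the figure of five geometries are assumptions added purely for illustration.

```python
from itertools import product


def count_papers(liquids: int, particles: int, geometries: int) -> int:
    """Number of distinct liquid/particle/geometry combinations."""
    return liquids * particles * geometries


# Materials named in the conversation; the geometries are hypothetical.
base_liquids = ["water", "ethanol"]
nanoparticles = ["gold", "silver", "iron oxide"]
flow_geometries = ["flat plate", "cylinder"]

# Every combination could, in principle, be written up as its own paper.
combinations = list(product(base_liquids, nanoparticles, flow_geometries))
for liquid, particle, geometry in combinations:
    print(f"{liquid} + {particle} nanofluid flow around a {geometry}")

# Thirty liquids and a hundred particles, with an assumed five geometries.
print(count_papers(30, 100, 5))
```

Even this tiny example yields twelve combinations, and each extra variable (an electric field, a magnetic field) multiplies the total again.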

LO: What I’m getting from this is, this is part of the whole system, and the issue at hand is definitely enabled by certain motivations like getting more citations. You can take one big piece of salami and publish that in one book, or you can slice the salami thirty ways. And if they are in the position to slice the salami, they say why not, I suppose, right? A game is there to be played.

NW: Right, they are playing the game that is in front of them. And again, there are people who do this who are not from a paper mill. They just want to maximize the number of citations and publications. The question is why are they doing this? Why do they want to maximize their publications? Because they want a promotion, or they want a tenured job. There are also countries where you get a cash reward for publishing a paper in a good journal so the more papers you publish, the more money you get paid. Your government might have told all the universities that they need to increase their ranking in the World University rankings. How do you do that? By increasing your research output and the citations you get. That is another driver. These drivers come from all sorts of places but there is always an emphasis on numbers. Citation count was once a proxy for quality and now it is citation count regardless of quality. People are only looking at the citation count and not the actual quality. Assessing quality takes a lot more effort.

LO: Citations used to be a proxy for quality, but that is not the case anymore. But it still implies the quality of the research, or you would hope.

NW: You would hope, but only because there is an assumption that the only reason something has a lot of citations is because it is good quality. Citations are also easier to count. Quality is much harder to account for, but that incentivizes people to do things like cite their colleagues. Again, you could still track it if people from the same university were citing each other. But then you get bigger scale things with middlemen who organize people from across the world to cite each other or just do it for cash. If you are producing papers to order, each one of those papers has a reference section which is real estate. You can include some genuine references which are relevant to this paper, but you can also throw in some irrelevant references that someone paid you to include. You can also pay someone to include references that are actually relevant to a topic.

LO: If it is relevant to a topic, it is almost like merely encouraging someone to be aware of certain work as opposed to a scam, which sounds like a gray area.

NW: Well, I would say that as soon as someone is paying money, then it starts to be illegitimate. But I mean if someone emails you and says “I’ve just published this paper, I think you might be interested, it’s in your research field: maybe read it, or maybe even cite it”, it’s different from someone emailing you to say “I’ll pay you £50 if you cite my paper” and you do. Then I would say that you have crossed a line. So, it does get very gray. Then there are these organized paper mills who are doing this as a business and that is where I think it becomes quite clear that it is probably not legitimate.

Facebook (authorship) marketplace

NW: You could go on Facebook and there are people selling authorship of their paper as a one off. There are PhD students in some country with no research funding who say “it costs $2500 for the article processing charge for me to publish where I would like to publish, I do not have $2500 so if you pay the $2500, you can be first author on the paper” and that is the only way they can get their paper published. They’re not doing this as a business, they’re just doing this once for this one paper. And you get people responding. Quite often professors or more established academics with access to budgets are the ones who will say yes. And the only thing that the person has done is to provide the funding for the publication.

The minimum thing that one is supposed to have done to be considered an author is to have either written the draft or reviewed and edited the paper. You might have also done data analysis or conceptualization. I think we would agree that if all this person does is just pay the fee for publication, then that is not acceptable. But what if they read the paper and then made a couple of comments? Now they have reviewed and edited it, and so now they have done review, editing and funding. There are many big labs around the world that have some very senior scientist whose name is on every single paper that comes out of the lab. And what have they done? Well, they provided all the funding, and they have reviewed the paper. I bet there are some who have barely glanced at the paper. But let’s say that they have reviewed the paper, and they provided the funding for the publication. Is that what makes it different to the person on Facebook who has found some random professor from another country to pay for their publication? Where is the difference? I don’t think it is an easy line to draw. In this way, the move to Open Access publishing requiring large fees for publication has also driven quite a bit of this phenomenon.

LO: It also seems like you have developed a bit of empathy. Maybe you’ve looked at so many cases and you see that it’s not always clear.

NW: Absolutely. Again, if you have the people running a paper mill, or if you have some professor who is being bribed and waving through dozens of papers, I don’t have much empathy for them. But the Masters or PhD student who has been told that they have to publish papers to get their PhD or even a Masters and they have this demand placed on them, or they have even produced a paper but they need all this money to get it published, I don’t blame them for what they’re doing. It’s the situation they’ve been placed in. It is the system that they are part of. I have a lot of empathy for them.


Look out for the final post coming next week, where we get Nick’s take on what he thinks should be the repercussions for engaging in fraud, and we get a parting tip from Nick on what researchers should do when performing a literature search on papers in their field.

Data Diversity Podcast #3 – Dr Nick H. Wise (1/4)

In our third instalment of the Data Diversity Podcast, we are joined by Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. As is the theme of the podcast, we spoke to Nick about his experience as a researcher, but this is a special edition of the podcast. Besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, according to a 2022 article on the blog Retraction Watch, is responsible for more than 850 retractions, leading Times Higher Education to dub him a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans. Nick was kind enough to share with us many great insights over a 90-minute conversation, and as such we have decided to release a four-part series dedicated to the topic of research integrity.

In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today. In part one, we learn how Nick was introduced into the world of publication fraud and how that led him to investigate the industry behind it. Below are some excerpts from the conversation, which can be listened to in full here



Tortured Phrases and PubPeer: Nick’s beginnings as a Scientific Sleuth  

My background is in fluid dynamics where I mostly think about fluid dynamics within buildings. For instance, I think about the air flows generated by different heating systems and things like pollutant transport such as smells or COVID which can travel with the air and interact with each other. That was my PhD and the post-doc in the Engineering department.

About three years ago whilst trying to avoid writing my thesis, I saw a tweet from the great Elizabeth Bik, who is possibly the most famous research fraud investigator. She mostly looks at biomedical images and her great skill is that she is able to look through a paper and see photos of Western blots or microscopy slides and see if parts of an image are identical to other parts, or if the image overlaps with images from different papers. She has an incredible memory and ability to spot these images. She’s been doing this for over 10 years and has caused many retractions. I was aware of her work but there was no way for me to assist with that because it is not my area of research. I don’t have an appreciation of what these images should look like.

But about three years ago she shared on her Twitter account a preprint written by three computer scientists about a phenomenon they called ‘tortured phrases’. In doing their research and reading the literature, these computer scientists noticed that there were papers with very weird language in them. What they surmised was that to overcome plagiarism checks by software like Turnitin, people would run text through paraphrasing software. This software was very crude in that it would go word by word. For instance, it would look at a word and replace it with the first synonym it found in a thesaurus. It would do this word for word, which makes the text barely readable. However, the text is novel and so it will not be flagged by any plagiarism-checking software. Eventually, if you as a publisher have outsourced the plagiarism checks to some software, and neither your editor nor your peer reviewer reads the text to check if it makes sense, then this will get through the peer review process without any problem and the paper will get published.

For an example of tortured phrases: sometimes there’s not only one way to say something. Particularly if English is not someone’s first language, you don’t want to be too harsh on anyone who’s just chosen a word which just isn’t what a native speaker would pick. But there are some phrases where there’s only one right way to say it. For instance, artificial intelligence is the phrase for the phenomenon you want to talk about, and if instead you use “man-made consciousness”, that’s not the phrase you need to use, particularly if the original text said “artificial intelligence (AI)”, and your text says “man-made consciousness (AI)”. It’s going to be very clear what has happened.
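The word-by-word substitution described above can be sketched in a few lines. This is a toy reconstruction under stated assumptions, not the actual paraphrasing software: the thesaurus is a hypothetical two-entry dictionary built around the single example given in the conversation.

```python
import re

# Hypothetical toy thesaurus: each word maps to a list of synonyms,
# and the crude paraphraser always takes the first one.
TOY_THESAURUS = {
    "artificial": ["man-made", "synthetic"],
    "intelligence": ["consciousness", "insight"],
}


def crude_paraphrase(text: str) -> str:
    """Replace each known word with its first synonym, ignoring context."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        synonyms = TOY_THESAURUS.get(word.lower())
        # Words missing from the thesaurus (such as "AI") pass through as-is.
        return synonyms[0] if synonyms else word

    return re.sub(r"[A-Za-z]+", swap, text)


original = "artificial intelligence (AI)"
paraphrased = crude_paraphrase(original)
print(paraphrased)
```

Because the acronym is not in the thesaurus, it survives unchanged next to the mangled phrase, which is the giveaway described above.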

The three computer scientists highlighted this phenomenon of ‘tortured phrases’, but entirely from within the computer science field. I wondered if a similar phenomenon was happening in my own field in fluid dynamics. Samples of this paraphrasing software are freely available online as little widgets, so I took some standard phrases from fluid dynamics, the kind that would not make sense if you swapped the words around, and generated a few of these tortured phrases. I googled them and up popped hundreds of papers featuring these phrases. That was the beginning for me.

I started reporting papers with these phrases on a website called PubPeer, which is a website for post-publication peer review. I commented on these papers and started being in conversation with the computer scientists who wrote the paper on ‘tortured phrases’ because they built a tool to scrape the literature and automatically tabulate these papers featuring these phrases. They basically had a dictionary of phrases which they knew would be spat out by the software, because some of this paraphrasing software is so crude that if you put in “artificial intelligence”, you are always going to get out “man-made consciousness” or a handful of variants. It didn’t come up with a lot of different things. If you search for “man-made consciousness” and it brings up many papers, you know what has been going on. I contributed a lot of new ‘fingerprints’, which is what they call their dictionary that they would search the literature for. That is my origin story.
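The dictionary lookup described here can be sketched as a simple phrase search. This is a minimal illustration, not the computer scientists’ actual tool: the function name and the sample abstract are made up, and the only ‘fingerprint’ included is the one mentioned in the conversation.

```python
# Each fingerprint maps a tortured phrase to the expression it most
# likely replaced. Only the example from the conversation is included.
FINGERPRINTS = {
    "man-made consciousness": "artificial intelligence",
}


def find_tortured_phrases(text: str, fingerprints: dict[str, str] = FINGERPRINTS):
    """Return (phrase, expected phrase, count) for each fingerprint found."""
    # Lower-case and collapse whitespace so line breaks do not hide a match.
    normalised = " ".join(text.lower().split())
    hits = []
    for phrase, expected in fingerprints.items():
        count = normalised.count(phrase)
        if count:
            hits.append((phrase, expected, count))
    return hits


# Made-up abstract for illustration only.
abstract = (
    "We apply man-made consciousness (AI) to pipe flow. "
    "Man-made  consciousness is useful."
)

hits = find_tortured_phrases(abstract)
for phrase, expected, count in hits:
    print(f"'{phrase}' appears {count} time(s); expected '{expected}'")
```

A hit is only a prompt for a human to read the paper; an exact-match search like this says nothing about why the phrase is there.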

On Paper Mills and the Sale of Authorships 

There is also the issue of meta-science, which has nothing to do with the text of the paper or with the data itself, but more to do with how someone may add a load of references through the paper which are not relevant, or they are all references to one person or a colleague. In that way you would be gaming the system to boost profiles, careers, and things like H-index. Because having more publications and more citations is so desirable, there is a market for this. It is easy to find online advertisements for authorship of scientific papers ranging from $100 to over $1000, depending on the impact factor of the journal, and the position of authorship you want: first authorship, seventh authorship, or whether you want to be the corresponding author, these sorts of factors. Likewise, you can buy citations.  

There are also organizations known as paper mills. For example, as an author I might have written the paper and want, or need, to make some money and so I go to this broker and say: I want to sell authorships, I’ll be author number six, but I can sell the first five authorships. Can you put me in touch with someone buying authorships? At the same time, there are people who go to them saying I want to buy an authorship, and they put two and two together, acting as a middleman. Also, some of these paper mills do not want to wait for someone to come to them with a paper – they will write papers to order. They have an in-house team of scientific writers who produce papers. This does not necessarily mean that the paper is bad. Depending on where they want the paper to be published, the paper might have to be good if it is to get published. So, they will employ people with degrees, qualified people or PhD students who need to earn some money, and then they will sell the authorships and get the papers published. This is a big business.

There is a whole industry behind it, and something I have moved onto investigating quite a lot is where these papers are going. When I identify these papers, I try to find out where they are being published, how they’re being published, who is behind them, who is running these paper mills, who is collaborating with them. Something I found out which resulted in an article in Science was that paper mills want to guarantee acceptance as much as they can. If a paper is not accepted, it creates a lot of work for them and it means a longer time before their customers get what they paid for. For example, if a paper that they wrote and sold authorships for gets rejected, they’re going to have to resubmit it to another journal. So something paper mills will do is they will submit a paper to 10 journals at once and publish with whichever journal gave them the easiest time. But still, they want to try and guarantee acceptance and one way to do that is to simply bribe the editor. I have found evidence of a paper mill bribing some editors and there have been many, at least tens, if not hundreds, of editors that have been let go or told to stop being editors by journals in the last year because they have been found to be compromised. This could be because of bribery or some other way of being compromised. This is what I try to uncover.

Although I’m not fighting this alone, it can feel like that. Publishers are doing things to some extent and they’re doing things that they can’t tell you about as well. And then there are other people like me investigating this in their free time or as a side project. Not enough of us are doing it because it is a multi-million-dollar industry that is generating these papers. More papers are being published than ever before so it is a big fight.


Stay tuned as we release the rest of the conversation with Nick over the next month. In the next post, we get Nick’s take on the peer review process and fake research data, and I ask his opinion on where the fault lies in the publication of fraudulent research.