Tag Archives: publishing

Data Diversity Podcast #3 – Dr Nick H. Wise (4/4)

Thank you for staying with us throughout this four-part series with Dr Nick Wise, a scientist and engineer who has made his name as a scientific sleuth. By now, it is hoped that he needs no introduction (though if you would like one, please look back at the previous posts).

In this final post, we get Nick’s take on what he thinks the repercussions should be for engaging in fraud, and we get a parting tip from Nick on what researchers should do when performing a literature search on papers in their field. Below are some excerpts from the conversation, which can be listened to in full here.


Most people don’t go into science wanting to fake stuff. With such cases, it can often be a sign that there’s a real problem in the lab or in the group. Why else would someone feel so compelled to do this? If the pressure is coming from the university demanding papers from them, then it’s the problem with the university. 


Repercussions for research fraud 

LO: You have mentioned that some editors have been let go from their positions as editors – are there any other repercussions for getting involved with fraud? 

NW: Often, institutions are the worst in terms of responding. Recently, I was at the World Conference on Research Integrity in Athens and spoke to other investigators like me, including publishers and people in the research integrity space. Some publishers have informed me that even when they want to make a retraction and have gone to the author’s or editor’s institution to inform them that a staff member has been involved with fraud, often the institution doesn’t reply at all, or even if they do, they will not do anything. They are very defensive, and they do not want any bad publicity for the institution and so they will not respond at all. Even in a well-regarded Western university where someone has been caught fabricating their data, the response could just be that they have been relieved of teaching duties for six months, but they’ve kept their job and there will be no publicity that we know of.

In Spain, a professor who has just been made Rector, the Head of the University of Salamanca, the oldest university in Spain, has been linked to questionable publication practices for the last decade or so. He was found to have his name on an incredible number of papers which have been cited an incredible number of times, including by people who don’t exist. There has been a fight in the Spanish press to try to highlight this. But despite all this press, including national press in Spain, this person has become the Rector of the University of Salamanca. And it’s basically the same the world over: institutions very much go into protection mode even if publishers have agreed on retracting the papers. Often there are no career repercussions at all. Sometimes, they will just go and be editor of a different journal or for a different publisher.

LO: In your opinion, what should happen to an academic or researcher who has engaged in fraud? 

NW: I think it really depends on the nature of the fraud and the position that the researcher holds. If a PhD student has done something and if they have been caught after, say, the first offence, then I think there should be leniency. Regardless of whether they have bought an authorship or tried to fake some data, they still have a way out and it should be offered to them. Again, a lot of the drive for PhD students faking some data is because their P.I. (Principal Investigator) is demanding results, demanding that things happen faster, or demanding ground-breaking results. At some point, people become desperate. Most people don’t go into science wanting to fake stuff. With such cases, it can often be a sign that there’s a real problem in the lab or in the group. Why else would someone feel so compelled to do this? If the pressure is coming from the university demanding papers from them, then it’s the problem with the university. A lot of this drive is external to researchers. But if you have someone who is a tenured professor who has been doing this for a long time and they have been caught out on a decade or more of fabricated results, that feels like it should be the end of the road. It really depends on the nature of what has been done, the stage of career of the person, and how much fraud has been committed.

LO: Do you ever worry about being called out or being sued for defamation?

NW: I have thought about it, and I try to err on the side of caution and make sure that there is fairly hard evidence for anything I say publicly. You can have suspicions without saying anything publicly – you would just go to the publisher. But when I find an advert for a named paper and then six months later a paper with that same title is published, then it is clear cut that someone should investigate. But fortunately, so far, I have not been threatened with anything. 

I think it is also partly due to the fact that accusing people of making up their data is more personal. When authorship is bought, by the time I find it, some of these people would have already got what they needed. If they needed to have a publication in order to graduate, once they have graduated, they do not care if the publication is retracted. Often when you read a retraction notice after the authorship has been sold, they will normally say that none of the authors responded. This may also be down to the fact that they know that they have been caught but there is nothing to defend. But when you are accusing someone of making up data, I think that is a far more personal attack. When someone has bought authorship, they do not have a personal connection to the paper, so they move on. They are probably annoyed, but they cannot do anything about it.

Parting advice

LO: To end, are there any takeaways that you would like to share? 

NW: I would encourage all researchers to download the PubPeer plugin, which means that whenever they are looking at a paper, it will flag whether there are any comments about that paper on PubPeer, or indeed any comments about the papers in its reference list. If someone else has found a problem with that paper, they can just quickly go and check and be more informed.


We are grateful to Dr Nick Wise for sharing his perspective on parts of the publishing industry and research culture that many of us are not privy to. Nick has highlighted many issues which raise pressing concerns for research integrity. We thank him for his time speaking with us and we hope that readers will take his advice on using PubPeer when they embark on literature searching (and of course, refrain from committing fraud, lest you have Nick on your case).

Data Diversity Podcast #3 – Dr Nick H. Wise (3/4)

Welcome back to the penultimate post featuring Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. If you have been with us for the previous two posts, you would know that besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, based on an article on the blog Retraction Watch which was written in 2022, is responsible for more than 850 retractions, leading Times Higher Education to dub him as a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans. In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today.

In part three, we learn from Nick about how researchers try to generate more citations from a single piece of research through a trick called ‘salami slicing’, and about the blurred lines between illegality and desperately trying to meet the unrealistic expectations of academia (to the point of engaging in fraud). Below are some excerpts from the conversation, which can be listened to in full here.


Citation count was once a proxy for quality and now it is citation count regardless of quality. People are only looking at the citation count, and not the actual quality. Actually assessing quality takes a lot more effort. 


‘Salami slicing’ and the Game of Citations

LO: What do you think is better for science? A slower, more thoughtful process of publishing and everything in between? Or more information, more research, but then things like fraud slip through and occur more frequently?

NW: I don’t think there’s necessarily more research. Another phenomenon that paper mills take advantage of is salami slicing. Imagine you have completed a research project. Now you could write this up as one thirty-page paper or two twenty-page papers. You could write two comprehensive papers or try to put out multiple ten-page papers where you have some minor parameters changed. I see this happening in nanofluids research because it is an area of research close to mine. The nanofluid is simply a base liquid – it might be water, it might be ethanol – and into that you mix these very small nanoscale particles of some other material, such as gold, silver, or iron oxide. And in this sort of mixture of liquid and particles, you want to investigate its fluid flow and describe this with some differential equations. You can use computers to solve the differential equations and then plot some results about velocity profiles and heat transfer coefficients, etcetera. Now, you could write a paper for a given situation where you say, I’m not going to specify the liquid, but here is a general density and viscosity of this liquid. If you want to apply this to your own research, you plug in the density and viscosity of your liquid, and likewise the particles. I’m not going to specify which particles are used, because all that changes is their density and their heat transfer properties. So that’s one way you could do it.

Another way to do it is to say, I’m going to write a paper about water and gold particles; that’s one paper. Then you can write another paper which has water and silver particles, and then you can write one with ethanol and iron oxide, and there are so many varieties. You can also vary the geometry that this flow is going around, and you can add in an electric field and a magnetic field, etcetera. You can build up in this n-factorial way. There are thirty possible liquids multiplied by a hundred possible particles and multiplied by however many geometric configurations. You can see that this is what they are doing. Rather than writing a few quite general comprehensive papers, they are writing hundreds of very specific papers, which enables them to produce more papers and sell more authorships and put more citations in. But however overwhelming the number of papers produced, there are still only so many peer reviewers and so many editors. And this phenomenon happens in lots of fields: they find something where there are just these variables that let them keep writing almost the same paper. Yet, the paper is original. It has not been done before. It is incredibly derivative, but that is not necessarily a barrier to publication.
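To put rough numbers on the multiplication Nick describes, here is a minimal Python sketch. The thirty liquids and hundred particle types are his figures; the number of geometries is not given in the conversation, so the value of 5 below is purely an assumed placeholder, and the function names are made up for illustration.

```python
from itertools import product

# Figures from the interview: 30 liquids, 100 particle types.
N_LIQUIDS = 30
N_PARTICLES = 100
# Assumption: the interview does not say how many geometries there are.
N_GEOMETRIES = 5


def count_possible_papers(n_liquids, n_particles, n_geometries):
    """Number of distinct (liquid, particle, geometry) combinations."""
    return n_liquids * n_particles * n_geometries


def enumerate_combinations(liquids, particles, geometries):
    """List every combination explicitly, one per potential narrow paper."""
    return list(product(liquids, particles, geometries))


if __name__ == "__main__":
    print(count_possible_papers(N_LIQUIDS, N_PARTICLES, N_GEOMETRIES))
    for combo in enumerate_combinations(
        ["water", "ethanol"], ["gold", "silver", "iron oxide"], ["flat plate"]
    ):
        print(combo)
```

Even with a handful of geometries the count runs into the thousands, and each extra variable (an electric field, a magnetic field) multiplies it again.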

LO: What I’m getting from this is, this is part of the whole system, and the issue at hand is definitely enabled by certain motivations like getting more citations. You can take one big piece of salami and publish that in one book, or you can slice the salami thirty ways. And if they are in the position to slice the salami, they say why not, I suppose, right? A game is there to be played.

NW: Right, they are playing the game that is in front of them. And again, there are people who do this who are not from a paper mill. They just want to maximize the number of citations and publications. The question is why are they doing this? Why do they want to maximize their publications? Because they want a promotion, or they want a tenured job. There are also countries where you get a cash reward for publishing a paper in a good journal so the more papers you publish, the more money you get paid. Your government might have told all the universities that they need to increase their ranking in the World University rankings. How do you do that? By increasing your research output and the citations you get. That is another driver. These drivers come from all sorts of places but there is always an emphasis on numbers. Citation count was once a proxy for quality and now it is citation count regardless of quality. People are only looking at the citation count and not the actual quality. Assessing quality takes a lot more effort.

LO: Citations used to be a proxy for quality, but that is not the case anymore. But it still implies the quality of the research, or you would hope.

NW: You would hope, but only because there is an assumption that the only reason something has a lot of citations is because it is good quality. Citations are also easier to count. Quality is much harder to account for, but that incentivizes people to do things like cite their colleagues. Again, you could still track it if people from the same university were citing each other. But then you get bigger scale things with middlemen who organize people from across the world to cite each other or just do it for cash. If you are publishing and you are producing papers to order, each one of those papers has a reference section which is real estate. You can throw in and have some genuine references which are relevant to this paper, but you can also throw in some irrelevant references that someone paid you to include. You can also pay someone to include references that are actually relevant to a topic.

LO: If it is relevant to a topic, it is almost like merely encouraging someone to be aware of certain work as opposed to a scam, which sounds like a gray area.

NW: Well, I would say that as soon as someone is paying money, then it starts to be illegitimate. But I mean if someone emails you and says “I’ve just published this paper, I think you might be interested, it’s in your research field: maybe read it or maybe you do cite it”, it’s different from someone emailing you to say “I’ll pay you £50 if you cite my paper” and you do. Then I would say that you have crossed a line. So, it does get very gray. Then there are these organized paper mills who are doing this as a business and that is where I think it becomes quite clear that it is probably not legitimate.

Facebook (authorship) marketplace

NW: You could go on Facebook and there are people selling authorship of their paper as a one off. There are PhD students in some country with no research funding who say “it costs $2500 for the article processing charge for me to publish where I would like to publish, I do not have $2500 so if you pay the $2500, you can be first author on the paper” and that is the only way they can get their paper published. They’re not doing this as a business, they’re just doing this once for this one paper. And you get people responding. Quite often professors or more established academics with access to budgets are the ones who will say yes. And the only thing that the person has done is to provide the funding for the publication.

The minimum thing that one is supposed to have done to be considered an author is to have either written the draft or reviewed and edited the paper. You might have also done data analysis or conceptualization. I think we would agree that if all this person does is just pay the fee for publication, then that is not acceptable. But what if they read the paper and then made a couple of comments? Now they have reviewed and edited it, and so now they have done review, editing and funding. There are many big labs around the world that have some very senior scientist whose name is on every single paper that comes out of the lab. And what have they done? Well, they provided all the funding, and they have reviewed the paper. I bet there are some who have barely glanced at the paper. But let’s say that they have reviewed the paper, and they provided the funding for the publication. Is that what makes it different to the person on Facebook who has found some random professor from another country to pay for their publication? Where is the difference? I don’t think it is an easy line to draw. In this way, the move to Open Access publishing requiring large fees for publication has also driven quite a bit of this phenomenon.

LO: It also seems like you have developed a bit of empathy. Maybe you’ve looked at so many cases and you see that it’s not always clear.

NW: Absolutely. Again, if you have the people running a paper mill, or if you have some professor who is being bribed and waving through dozens of papers, I don’t have much empathy for them. But the Masters or PhD student who has been told that they have to publish papers to get their PhD or even a Masters and they have this demand placed on them, or they have even produced a paper but they need all this money to get it published, I don’t blame them for what they’re doing. It’s the situation they’ve been placed in. It is the system that they are part of. I have a lot of empathy for them.


Look out for the final post coming next week, where we get Nick’s take on what he thinks should be the repercussions for engaging in fraud, and we get a parting tip from Nick on what researchers should do when performing a literature search on papers in their field.

Data Diversity Podcast #3 – Dr Nick H. Wise (2/4)

We are back again with our second blog post featuring Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. As is the theme of the Data Diversity podcast, we spoke to Nick about his experience as a researcher, but this is a special edition of the podcast. Besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, based on an article on the blog Retraction Watch which was written in 2022, is responsible for more than 850 retractions, leading Times Higher Education to dub him as a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans.

In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today. In this second part, we get Nick’s take on the peer review process and fake research data, and I ask his opinion on where the fault lies in the publication of fraudulent research. Below are some excerpts from the conversation, which can be listened to in full here.


There are indices like Scopus or Web of Science or SCI, all these different bodies who claim journals are trustworthy, but every journal is going to get attacked by fraud and some will slip through. It is what you do afterwards that matters. 


On the peer review process

LO: As an Early Career Researcher, scientist, engineer, and researcher yourself, is your trust in the whole system still intact? Do you still see value in the peer review process? 

NW: It has absolutely changed how I read a paper and how I view particular journals. When you see a problem happening in a journal that you have read in your research or a journal you have considered submitting to, it really gives you pause for thought. There is an entire ecosystem of journals, right from the very good down to the very bad, that are implicated. There are indices like Scopus or Web of Science or SCI, all these different bodies who claim journals are trustworthy, but every journal is going to get attacked by fraud and some will slip through. It is what you do afterwards that matters. Another phenomenon that particularly happens with publishers with a wide list of journals is that the paper mill will legitimately buy the journal. They may even take it over in a hostile way: they will make a clone of the journal and the website, and they will even redirect the publisher’s link to a different website. They now control a journal that is officially on this trustworthy list. Now they have a short period of time before someone notices and in that time, they will try to publish as many papers as possible and charge everyone for publication. They will absolutely cram this journal with any content. It does not even have to be relevant to the topic because they’re fully in control of the whole process up until the publisher notices and removes the journal from the list. An author who needs a paper published in a well-regarded journal has achieved what they needed, but as soon as the journal is removed from the list, the paper becomes worthless. But there is a large supply of these journals, and they will keep trying to take them over. This tends to happen with low tier journals, but there are also paper mills which are targeting journals with an impact factor of over five, over ten – the supposedly absolute top tier journals.

Between incompetence and conspiracy

LO: These days, fraud is so convincing, scams are so rampant, and they always target your insecurities, the insecurity here being authors who want citations. 

NW: I would say that it is not a scam or fraud for the researcher, in the normal sense. These people are selling citations, and the buyer gets citations as opposed to someone getting cheated for their money and getting nothing in return. They are scamming the publishers and scamming the scientific community, but they are not scamming an actual person paying the money. It is a business that is operating as it says it is.  

LO: What does it say, though, that fraudulent papers are still getting through the peer review process? It’s still quite a long way from first draft to publication, and we have seen some cases where remnants of text from ChatGPT replies like “as a large language model…” get through the review process. In your mind, what does it say about the industry? What’s happening here?

NW: I think that it is somewhere between incompetence, people in a rush, and peer reviewers being bypassed or being paid. They could also be colluding with authors or the paper mill. To be fair, there are dodgy things that get through a legitimate peer review in the first place. All the peer reviewers are independent but how many people read every single word of a paper they peer review? Not everyone. People have different standards that they hold themselves to. There is no agreed standard of what you are supposed to do to peer review a paper. As I’m sure anyone who has received peer review reports would know, sometimes you receive a five-page PDF document with hundreds of bullet points, and sometimes you receive a paragraph which maybe took them half an hour to put together. Legitimate peer reviewers could just not do a good job. Then there are also people who pride themselves on doing a load of peer reviews, and in fact you can get certificates from the publisher about how many peer reviews you do. There are people who say they peer review nearly a paper a day – I doubt that they are doing a great job at it.

Even if someone is reading the text, how much is a peer reviewer supposed to be checking the data? Should someone be trying to run statistical analysis to see if the data have been fudged? Should they be spotting that the image is manipulated? Is that something we should expect the peer reviewer to be doing? Or should a peer reviewer go into a review assuming the work is honest? It becomes a different process if you are also thinking about whether a piece of work is fraudulent or not. The easiest things to find are the people who are very lazy or very incompetent and there is just something that is so blatant that it is hard to miss. But if most people are trying to cover their tracks, then it comes down to just how well they have managed to do that. Again, if you are including remnants of ChatGPT like “as a large language model” in your text, you are either extremely lazy, or maybe you don’t read English. But if someone got rid of that bit, you would not notice from reading the abstract. You might think this is a bit bland, but people can write bland text; that is allowed.

Sometimes peer reviewers are definitely compromised, and I don’t know what the balance is. When you see a bad paper, say a paper with an obvious problem or with ChatGPT remnants lying around: is that bad peer reviewing or have they been paid not to notice, or even not to do it? I don’t know what the balance is there. I suspect it is more on the bad peer reviewing side than the criminal or the fraudulent to be honest, but I don’t know. There are times when you think OK, well, maybe they were paying the peer reviewers but did the editor look through this? Did the copy editor? We might want to think that copy editors and typesetters are going through and questioning things like this. It really depends on the journal. I have had things come back where they have gone through and changed from a comma to a dash, so they are clearly going through everything character by character. And there are other journals where the typesetter is clearly just taking everything with no thought. Their job is just to transfer what they have been given into the journal paper and they don’t do any spell checking or checking for grammar or anything. But should that be their job? I don’t know. Then there are journals where the only priority appears to be publishing as many papers as quickly as possible. And if you have made that your priority, even if everyone is acting in good faith, you are going to let a lot more things through. If you are just trying to push everything out the door and do things as quickly as possible, you are not going to give the things as much scrutiny.

Fake research data

Even from doing my own research, I’ve realized that it would be very easy to fake some data. It would be very hard for anyone who wasn’t in the lab to know if data has been faked. There is no real way for someone to check. Even if you go open data, one experiment might need a few gigabytes of video footage to produce one data point. You can say what you have done to produce that data point, but for someone to go and check its validity, they would in theory need access to gigabytes and gigabytes of data that is not shared. But yes, there have been some things where it has been very easy to check. For instance, in materials science, there are lots of experiments which result in a spectrum diagram, basically producing a squiggly line on a graph. One thing that would always be true, and you don’t need any subject expertise to know this, is that the line should not double back on itself. Every X value should have one Y value. Well, if you are faking this by drawing it by hand with a mouse, it is quite hard to not double back and there are plenty of published spectra which have bits where a peak bends over. And it is clearly because someone has drawn it by hand, and some of them are very bad. And that is again where you question what is happening with peer review because it is obvious that something is wrong. Sometimes they will even go outside the lines of the bounding box. I do see some of those because they are quite easy to spot.
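The "one Y value for every X value" rule is simple enough to express in a few lines. The sketch below is only an illustration of that rule, not a tool Nick describes using: it assumes someone has already digitised the curve from the figure into a list of x-coordinates in the order the line was drawn, and the function name is invented here.

```python
def doubles_back(x_values):
    """Return True if a traced curve ever reverses direction along the x-axis.

    x_values holds the x-coordinates of the curve in drawing order
    (an assumed input format). A curve with exactly one y per x must have
    strictly increasing, or strictly decreasing, x along the trace.
    """
    steps = [b - a for a, b in zip(x_values, x_values[1:])]
    always_forward = all(step > 0 for step in steps)
    always_backward = all(step < 0 for step in steps)
    return not (always_forward or always_backward)


if __name__ == "__main__":
    print(doubles_back([1.0, 2.0, 3.0, 4.0]))       # clean left-to-right trace
    print(doubles_back([1.0, 2.0, 3.0, 2.5, 4.0]))  # a peak that bends over
```

A repeated x-coordinate (a perfectly vertical stroke) is also flagged, since that too would give one X value more than one Y value.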


Stay tuned as we release the third conversation with Nick next week. In the penultimate post, we learn from Nick about how researchers try to generate more citations from a single piece of research through a trick called ‘salami slicing’, and about the blurred lines between illegality and desperately trying to meet the unrealistic expectations of academia to the point of engaging in fraud.

Data Diversity Podcast #3 – Dr Nick H. Wise (1/4)

In our third instalment of the Data Diversity Podcast, we are joined by Dr Nick H. Wise, Research Associate in Architectural Fluid Mechanics at the Department of Engineering, University of Cambridge. As is the theme of the podcast, we spoke to Nick about his experience as a researcher, but this is a special edition of the podcast. Besides being a scientist and an engineer, Nick has made his name as a scientific sleuth who, based on an article on the blog Retraction Watch which was written in 2022, is responsible for more than 850 retractions, leading Times Higher Education to dub him as a research fraudbuster. Since then, through his X account @Nickwizzo, he has continued his investigations, tracking cases of fraud and in some cases, naming and shaming the charlatans. Nick was kind enough to share with us many great insights over a 90-minute conversation, and as such we have decided to release a four-part series dedicated to the topic of research integrity.

In this four-part series, we will learn from Nick about some of the shady activities that taint the scientific publishing industry today. In part one, we learn how Nick was introduced into the world of publication fraud and how that led him to investigate the industry behind it. Below are some excerpts from the conversation, which can be listened to in full here.


I have found evidence of a papermill bribing some editors and there have been many, at least tens, if not hundreds, of editors that have been let go or told to stop being editors by journals in the last year because they have been found to be compromised. This could be because of bribery or some other way of being compromised. This is what I try to uncover. – Dr Nick H. Wise


Tortured Phrases and PubPeer: Nick’s beginnings as a Scientific Sleuth  

My background is in fluid dynamics where I mostly think about fluid dynamics within buildings. For instance, I think about the air flows generated by different heating systems and things like pollutant transport such as smells or COVID which can travel with the air and interact with each other. That was my PhD and the post-doc in the Engineering department.

About three years ago whilst trying to avoid writing my thesis, I saw a tweet from the great Elizabeth Bik, who is possibly the most famous research fraud investigator. She mostly looks at biomedical images and her great skill is that she can look through a paper, see photos of Western blots or microscopy slides, and tell if parts of an image are identical to other parts, or if the image overlaps with images from different papers. She has an incredible memory and ability to spot these images. She’s been doing this for over 10 years and has caused many retractions. I was aware of her work but there was no way for me to assist with that because it is not my area of research. I don’t have an appreciation of what these images should look like.

But about three years ago she shared a preprint written by three computer scientists on her Twitter account about a phenomenon they called ‘tortured phrases’. In doing their research and reading the literature, these computer scientists noticed that there were papers with very weird language in them. What they surmised was that to overcome plagiarism checks by software like Turnitin, people would run text through paraphrasing software. This software was very crude in that it would go word by word. For instance, it would look at a word and replace it with the first synonym it found in a thesaurus. It would do this word for word, which makes the text barely readable. However, the text is novel and so it will not be flagged by any plagiarism checking software. Eventually, if you as a publisher have outsourced the plagiarism checks to some software, and neither your editor nor your peer reviewer reads the text to check if it makes sense, then this will get through the peer review process without any problem and the paper would get published.

For an example of tortured phrases: sometimes there’s not only one way to say something. Particularly if English is not someone’s first language, you don’t want to be too harsh on anyone who’s just chosen a word which just isn’t what a native speaker would pick. But there are some phrases where there’s only one right way to say it. For instance, artificial intelligence is the phrase for the phenomenon you want to talk about, and if instead you use “man-made consciousness”, that’s not the phrase you need to use, particularly if the original text said artificial intelligence brackets AI, and your text says “man-made consciousness” brackets AI. It’s going to be very clear what has happened.  
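To make the mechanism concrete, here is a toy Python sketch of the kind of word-by-word substitution described above. It is not the actual paraphrasing software, whose workings are not detailed in the conversation: the thesaurus is a made-up two-entry dictionary, and only the “artificial intelligence” to “man-made consciousness” example comes from Nick.

```python
# Toy thesaurus: every entry is an assumption made for illustration.
# The first synonym of each word is chosen so that the output matches
# the example given in the interview.
TOY_THESAURUS = {
    "artificial": ["man-made", "synthetic"],
    "intelligence": ["consciousness", "insight"],
}


def crude_paraphrase(text, thesaurus=TOY_THESAURUS):
    """Replace each word with its first listed synonym, ignoring context."""
    replaced = []
    for word in text.split():
        synonyms = thesaurus.get(word.lower())
        # Words without an entry (such as "(AI)") pass through unchanged.
        replaced.append(synonyms[0] if synonyms else word)
    return " ".join(replaced)


if __name__ == "__main__":
    print(crude_paraphrase("artificial intelligence (AI)"))
```

Because the abbreviation is left untouched while the words around it change, the mismatch between phrase and abbreviation gives the game away.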

The three computer scientists highlighted this phenomenon of ‘tortured phrases’, but entirely from within the computer science field. I wondered if a similar phenomenon was happening in my own field, fluid dynamics. Samples of this paraphrasing software are freely available online as little widgets, so I took some standard phrases from fluid dynamics, of the kind that would not make sense if you swapped the words around, and generated a few of these tortured phrases. I googled them and up popped hundreds of papers featuring these phrases. That was the beginning for me.

I started reporting papers with these phrases on a website called PubPeer, which is a website for post-publication peer review. I commented on these papers and started being in conversation with the computer scientists who wrote the paper on ‘tortured phrases’ because they built a tool to scrape the literature and automatically tabulate the papers featuring these phrases. They basically had a dictionary of phrases which they knew would be spat out by the software, because some of this paraphrasing software is so crude that if you put in “artificial intelligence”, you are always going to get out “man-made consciousness” or a handful of variants. It didn’t come up with a lot of different things. If you search for “man-made consciousness” and it brings up many papers, you know what has been going on. I contributed a lot of new ‘fingerprints’, which is what they call the dictionary entries that they search the literature for. That is my origin story.
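The fingerprint search described above can be sketched in a few lines of Python. This is only an assumption about the general shape of such a check, not the computer scientists’ actual tool: `FINGERPRINTS` and `find_fingerprints` are hypothetical names, and the dictionary holds only the one phrase mentioned in the conversation.

```python
import re

# Hypothetical fingerprint dictionary: tortured phrase -> the expected term.
FINGERPRINTS = {
    "man-made consciousness": "artificial intelligence",
}


def find_fingerprints(text):
    """Return (tortured phrase, expected term) pairs that occur in the text."""
    # Lower-case and collapse whitespace so line breaks do not hide a phrase.
    normalised = re.sub(r"\s+", " ", text.lower())
    return [
        (phrase, term)
        for phrase, term in FINGERPRINTS.items()
        if phrase in normalised
    ]


abstract = "We apply man-made\nconsciousness (AI) to pipe flow."
print(find_fingerprints(abstract))
# prints: [('man-made consciousness', 'artificial intelligence')]
```

A hit is only a flag for a human to look at the paper; as noted earlier, an unusual word choice on its own is not evidence of wrongdoing.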

On Paper Mills and the Sale of Authorships 

There is also the issue of meta-science, which has nothing to do with the text of the paper or with the data itself, but more to do with how someone may add a load of references through the paper which are not relevant, or they are all references to one person or a colleague. In that way you would be gaming the system to boost profiles, careers, and things like H-index. Because having more publications and more citations is so desirable, there is a market for this. It is easy to find online advertisements for authorship of scientific papers ranging from $100 to over $1000, depending on the impact factor of the journal, and the position of authorship you want: first authorship, seventh authorship, or whether you want to be the corresponding author, these sorts of factors. Likewise, you can buy citations.  

There are also organizations known as paper mills. For example, as an author I might have written the paper and want, or need, to make some money and so I go to this broker and say: I want to sell authorships, I’ll be author number six, but I can sell the first five authorships. Can you put me in touch with someone buying authorships? At the same time, there are people who go to them saying I want to buy an authorship, and they put two and two together, acting as a middleman. Also, some of these paper mills do not want to wait for someone to come to them with a paper – they will write papers to order. They have an in-house team of scientific writers who produce papers. This does not necessarily mean that the paper is bad. Depending on where they want the paper to be published, the paper might have to be good if it is to get published. So, they will employ people with degrees, qualified people or PhD students who need to earn some money, and then they will sell the authorships and get the papers published. This is a big business.

There is a whole industry behind it, and something I have moved onto investigating quite a lot is where these papers are going. When I identify these papers, I try to find out where they are being published, how they’re being published, who is behind them, who is running these paper mills, who is collaborating with them. Something I found out which resulted in an article in Science was that paper mills want to guarantee acceptance as much as they can. If a paper is not accepted, it creates a lot of work for them and it means a longer time before their customers get what they paid for. For example, if a paper that they wrote and sold authorships for gets rejected, they’re going to have to resubmit it to another journal. So something paper mills will do is they will submit a paper to 10 journals at once and publish with whichever journal gave them the easiest time. But still, they want to try and guarantee acceptance and one way to do that is to simply bribe the editor. I have found evidence of a paper mill bribing some editors and there have been many, at least tens, if not hundreds, of editors that have been let go or told to stop being editors by journals in the last year because they have been found to be compromised. This could be because of bribery or some other way of being compromised. This is what I try to uncover.

Although I’m not fighting this alone, it can feel like that. Publishers are doing things to some extent and they’re doing things that they can’t tell you about as well. And then there’s other people like me investigating this in their free time or as a side project. Not enough of us are doing it because it is a multi-million-dollar industry that is generating these papers. More papers are being published than ever before so it is a big fight.


Stay tuned as we release the rest of the conversation with Nick over the next month. In the next post, we get Nick’s take on the peer review process and fake research data, and I ask his opinion on where the fault lies in the publication of fraudulent research. 

Rights retention built into Cambridge Self-Archiving Policy

We’re delighted to announce that the University of Cambridge has a new Self-Archiving Policy, which took effect from 1 April 2023.  The policy gives researchers a route to make the accepted version of their papers open access without embargo under a licence of their choosing (subject to funder requirements). We believe that researchers should have more control over what happens to their own work and are determined to do what we can to help them to do that.

This policy has been developed after a year-long rights retention pilot in which more than 400 researchers voluntarily participated. The pilot helped us understand the implications of this approach across a wide range of disciplines so we could make an informed decision. We are also not alone in introducing a policy like this – Harvard has been doing it since 2008, cOAlition S have been a catalyst for development of similar policies, and we owe a debt of gratitude to the University of Edinburgh for sharing their approach with us. 

Some of the issues that cropped up during the pilot were outlined by Samuel Moore, our Scholarly Communications Specialist, in an earlier post on the Unlocking Research blog.  The patterns we saw at that stage continued throughout the year-long pilot – there was no issue for most articles, but some publishers caused confusion through misinformation or by presenting conflicting licences for the researchers to sign. We do recognise that there are costs involved in high quality publishing, and we are willing to cover reasonable costs (while noting our concerns around inequities in scholarly publishing).   The fact is that some publishers are trying to charge the sector multiple times for the same content – subscription fees, OA fees, other admin fees – all while receiving free content courtesy of researchers that are usually funded by the taxpayer and charity funders. 

Many researchers and funders are understandably becoming firmer in their convictions that publicly funded research should be openly and publicly available. We are fortunate that at Cambridge we are in a position to support this through our support for diamond publishing initiatives (in which the costs of publishing are absorbed for example by universities and no fees are charged to the reader or the author), through read and publish agreements negotiated on behalf of the UK higher education sector and through payment of costs associated with publishing in fully open access venues. Rights retention gives researchers a back-up plan for when other routes are not available to them, e.g. when a journal moves unexpectedly out of a read and publish agreement or a publisher does not offer any publishing route that meets their funder requirements. 

This is not the end goal. We have work to do to reach an equitable approach to global scholarly publishing, and we can learn a lot, especially from how South America approaches these issues. We welcome opportunities to work together with others around the world to create a more sustainable and equitable future for scholarly communications.

Read more about the new Cambridge Self-Archiving Policy on the Cambridge Open Access website.

Open Research in the Humanities: The Future of Scholarly Communication

Authors: Emma Gilby, Matthias Ammon, Rachel Leow and Sam Moore

This is the second of a series of blog posts, presenting the reflections of the Working Group on Open Research in the Humanities.  Read the opening post here. The working group aimed to reframe open research in a way that was more meaningful to humanities disciplines, and their work will inform the University of Cambridge approach to open research.  This post considers the future of scholarly communication from a humanities perspective. 

PILLAR ONE: THE FUTURE OF SCHOLARLY COMMUNICATION 

This first pillar deals with ‘open access’ narrowly understood: the future of the publication landscape, and the question of the sustainability and viability of different publication models in an open access world.  

Opportunities 

The open access initiative in general values a wide range of contributions to academic life. The arts and humanities thrive on long-term, multi-scale, conversational, collaborative, interdisciplinary projects; all cultural work can be so defined. Any move towards research diversity therefore works in the favour of the arts and humanities.  

Open Research aims first at opening out ‘traditional’ research content, such as that published in journals and monographs. Thus it aims also to demystify the existing publication process. In general, it prioritizes the wide dissemination of public-facing research. Further, it allows us to envisage new forms of publication, such as the use of dynamic images and data visualisation as already undertaken in investigative journalism.1 Other examples of new Open Access formats include semi-public peer-to-peer review and the opportunity for readers to highlight passages and contribute to a crowd-sourced index of terms.2

Support required 

In the immediate and short term, A&H colleagues require institutional support to understand and get to grips with the current routes to open access within academic publishing, which present various advantages and challenges. For more detail see Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper https://royalhistsoc.org/policy/publication-open-access/plan-s-and-history-journals/ 

Current routes to OA in scholarly publishing include:  

  1. Paying directly for article or book processing charges levied by publishers. This is easy if one’s research falls among the very small percentage of A&H research that is funded by the research councils, who allow for such fees, but otherwise challenging.
  2. Taking advantage of a ‘read and publish’ deal set up between a publisher and an institution. This is easy if one is at the right institution at the right time, but otherwise challenging. There is also confusion amongst colleagues about what happens when these time-limited, transitional deals expire: will publishers revert to simple processing charges (see above)? Or will all published material by then be fully OA (see below)?
  3. The self-deposit in an OA institutional repository of a manuscript that is accepted for publication and peer reviewed but that has not been edited or typeset by the publisher in any way. This is easy with the right systems in place, but problematic because it neglects the import of the editing process in A&H research. Without undergoing this process, ‘accepted manuscripts’ are very vulnerable to errors, especially in the case of the very many scholars who regularly work in languages that are not their first, or in the case of early career scholars who are less familiar with critical processes and how to evidence them, or in the case of colleagues with various kinds of disabilities such as dyslexia. Other issues also abound with the deposit of manuscripts in repositories. In cases where scholars receive an acceptance that is subject to improvement, the final ‘date of acceptance’ is ambiguous for legal purposes. And in cases where the work in question uses copyrighted material, further legal issues emerge about when and how it may be possible to circulate this. In all these senses, then, many A&H colleagues simply dislike the thought of their ‘accepted manuscript’ circulating. In the case of institutional repositories, there seems to be a direct and obvious tension between the goals of open research and quality control.
  4. Publishing with a fully OA journal or academic publisher that does not require a processing charge. This is obviously the most straightforward and therefore best route to OA, but raises the fundamental question of how such work is conducted and funded. The notion of the ‘scholar-led’ press, established and monitored by scholars themselves, presupposes that academics can somehow fit the work of the professional editor, copy editor, translator or typesetter etc. into their spare time. In addition, many OA journals rely on charitable donations. Fundraising is also a skilled business: will universities’ development directors and offices be diverted to do the work of seeking these charitable donations? Is it possible for existing publishing houses and presses to construct a sustainable business model that allows for free and open publishing, while overlaying their own professional services onto the scholarly work provided by academics? Can already successful enterprises such as Open Book Publishers in Cambridge3 be ‘scaled up’? The members of the working group have not seen any impact assessments or pilot studies considering which of the current forms of scholarly communication will simply die out in the absence of subscription and royalty income. We would like to see evidence-based impact assessments as a matter of priority. In general, it is unclear whether even the largest and most prestigious scholarly societies will survive the loss of income that will result from a move to OA. As one member of our group put it, ‘the research is not open if it is dead’.

Many questions remain, above and beyond those already evoked:  

  • The situation with respect to the goal of publishing all academic monographs freely and openly remains extremely fluid, and all the enquiries we were able to make in the working group confirmed that this is an area of great uncertainty. Academic books require considerable up-front investment by publishers, and it is vital that this labour and expertise is properly supported in an open access model. How can we ensure that open access books do not entail a race to the bottom in terms of editorial and production standards?
  • Researchers and publishers will also have to think carefully about content such as book reviews, notices, short discussion pieces, author interviews and so on: content that is useful to the discipline, but peripheral to the article form and that would not generally appear in a repository, for example.   
  • The place of UK debates in the global publishing industry is unclear. Like all scholarly publishing, A&H publishing is international in nature and most journals and presses will draw from as wide an international field as possible. How will the editor of a UK-based journal, responding to the OA requirements of UK decision-making bodies, deal with international authors who are not subject to the same requirements or set of priorities? How will an international editor deal with UK academics?4 These questions come up repeatedly in conversations with colleagues.
  • Scholarly societies in the arts and humanities do not charge a fortune for their journals, and also offer conferences, communities and support (financial and otherwise) for early-career scholars. To analyse the costs and benefits of access to their publications, it will be necessary to look across cost centres within any given institution. To offer a worked example of library costs from 2019, ‘the bundled UK cost for 2020 the RHS’s Transactions and its Camden book series is £205 (this is a maximum figure, excluding all discounts). In the financial year 1 July 2018-30 June 2019, RHS awarded (for example) £2,781.56 to support ECR researchers at York University and £3,177.16 to support ECR researchers at Oxford.’5 So it would be useful to see studies of the rate of institutional return on investment in publications by university libraries.
  • Concerns about licensing were already well documented and summarized by Peter Mandler in 2014: ‘For one thing, we do not have full ownership of our texts ourselves – we use others’ words and images, often by permission. For another, we have our own norms of how best to incorporate one work within another – e.g. by quotation – which derivative use denies. Most important is our moral right (long acknowledged in law and ethics) to protect the integrity of our work. By all means read and disseminate our work free of charge, but do not change it as you are doing so – write your own work.’6  
  • Concerns about distortions allowed by CC BY in the reuse of oral history interviews and other sensitive/polemical content are important for many A&H colleagues as they are for our colleagues in the social sciences. 
  • Evidence of predatory publishers simply reusing content from repositories is starting to emerge, seemingly justifying concerns about CC BY as opposed to CC BY-NC-ND or CC BY-ND.7

Footnotes

1See for instance a project on the takeover of real estate by the Church of Scientology in Clearwater, Florida: https://projects.tampabay.com/projects/2019/investigations/scientology-clearwater-real-estate, or a series of investigative articles on the post-9/11 burgeoning of the US intelligence services collected here: https://www.washingtonpost.com/people/william-m-arkin/

2Matthew Gold & Lauren Klein, eds. Debates in the Digital Humanities (2012), https://dhdebates.gc.cuny.edu

3 ‘We are a nonprofit independent publisher with no institutional backing. Open Book relies on sales and donations to continue publishing high-quality and free to read titles. We gratefully acknowledge the generous support of The Polonsky Foundation, the Thriplow Charitable Trust, the Jessica E. Smith and Kevin R. Brine Charitable Trust, The Progress Foundation and the Dutch Research Council (NWO).’ https://www.openbookpublishers.com

4 See the following testimony: ‘The bi-lingual, topic-specific journal I edit…draws articles from authors across the world and is published in Switzerland. Hence, specific OA requirements pertaining to UK-based authors will be considered in setting OA policy but will probably not be a determining factor. Hence, if strict requirements are introduced around OA in relation to UK funders, this may serve to reduce the possibility for UK-based authors to submit articles to my journal. This would obviously be an issue for the journal but would also be one for UK academics also, as it would result a more limited range of potential publication outlets.’ Margot Finn, Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, pp. 47-8. 

5 Plan S and the History Journal Landscape, A Royal Historical Society Guidance Paper, p. 69, n. 110. 

6 Peter Mandler, ‘Open Access: a Perspective from the Humanities’, Insights 27 (2), 2014, http://doi.org/10.1629/2048-7754.89 

7 Guy Lavender, Jane Secker and Chris Morrison, ‘What happens when you find your open access PhD thesis for sale on Amazon?’, 8th July 2021, https://blogs.lse.ac.uk/impactofsocialsciences/2021/07/08/what-happens-when-you-find-your-open-access-phd-thesis-for-sale-on-amazon/

‘Be nice to each other’ – the second Researcher to Reader conference

Aaaaaaaaaaargh! was Mark Carden’s summary of the second annual Researcher to Reader conference, along with a plea that the different players show respect to one another. My take home messages were slightly different:

  • Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.
  • Academic leaders and institutions should do their bit in combating the metrics focus.
  • Big Deals don’t save libraries money, what helps them is the ability to cancel journals.
  • The ‘green OA = subscription cancellations’ argument is only viable in a utopian, almost fully green world.
  • There are serious issues in the supply chain of getting books to readers.
  • And copyright arrangements in academia do not help scholarship or protect authors*.

The programme for the conference included a mix of presentations, debates and workshops. The Twitter hashtag is #r2rconf.

As is inevitable in the current climate, particularly at a conference where there were quite a few Americans, the shadow of Trump was cast over the proceedings. There was much mention of the political upheaval and the place research and science has in this.

[*please see Kent Anderson’s comment at the bottom of this blog]

In the publishing corner

Time for publishers to rise to the challenge

The conference opened with an impassioned speech by Mark Allin, the President and CEO of John Wiley & Sons, who started with the statement this was “not a time for retreat, but a time for outreach and collaboration and to be bold”.

The talk was not what was expected from a large commercial publisher. Allin asked: “How can publishers act as advocates for truth and knowledge in the current political climate?” He mentioned that Proquest has launched a displaced researchers programme in reaction to world events, saying, “it’s a start but we can play a bigger role”.

Allin asked what publishers can do to ensure research is being accessed. Referencing “The Content Trap” by Bharat Anand, Allin said “We won’t as a media industry survive as a media content and putting it in a bottle and controlling its distribution. We will only succeed if we connect the users. So we need to re-engineer the workflows making them seamless, frictionless.” He added: “We should be making sure that … we are offering access to all those who want it.”

Allin raised the issue of access, noting that ResearchGate has more usage than any single publisher. He made the point that “customers don’t care if it is the version of record, and don’t care about our arcane copyright laws”. This is why people use SciHub: ease of access. He said publishers should not give up protecting copyright but must realise its limitations and provide easy access.

Researchers are the centre of gravity – we need to help them spend more time researching and less time publishing, he says. There is a lesson here, he noted, suppliers should use “the divine discontent of the customer as their north star”. He used the example of Amazon to suggest people working in scholarly communication need to use technology much better to connect up. “We need to experiment more, do more, fail more, be more interconnected” he said, where “publishing needs open source and open standards” which are required for transformational impact on scholarly publishing – “the Uber equivalent”.

His suggestion for addressing the challenges of these sharing platforms is to “try and make your experience better than downloading from a pirate site”, and that this would be a better response than taking the legal route and issuing takedown notices.  He asked: “Should we give up? No, but we need to recognise there are limits. We need to do more to enable access.”

Allin questioned the current situation: publishing may have gone online, but how much has the internet really changed scholarly communication practices? The page is still a unit of publishing, even in digital workflows. It shouldn’t be; we should have a ‘digital first’ workflow. The question isn’t ‘what should the workflow look like?’, but ‘why hasn’t it improved?’, he said, noting that innovation is always slowed by social norms, not technology. Publishers should embrace values of researchers & librarians and become more open, collaborative, experimental and disinterested.

So what do publishers do?

Publishers “provide quality and stability”, according to Kent Anderson, speaking on the second day (no relation to Rick Anderson) in his presentation about ‘how to cook up better results in communicating research’. Anderson is the CEO of Redlink, a company that provides publishers and libraries with analytic and usage information. He is also the founder of the blog The Scholarly Kitchen.

Anderson made the argument that “publishing is more than pushing a button”, by expanding on his blog on ‘96 things publishers do’. This talk differed from Allin’s because it focused on the contribution of publishers.

Anderson talked about the peer review process, noting that rejections help academics because usually they are about mismatch. He said that articles do better in the second journal they’re submitted to.

During a discussion about submission fees, Anderson noted that these “can cover the costs of peer review of rejected papers but authors hate them because they see peer review as free”. His comment that a $250 journal submission charge with one journal is justified by the fact that the target market (orthopaedic surgeons) ‘are rich’ received (rather unsurprisingly) some response from the audience via Twitter.

Anderson also made the accusation that open access publishers take lower quality articles when money gets tight. This did cause something of a backlash in the Twitter discussion, with a request for a citation for this statement and a request for examples, other than scam publishers, of publishers lowering standards to bring in more APC income. [ADDENDUM: Kent Anderson below says that this was not an ‘accusation’ but an ‘observation’. The Twitter challenge for ‘citation please?’ holds.]

There were a couple of good points made by Anderson. He argued that one of the value adds that publishers do is training editors. This is supported by a small survey we undertook with the research community at Cambridge last year which revealed that 30% of the editors who responded felt they needed more training.

The library corner

The green threat

There is good reason to expect that green OA will make people and libraries cancel their subscriptions, at least in the utopian future described by Rick Anderson (no relation to Kent Anderson), Associate Dean at the University of Utah, in his talk “The Forbidden Forecast, Thinking about open access and library subscriptions”.

Anderson started by asking why, if we’re in a library funding crisis, aren’t we seeing sustained levels of unsubscription? He then explained that Big Deals don’t save libraries money. They lower the cost per article, but this is a value measure, not a cost measure. What the Big Deal did was make cancellations more difficult. Most libraries have cancelled every journal that they can without Faculty ‘burning down the library’, to preserve the Big Deal. This explains the persistence of subscriptions over time. The library is forced to redirect money away from other resources (books) and into serials budget. The reason we can get away with this is because books are not used much.
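To make the distinction between a value measure and a cost measure concrete, here is a small worked example in Python. All of the figures are hypothetical and chosen purely for illustration; none come from Anderson’s talk, and the `cost_per_article` helper is an invented name.

```python
def cost_per_article(total_cost, articles_available):
    """Cost per article: a measure of value for money, not of total spend."""
    return total_cost / articles_available


# Hypothetical annual figures, invented for illustration only.
individual_subscriptions = {"total_cost": 100_000, "articles": 10_000}
big_deal = {"total_cost": 150_000, "articles": 50_000}

for name, deal in [("individual", individual_subscriptions), ("big deal", big_deal)]:
    per_article = cost_per_article(deal["total_cost"], deal["articles"])
    print(f"{name}: total {deal['total_cost']}, per article {per_article:.2f}")
# prints:
# individual: total 100000, per article 10.00
# big deal: total 150000, per article 3.00
```

In this made-up case the cost per article falls from 10.00 to 3.00 while the library’s total spend rises by half, which is the sense in which a lower cost per article says nothing about saving money.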

The wolf seems to be well and truly upon us. There have been lots of cancellations and reduction of library budgets in the USA (a claim supported by a long list of examples). The number of cancellations grows as the money being siphoned off book budgets runs out.

Anderson noted that the emergence of new gold OA journals doesn’t help libraries; it does nothing to relieve the journal emergency. They just add to the list of costs because it is a unique set of content. What does help libraries is the ability to cancel journals. Professor Syun Tutiya, Librarian Emeritus at Chiba University, noted in a separate session that if Japan were to flip from a fully subscription model to APCs it would be about the same cost, so that would solve the problem.

Anderson said that there is an argument that “there is no evidence that green OA cancels journals” (I should note that I am well and truly in this camp, see my argument). Anderson’s response is that this is just saying the future hasn’t happened yet. The implicit argument here is that because green OA has not caused cancellations so far, it won’t do so in the future.

Library money is taxpayers’ money – it is not always going to flow. There is much greater scrutiny of journal big deals as budgets shrink.

Anderson argued that green open access provides inconsistent and delayed access to copies which aren’t always the version of record, and this has protected subscriptions. He noted that Green OA is dependent on subscription journals, which is “ironic given that it also undermines them”. You can’t make something completely & freely available without undermining the commercial model for that thing, Anderson argued.

So, Anderson said, given green OA exists and has for years, and has not had any impact on subscriptions, what would need to happen for this to occur? Anderson then described two subscription scenarios. The low cancellation scenario (which is the current situation) where green open access is provided sporadically and unreliably. In this situation, access is delayed by a year or so, and the versions available for free are somewhat inferior.

The high cancellation scenario is where there is high uptake of green OA because there are funder requirements and the version is close to the final one. Anderson argued that the “OA advocates” prefer this scenario and they “have not thought through the process”. If the cost is low enough of finding which journals have OA versions and the free versions are good enough, he said, subscriptions will be cancelled. The black and white version of Anderson’s future is: “If green OA works then subscriptions fail, and the reverse is true”.

Not surprisingly I disagreed with Anderson’s argument, based on several points. To start, there would need to be a certain percentage of the work available before a subscription could be cancelled. Professor Syun Tutiya, Librarian Emeritus at Chiba University, noted in a different discussion that in Japan only 6.9% of material is available Green OA in repositories and argued that institutional repositories are good for lots of things but not OA. Certainly in the UK, with the strongest open access policies in the world, we are not capturing anything like the full output. And the UK is itself only 6% of the research output for the world, so we are certainly a very long way away from this scenario.

In addition, according to work undertaken by Michael Jubb in 2015, most of the green Open Access material is available in places other than institutional repositories, such as ResearchGate and SciHub. Do librarians really feel comfortable cancelling subscriptions on the basis of something being available in a proprietary or illegal format?

The researcher perspective

Stephen Curry, Professor of Structural Biology, Imperial College London, spoke about “Zen and the Art of Research Assessment”. He started by asking why people become researchers and gave several reasons: to understand the world, change the world, earn a living and be remembered. He then asked how they do it. The answer is to publish in high impact journals and bring in grant money. But this means it is easy to lose sight of the original motivations, which are easier to achieve if we are in an open world.

In discussing the report published in 2015, which looked into the assessment of research, “The Metric Tide“, Curry noted that metrics & league tables aren’t without value. They do help to rank football teams, for example. But university league tables are less useful because they aggregate many things so are too crude, even though they incorporate valuable information.

Are we as smart as we think we are, he asked, if we subject ourselves to such crude metrics of achievement? The limitations of research metrics have been talked about a lot but they need to be better known. Often they are too precise. For example, was Caltech really better than the University of Oxford last year but worse this year?

But numbers can be seductive. Researchers want to focus on research without pressure from metrics; however, many Early Career Researchers and PhD students are increasingly fretting about the publications hierarchy. Curry asked “On your death bed will you be worrying about your H-Index?”

There is a greater pressure to publish rather than pressure to do good science. We should all take responsibility to change this culture. Assessing research based on outputs is creating perverse incentives. It’s the content of each paper that matters, not the name of the journal.

In terms of solutions, Curry suggested it would be better to put higher education institutions in 5% brackets rather than ranking them 1-n in the league tables. He called for academic leaders and institutions to do their bit in combating the metrics focus, and for much wider adoption of the Declaration On Research Assessment (known as DORA). Curry’s own institution, Imperial College London, has done so recently.

Curry argued that ‘indicators’ would be a more appropriate term than ‘metrics’ in research assessment because we’re looking at proxies. The term ‘metrics’ implies you know what you are measuring. Certainly metrics can inform but they cannot replace judgement. Users and providers must be transparent.

Another solution is preprints, which shift attention from container to content because readers use the abstract, not the journal name, to decide which papers to read. Note that this idea is starting to become more mainstream with the research by the NIH towards the end of last year, “Including Preprints and Interim Research Products in NIH Applications and Reports”.

Copyright discussion

I sat on a panel to discuss copyright with a funder – Mark Thorley, Head of Science Information, Natural Environment Research Council, a lawyer – Alexander Ross, Partner, Wiggin LLP and a publisher – Dr Robert Harington, Associate Executive Director, American Mathematical Society.

My argument** was that selling or giving the copyright to a third party with a purely commercial interest and that did not contribute to the creation of the work does not protect originators. That was the case in the Kookaburra song example. It is also the case in academic publishing. The copyright transfer form/publisher agreement that authors sign usually means that the authors retain their moral rights to be named as the authors of the work, but they sign away rights to make any money out of it.

I argued that publishers don’t need to hold the copyright to ensure commercial viability. They just need first exclusive publishing rights. We really need to sit down and look at how copyright is being used in the academic sphere – who does it protect? Not the originators of the work.

Judging by the mood in the room, the debate could have gone on for considerably longer. There is still a lot of meat on that bone. (**See the end of this blog for details of my argument).

The intermediary corner

The problem of getting books to readers

There are serious issues in the supply chain of getting books to readers, according to Dr Michael Jubb, Independent Consultant and Richard Fisher from Something Understood Scholarly Communication.

The problems are multi-pronged. For a start, discoverability of books is “disastrous” due to completely different metadata standards in the supply chain. ONIX is used for retail trade and MARC is standard for libraries. Neither has detailed information for authors, information about the contents of chapters, sections etc, or information about reviews and comments.

There are also a multitude of channels for getting books to libraries. There has been involvement in the past few years of several different kinds of intermediaries – metadata suppliers, sales agents, wholesalers, aggregators, distributors etc – who are holding digital versions of books that can be supplied through the different types of book platforms. Libraries have some titles on multiple platforms but others are only available on one platform.

There are also huge challenges around discoverability and the e-commerce systems, which are “too bitty”. The most important change that has happened in books has been Amazon; however, publisher e-commerce “has a long way to go before it is anything like as good as Amazon”.

Fisher also reminded the group that there are far more books published each year than there are journals – it’s a more complex world. He noted that about 215 [NOTE: amended from original 250 in response to Richard Fisher’s comment below] different imprints were used by British historians in the last REF. Many of these publishers are very small with very small margins.

Jubb and Fisher both emphasised readers’ strong preference for print, which implies that much more work is needed on ebook user experience. There are ‘huge tensions’ between reader preference (print) and the drive for e-book acquisition models at libraries.

The situation is probably best summed up in the statement that “no-one in the industry has a good handle on what works best”.

Providing efficient access management

Current access control is not functional in the world we live in today. If you ask users to jump through hoops to get access off campus then your whole system defeats its purpose. That was the central argument of Tasha Mellins-Cohen, the Director of Product Development, HighWire Press when she spoke about the need to improve access control.

Mellins-Cohen started with the comment “You have one identity but lots of identifiers”, and noted if you have multiple institutional affiliations this causes problems. She described the process needed for giving access to an article from a library in terms of authentication – which, as an aside, clearly shows why researchers often prefer to use Sci Hub.

She described an initiative called CASA – Campus Activated Subscriber-Access which records devices that have access on campus through authenticated IP ranges and then allows access off campus on the same device without using a proxy. This is designed to use more modern authentication. There will be “more information coming out about CASA in the next few months”.

Mellins-Cohen noted that tagging something as ‘free’ in the metadata improves Google indexing – publishers need to do more of this at article level. This comment was met with a call to publishers to make the information about sharing more accessible to authors through How Can I Share It?

Mellins-Cohen expressed some concern that some of the ideas coming out of RA21 (Resource Access for the 21st Century), an STM project to explore alternatives to IP authentication, will raise barriers to access for researchers.

Summary

It is always interesting to have the mix of publishers, intermediaries, librarians and others in the scholarly communication supply chain together at a conference such as this. It is rare to have the conversations between different stakeholders across the divide. In his summary of the event, Mark Carden noted the tension in the scholarly communication world, saying that we do need a lively debate but also need to show respect for one another.

So while the keynote started promisingly, and said all the things we would like to hear from the publishing industry, there is still the reality that we are not there yet. And this underlines the whole problem. This interweb thingy didn’t happen last week. What has actually happened to update the publishing industry in the last 20 years? Very little, it seems. However, it is not all bad news. Things to watch out for in the near future include plans for micro-payments for individual access to articles, according to Mark Allin, and the highly promising Campus Activated Subscriber-Access system.

Danny Kingsley attended the Researcher to Reader conference thanks to the support of the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 27 February 2017
Written by Dr Danny Kingsley
Creative Commons License

Copyright case study

In my presentation, I spoke about the children’s campfire song, “Kookaburra sits in the old gum tree”, which was written by Melbourne schoolteacher Marion Sinclair in 1932 and first aired in public two years later as part of a Girl Guides jamboree in Frankston. Sinclair had to be prompted to go to APRA (Australasian Performing Right Association) to register the song. That was in 1975; the song had already been around for 40 years, but she never expressed any great interest in any proprietary claim to it.

In 1981 the Men at Work song “Down Under” made No. 1 in Australia. The song then topped the UK, Canada, Ireland, Denmark and New Zealand charts in 1982 and hit No.1 in the US in January 1983. It sold two million copies in the US alone.  When Australia won the America’s Cup in 1983 Down Under was played constantly. It seems extremely unlikely that Marion Sinclair did not hear this song. (At the conference, three people self-identified as never having heard the song when a sample of the song was played.)

Marion Sinclair died in 1988 and the song went to her estate. Norman Lurie, managing director of Larrikin Music Publishing, bought the publishing rights from her estate in 1990 for just $6100. He started tracking down all the chart music that had been printed all over the world, because Kookaburra had been used in books for people learning flute and recorder.

In 2007 TV show Spicks and Specks had a children’s music themed episode where the group were played “Down Under” and asked which Australian nursery rhyme the flute riff was based on. Eventually they picked Kookaburra, all apparently genuinely surprised when the link between the songs was pointed out. There is a comparison between the music pieces.

Two years later Larrikin Music filed a lawsuit, initially wanting 60% of Down Under’s profits. In February 2010, Men at Work appealed, and eventually lost. The judge ordered Men at Work’s recording company, EMI Songs Australia, and songwriters Colin Hay and Ron Strykert to pay 5% of royalties earned from the song since 2002 and from its future earnings.

In the end, Larrikin won around $100,000, although legal fees on both sides have been estimated to be upwards of $4.5 million, with royalties for the song frozen during the case.

Gregory Ham was the flautist in the band who played the riff. He did not write Down Under, and was devastated by the high profile court case and his role in proceedings. He reportedly fell back into alcohol abuse and was quoted as saying: “I’m terribly disappointed that’s the way I’m going to be remembered — for copying something.” Ham died of a heart attack in April 2012 in his Carlton North home, aged 58, with friends saying the lawsuit was haunting him.

This case, I argued, exemplifies everything that is wrong with copyright.

2016 – that was the year that was

In January last year we published a blog post ‘2015 that was the year that was‘ which not only helped us take stock of what we have achieved, but also was very well received. So we have decided to do it again. For those who are more visually oriented, the slides ‘The OSC a lightning Tour‘ might be useful.

Now starting its third year of operation, the Office of Scholarly Communication (OSC) has expanded to a team of 15, managing a wide variety of projects. The OSC has developed a set of strategic goals  to support its mission: “The OSC works in a transparent and rigorous manner to provide recognised leadership and innovation in the open conduct and dissemination of research at Cambridge University through collaborative engagement with the research community and relevant stakeholders.”

1. Working transparently

The OSC maintains an active outreach programme which fits with the transparent manner of the work that the OSC undertakes, which also includes the active documentation of workflows.

One of the ways we work transparently is to share many of our experiences and ideas through this blog, which receives over 2,000 visits a month. During 2016 the OSC published 41 blogs – eight blogs each on Scholarly Communication and Open Research, 14 on Open Access, nine on Research Data Management and two on Library and training matters. The blogs we published in Open Access week were accessed 1630 times that week alone.

In addition to our websites for Scholarly Communication and Open Access, our Research Data Management website has been identified internationally as best practice and receives nearly 3,000 visitors a month.

We also run Twitter feeds for both Open Access, with 1100 followers, and Open Data, with close to 1200 followers. Many of the OSC staff also run their own Twitter feeds which share professional observations.

We also publish monthly newsletters, including one on scholarly communication matters. Our research data management newsletter has close to 2,000 recipients. Our shining achievement for the year however has to be the hugely successful scholarly communication Advent Calendar (which people are still accessing…)

We practise what we preach and share information about our work practices such as our reports to funders on APC spend and so on, through our repository Apollo and also by blogging about it – see Cambridge University spend on Open Access 2009-2016. We also share our presentations through Apollo and in Slideshare.

2. Disseminating research

The OSC has a strong focus on research support in all aspects of the scholarly communication ecosystem, from concept, through study design, preparation of research data management plans, decisions about publishing options and support with the dissemination of research outputs beyond the formal literature. The OSC runs an intense programme of advocacy relating to Open Access and Research Data Management, and has spoken to nearly 3,000 researchers and administrators since January 2015.

2.1 Open Access compliance

In April 2016, the HEFCE policy requiring that all research outputs intended to be claimed for the REF be made open access came into force. As a result, there has been an increased uptake of the Open Access Service with the 10,000th article submitted to the system in October. Our infographics on Repository use and Open Access demonstrate the level of engagement with our services clearly.

Currently half of the entire research output of the University is being deposited to the Open Access Service each month (see the blog: How open is Cambridge?). While this is good from a compliance perspective, it has caused some processing issues due to the manual nature of the workflows and insufficient staff numbers. At the time of writing, there is a deposit backlog of over 600 items to put into the repository and a backlog of over 2,300 items to be checked if they have been published so we can update the records.

The OA team made over 15 thousand ticket replies in 2016 – or nearly 60 per work day!

2.2 Managing theses

Work on theses continues, with the OSC driving a collaboration with Student Services to pilot the deposit of digital theses in addition to printed bound ones with a select group of departments from January 2017. The Unlocking Theses project in 2015-2016 has seen an increase in the number of historic theses in the repository from 700 to over 2,200 with half openly available. An upcoming digitisation project will add a further 1,400 theses. The upgrade of the repository and associated policies means all theses (not just PhDs) can be deposited and the OSC is in negotiation with several departments to bulk upload their MPhils and other sets of theses which are currently held in closed collections and are undiscoverable. This is an example of the work we are doing to unearth and disseminate research held all over the institution.

As a result of these activities it has become obvious that the disjointed nature of thesis management across the Library is inefficient. There is considerable effort being placed on developing workflows for managing theses centrally within the Library which the OSC will be overseeing into the future.

3. Research Support

3.1 Research Data Support

The number of data submissions received by the University repository is continuously growing, with Cambridge hosting more datasets in the institutional repository than any other UK university. Our ‘Data Sharing at Cambridge’ infographic summarises our work in this area.

A recent Primary Research Group report recognised Cambridge as having ‘particularly admirable data curation services’.

3.2 Policy development

The OSC is heavily involved in policy development in the scholarly communication space and participates in several activities external to the University. In July 2016 the UK Concordat on Open Research Data was published, with considerable input from the university sector, coordinated by the OSC.

We have representatives on the RCUK Open Access Practitioners Group, the UK Scholarly Communication License and Model Policy Steering Committee and the CASRAI Open Access Glossary Working Group, plus several other committees external to Cambridge. The OSC has contributed to discussions at the Wellcome Trust about ensuring better publisher compliance with their Open Access policy.

We are also updating and writing policies for aspects of research management across the University.

3.3 Collaborations with the research community

The OSC collaborates directly with the research community to ensure that the funding policy landscape reflects their needs and concerns. To that end we have held several town-hall meetings with researchers to discuss issues such as the mandating of CC-BY licensing, peer review and options relating to moving towards an Open Research landscape. We have also provided opportunities for researchers to meet directly with funders to discuss concerns and articulate amendments to the policies. The OSC has led discussions with the sector and arXiv.org, including visiting Cornell University, to ensure that researchers using this service to make their work openly available can be compliant under the HEFCE policy.

A new Research Data Management Project Group brings researchers and administrators together to work on specific issues relating to the retention and preservation of data and the management of sensitive data. We have also recruited over 40 Data Champions from across the University. Data Champions are researchers, PhD students or support staff who have agreed to advocate for data within their department: providing local training, briefing staff members at departmental meetings, and raising awareness of the need for data sharing and management.

The initiative began as an attempt to meet the growing need for RDM training, provide more subject-specific RDM support and begin more conversations about the benefits of RDM beyond meeting funders’ mandates. There has been a lot of interest in our Data Champions from other universities in the UK and abroad, with applications for our scheme coming from around the world. In response to this we have proposed a Birds of a Feather session at the 9th RDA plenary meeting in April to discuss similar initiatives elsewhere and the creation of RDM advocacy communities.

3.4 Professional development for the research community

The OSC provides the research community with a variety of advocacy, training and workshops relating to research data management, sharing research effectively, bibliometrics and other aspects of scholarly communication. The OSC held over 80 sessions for researchers in 2016, including the extremely successful ‘Helping researchers publish’ event which we are repeating in February.

Our work with the Early Career Research (ECR) community has resulted in the development of a series of sessions about the publishing process for the PhD community. These have been enthusiastically embraced and there are negotiations with departments about making some courses compulsory. While this underlines the value of these offerings it does raise issues about staffing and how this will be financed.

The OSC is increasingly managing and hosting conferences at the University. Cambridge is participating in the Jisc Shared Repositories pilot and the OSC hosted an associated Research Data Network conference in September. In July 2016, the OSC organised a conference on research data sharing in collaboration with the Science and Engineering South Consortium, which was extremely well received and attracted over 80 attendees from all over the UK.

In November, the OpenCon Cambridge group – with which the OSC is heavily involved – held an OpenConCam satellite event which was very well attended and received very positive feedback. The storify of tweets is available, as is this blog about the event. The OSC was happy to both be a sponsor of the event and to be able to support the travel of a Cambridge researcher to attend the main OpenCon event in Washington and bring back her experiences.

Increasingly we are livestreaming our events and then making them available online as a resource for later.

3.5 Developing Library capacity for support

We have published a related post which details the training programmes run for library staff members in 2016. In total 500 people attended sessions offered in the Supporting Researchers in the 21st century programme, and we successfully ‘graduated’ the second tranche of the Research Support Ambassador Programme.

Conference session proposals on both the Supporting Researchers and the Research Ambassador programmes have been submitted to various national and international conferences. Dr Danny Kingsley and Claire Sewell have also had an abstract accepted for an article to appear in the 2017 themed issue of The New Review of Academic Librarianship.

4. Updating and integrating systems

The University repository, Apollo, has been upgraded and was launched during Open Access Week. The upgrade has incorporated new services, including the ability to mint DOIs, which has been enthusiastically adopted. A new Request a Copy service for users wishing to obtain access to embargoed material is being heavily used without any promotion, with around 300 requests a month flowing through. This has been particularly important given the fact that we are depositing works prior to publication, so we have to put them under an infinite embargo until we know the publication date (at which time we can set the embargo lift date). The huge number of over 2,000 items needing to be checked for publication date means a large percentage of the contents of the repository is discoverable but closed under embargo.

In order to reduce the heavy manual workload associated with the deposit and processing of over 4,000 papers annually, the OSC is working with the Research Information Office on a systems integration programme between the University’s CRIS system – Symplectic – and Apollo, and retaining our integrated helpdesk system which uses a programme called ZenDesk. This should allow better compliance reporting for the research community, and reduce manual uploading of articles.

But this process involves a great deal more than just metadata matching and coding, and touches on the extremely ‘silo’ed nature of the support services being offered to our researchers across the institution. We are trying to work through these issues by instigating and participating in several initiatives with multiple administrative areas of the University. The OSC is taking the lead with a ‘Getting it Together’ project to align the communication sent to researchers through the research lifecycle and across the range of administrative departments including Communication, Research Operations, Research Strategy and University Information Systems, termed the ‘Joined up Communications’ group. In addition we are heavily involved in the Coordinated and Functional Research Systems Group (CoFRS), the University Research Administration Systems Committee and the Cambridge Big Data Steering Group.

5. Pursuing a research agenda

Many staff members of the OSC originate from the research community and the team have a huge conference presence. The OSC team attended over 80 events in 2016 both within the UK and major conferences worldwide, including Open Scholarship Initiative, FORCE2016, Open Repositories, International Digital Curation Conference, Electronic Thesis & Dissertations, Special Libraries Association, RLUK2016, IFLA, CILIP and Scientific Data Conference.

Increasingly the OSC team is being asked to share their knowledge and experience. In 2016 the team gave four keynote speeches, presented 18 sessions and ran one Master Class. The team has also acted as session chair for two conferences and convened two sessions.

5.1 Research projects

The OSC is undertaking several research projects. In relation to the changing nature of scholarly communication services within libraries, we are in the process of analysing job advertisements in the area of scholarly communication. We have also conducted a survey (which has received over 500 responses) on the educational and training background of people working in the area of scholarly communication. The findings of these studies will be shared and published during 2017.

Dr Lauren Cadwallader was the first recipient of the Altmetrics Research Grant which she used to explore the types and timings of online attention that journal articles received before they were incorporated into a policy document, to see if there was some way to help research administrators make an educated guess rather than a best guess at which papers will have high impact for the next REF exercise in the UK. Her findings were widely shared internationally, and there is interest in taking this work further.

The team is currently actively pursuing several research grant proposals. Other research includes an analysis of the data needs of the research community, undertaken in conjunction with Jisc.

5.2 Engaging with the research literature

Members of the OSC hold several editorial board positions, including two on the Data Science Journal and one on the Journal of Librarianship and Scholarly Communication. We also hold positions on the Advisory Board for PeerJ Preprints. We have a staff member who is the Associate Editor of the New Review of Academic Librarianship. The OSC team also act as peer reviewers for scholarly communication papers.

The OSC is working towards developing a culture of research and publishing amongst the library community at Cambridge, and is one of the founding members of the Centre for Evidence Based Librarianship and Information Practice (C-EBLIP) Research Network.

6. Staffing

The organisational layout remained relatively stable between 2015 and 2016, but this belies the perilous nature of the funding of the Office of Scholarly Communication. Of the 15 staff members, fewer than half are funded from ‘Chest’ (central University) funding. The remainder are paid from a combination of non-recurrent grants, RCUK funding and endowment funds.

The process of applying for funding, creating reports, meeting with key members of the University administration, working out budgets and, frankly, lobbying just to keep the team employed has taken a huge toll on the team. One result of the financial situation is many staff – including some crucial roles – are on short-term contracts and several positions have turned over during the year. This means that a disproportionate amount of time is spent on recruitment. The systems for recruiting staff in the University are, shall we say, reflective of the age of the institution.

In 2016 alone, as the Head of the OSC, I personally wrote five job descriptions and progressed them through the (convoluted) HR review process. I conducted 32 interviews for OSC staff and participated in 10 interviews for staff elsewhere in the University where I have assisted with the recruitment. This has involved the assessment of 143 applications. Because each new contract has a probation period, I have undertaken 27 probationary interviews. Given each of these activities involves one (or mostly more) other staff members, the impact of this issue in terms of staff time becomes apparent.

We also conducted some experiments with staffing last year. We have had a volunteer working with us on a research project and run a ‘hotdesk’ arrangement with colleagues from the Research Information Office, the Research Operations Office and Cambridge University Press. We also conducted a successful ‘work from home’ pilot (a first for the University Library).

7. Plans for 2017

This year will herald some significant changes for the University – with a new Librarian starting in April and a new Vice Chancellor in September. This may determine where the OSC goes into the future, but plans are already underway for a big year in 2017.

As always, the OSC is considering both a practical and a political agenda. On the ‘political’ side of the fence we are pursuing an Open Research agenda for the University. We are about to kick off the two-year Open Research Pilot Project, which is a collaboration between the Office of Scholarly Communication and the Wellcome Trust Open Research team. The Project will look at gaining an understanding of what is needed for researchers to share and get credit for all outputs of the research process. These include non-positive results, protocols, source code, presentations and other research outputs beyond the remit of traditional publications. The Project aims to understand the barriers preventing researchers from sharing (including resource and time implications), as well as what incentivises the process.

We are also now at a stage where we need to look holistically at the way we access literature across the institution. This will be a big project incorporating many facets of the University community. It will also require substantial analysis of existing library data and the presentation of this information in an understandable graphic manner.

In terms of practical activities, our headline task is to completely integrate our open access workflows into University systems. In addition we are actively investigating how we can support our researchers with text and data mining (TDM). We are beginning to develop and roll out a ‘continuum’ of publishing options for the significant amount of grey literature produced within Cambridge. We are also expanding our range of teaching programmes – videos, online tools, and new types of workshops. On a technical level we are likely to be looking at the potential implementation of options offered by the Shared Repository Pilot, and developing solutions for managed access to data. We are also hoping to explore a data visualisation service for researchers.

Published 17 January 2017
Written by Dr Danny Kingsley
Creative Commons License

 

 

Open Data – moving science forward or a waste of money & time?

On 4 November the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?”. Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders, bright people with curious minds and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge; @RafaCarazoSalas
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?”, followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration, new discoveries and, most importantly, sharing data kept it alive. If data was not shared and instead simply kept on the computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited, and built upon years after the publication. Frances added that research data originating from publicly funded research was funded by taxpayers. Therefore, the value of research data should be maximised. Data sharing is important for research integrity and reproducibility and for ensuring better quality of science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what made the wheels of science turn. He inspired further discussion with a strong statement: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the problem of sharing big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck in progress was to bridge the gap between the capacity to generate the data, and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think that one day it would be possible just to stream videos instead of buying DVDs. Nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time – the need developed the tools. More and more people are looking at new ways of chunking and parallelisation of data downloads. Additionally, there is a change in the way in which the analysis is done – more and more of it is done remotely on central servers, and this eliminates the technical barriers of access to data.

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was not only important who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift – that questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “shut”. Tim explained that anything that gets research data off the laptop to a shared environment, even if it was shared only with a certain group, was already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” And this is yet another point where I personally agreed with Tim that institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to make small steps at a time, with the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started by explaining that the way traditional publishing works had to change, as data was not two-dimensional anymore and in the digital era it could no longer be shared on a piece of paper. Ideally, researchers should be allowed to continue re-analysing data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable – alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having the formal peer review of data, it would be better to have an evaluation structure based on the re-use of data – for example, valuing data which was downloadable, well-labelled and re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first one to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that there was an important role for funders and institutions to incentivise data sharing. If funders/institutions wished to mandate sharing, they also needed to reward it. Funders could do so when assessing grant proposals; institutions could do it when looking at academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. And maybe due to the fact that many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. The current academic career progression is almost solely based on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, whether you will be able to continue your career as a researcher. This, connected with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you be able to still publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish results of your five-year-long study? Will you be able to find a new position? As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research, making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about evaluation structure based on the re-use of research data. In fact, more and more researchers go to the ‘open’ and use blog posts and social media to talk about their research and to discuss the work of their peers. With the use of persistent links research data can now be easily cited, and impact can be built directly on data citation and re-use, but one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is to help researchers to really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and institutionalisation of the agenda. Instead of a stick, which results in the minimal compliance attitude, researchers need to see the opportunities and benefits of open data to sign up for the agenda. Therefore, the Institution needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to “open”. And the community needs to be willing and convinced to do so.

Further resources

  • Click here to see the full recording of the Open Data Panel Discussion.
  • And here you can find a storified version of the event prepared by Kennedy Ikpe from the Open Data Team.

Thank you

We would also like to express a special ‘thank you’ to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and who made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License