A Day in the Life of an Open Access Research Adviser

As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Monday is a piece by Dr Philip Boyes reflecting on the variety of challenges of working in the Open Access team.

As anyone working in it knows all too well, Open Access can be a complicated field, with multiple policies from funders, institutions and publishers which can be complex, sometimes obscure and sometimes mutually contradictory. While we’re keen to raise awareness of and engagement with Open Access issues, the University of Cambridge’s view is that expecting academics to get to grips with all this themselves would represent an unreasonable demand on their time and likely lead to errors and resentment.

Instead, Cambridge’s policy is that authors should simply send us their Accepted Manuscript at acceptance through our simple upload system and our team of Research Advisers will check out exactly what they need to do to comply with all the relevant funder and journal policies and get back to them with individually-tailored advice. The same system also allows us to take care of deposit into the repository for HEFCE and to manage payments from the block grants we’ve received from the UK Research Councils (RCUK) and the Charities Open Access Fund (COAF – seven biomedical charities, including the Wellcome Trust).

The idea is that from the academic’s point of view the process feels smooth and seamless. But the reality is that very little of the process is automated. Behind the scenes there’s a lot of (thankfully metaphorical) running around by our team of three Open Access Research Advisers to provide this service, as well as working on broader issues of communication, processing APCs and improving our systems.

So what does a Cambridge Open Access Research Adviser do all day? Here’s a typical day in the life…

8.45am – Getting started

Arriving in the office, I check my emails and look at the Open Access Helpdesk. Overnight we’ve received around 15 new tickets, as well as some further correspondence on existing ones. Fairly typical. It’s split between manuscript uploads that need advice, general queries and invoicing correspondence from publishers. I start working through these on a first-come, first-served basis.

They’re a real mixed bag. If a submitted article is straightforward we can deal with it in a few minutes – we check the journal site for their green and gold options and then advise the author on which is appropriate in each case. We also flag the manuscript for deposit into our repository – at the moment that’s a manual process and is mostly handled by temps.

Today things aren’t straightforward. A lot of the submissions are conference proceedings and there’s very little information on the conference websites. It’s not even clear whether some of these are being formally published. (Does private distribution on memory stick count? Do they have ISBNs or ISSNs?) It’s going to be a slow morning of chasing up authors and conference organisers for any information they have.

10.00am – Complexity

I’m more or less through the conference proceedings, but we’re not through with complex cases. One of the invoices we’ve received is for an article we’ve not heard about before. It’s from a senior professor but he’s never submitted it to the open access service so we weren’t able to advise him on policy or eligibility for block grant funds. He selected the gold option for a Wellcome-funded correspondence article and now wants us to pay the $5000 + VAT bill. The trouble is, letters aren’t covered by the Wellcome policy so technically it isn’t eligible. I contact the author and break the news that he might have to pay this large bill himself and that this is why we like people to contact us first.

11.00am – Clarity

The professor has got back to us. Although the journal’s classed it as a letter, the paper’s actually a very short research article, he says. I decide to contact Wellcome for guidance and let them decide whether they want this to be paid for from the COAF block grant.

11.30am – Déjà vu

For the moment the backlog on the helpdesk has been cleared and our temps are busy adding manuscripts to the repository and updating previously-added articles with citation details and embargo end-dates. I have a bit of free time to move on to something else so begin to tackle the stack of publisher APC invoices that need processing.

They’re mostly correct, but some publishers and invoicing companies are better than others. Inevitably there are a few errors that need chasing up or publishers who have invoiced us repeatedly for the same thing. Among the stack is an overdue notice from a major publisher for a familiar article. It’s one we’ve repeatedly confirmed was paid fully almost two years ago but every few months ever since the publisher has told us it’s outstanding. I send them back the payment reference and details yet again and ask them to mark the issue as resolved. I somehow suspect we’ll be seeing it again.

2.00pm – Presentation

Today offers a welcome opportunity to get out of the office. We’re holding a joint Open Access/Open Data presentation to researchers in one of the University’s departments to try to increase awareness of the policies. Our stats show that this department has particularly low engagement with the Open Access service so we’re keen to work out why. It’s a fractious crowd. One or two people are keen Open Access advocates and speak up to say how simple the system is, but some others are vocal about their view that it’s an unwarranted burden and tell us they don’t see why they should bother.

We try to explain the benefits and funder mandates, as well as how we’ve tried to make the system as simple as possible. When we get back to the office we find that one of those present has sent us their back-catalogue of thirty articles stretching back to 2007 to put into the repository.

4.00pm – Compliance

While my colleagues work on the helpdesk I need to turn my attention to compliance and reporting. All too often when we’ve paid an APC the publisher hasn’t delivered Open Access with the correct licence, or in some cases at all. I generally try to do a weekly check of the articles for which we’ve paid APCs to see whether they’ve been published correctly, but it’s time-consuming and things have been busy lately. It’s been around three weeks since the last check so it really needs doing.

But the deadline is also fast approaching for annual reports to RCUK and COAF. These are both large and complex, and cover slightly different periods (and different again from the Jisc report a couple of months ago). It’s proving a major challenge to get the information together from our various systems and to match it to the relevant figures from the University Finance System. I decide to let the compliance checking wait a bit longer and work on trying to move things along on the reports. I make a bit of progress, but there’s still a huge amount left to do – information on thousands of articles that needs to be manually collated. With luck in the future we’ll have integrated systems that can do much of this automatically, but for now each report represents weeks of work.

Wrap up

There is, then, a huge variety and amount of work that goes into the Open Access service. The Helpdesk and the reporting alone would be more than enough to keep us busy, but we also have to make time for outreach and communications, managing the finances, improving our systems and more. We’re finding that as our team grows, we’re starting to specialise more into particular areas, but we’re still basically all generalists, working on all areas of the job. This balance between specialisation for the purposes of efficiency and the need for individuals to be able to move effectively from one task to another – not least to keep our jobs interesting and varied – is one that’s likely to become ever more challenging as the volume of articles we handle increases.

Published 19 October 2015
Written by Dr Philip Boyes
Creative Commons License

In conversation with Michael Ball from BBSRC

The Biotechnology and Biological Sciences Research Council (BBSRC) Data Sharing Policy states that research data that supports publications must be stored for 10 years, and that adherence to data management plans will be monitored and built into the Final Report score, which may be taken into account for future proposals.

Recently Michael Ball, the Strategy and Policy Manager at BBSRC, accepted an invitation to Cambridge University to discuss the BBSRC policy on opening up access to data. Senior members of the University, the School of Biological Sciences, the Research Office and the Office of Scholarly Communication attended. These notes have been verified by Michael as an accurate reflection of the discussion.

The take home messages from the meeting were the importance of:

  • Disciplines themselves establishing ways of dealing with data
  • Thinking about how to deal with data from the beginning of a research project

The meeting began with a discussion about the support we provide Cambridge University researchers through the Research Data Service, the resources provided on the data website and the enthusiastic uptake of the service since the beginning of the year.

The conversation then moved into issues around the policy, focusing on several aspects – clarification of what needs to be shared, how this will be supported financially, questions about auditing, a discussion about the best place to keep the data and issues with data sharing in the biological sciences.

What data are we expected to share?

What is ‘supporting data’ in the biological sciences?

One of the biggest concerns biological researchers have about data sharing is what is meant by ‘data’. Biology has the most diverse group of data, which makes it hard to talk about biology because the issues are project and problem specific.

Michael confirmed the policy broadly refers to all data ‘but the devil is in the detail, there are lots of caveats’. He echoed Ben Ryan’s answer to a similar question about the EPSRC policy by saying the key points are:

  • What would you expect to see?
  • What do you think is important?

The interpretation of the BBSRC policy depends heavily on the types of data being produced.  Much is dependent on the expected norms, what a researcher would expect to see if they were trying to interpret the paper. What are the underlying supporting data for the paper?

The biological sciences throw up a particular challenge in the range and disparity in disciplinary norms. For example a great deal of data arises from genomics and some time ago they made the decision to share, including making decisions about what to share and what not to share. However, there are vast areas of experimental science where the paper itself is data.

The policy is going one step further back from the published paper towards the lab. In the future these data policies might go further back, if there was greater automation of the research process.

Michael confirmed that if the BBSRC has funded a PhD student they would expect them to make supporting data available.

What do we need to share in the Biological Sciences?

There is no expectation to share lab books unless they are the only place the data exists. Michael noted that when the BBSRC wrote the policy it excluded lab books and organisms.

However there is an expectation to share instrumental output. This is with the caveat that if it is output from an instrument that goes through some sort of amendment then you don’t need to share the original.

An example: A researcher is counting bacteria on a plate and scrupulously making notes in lab books before entering this information into a computer spreadsheet to crunch the numbers. The expectation would be to share the spreadsheet, not the lab book.

Some research requires the construction of a piece of technology where there might not be a great deal of associated data around it. In these instances it is the process of construction or the protocol or the methodology that is important to share.

Michael noted that in some disciplines, given the materials and input parameters and the same instruments, the output data will be the same each time. In these circumstances it is most sensible to share or describe the inputs and repeat the experiments. The question is about what would be the most useful to share.

Show me the money

A stitch in time

Michael confirmed that researchers can ask for the money they need (and can justify) for research data management in grant applications. He did say, however, that the BBSRC does not ‘generally see a lot of these requests’. He noted that this is because often people haven’t thought about the data they will generate at the start of the project. One of the researchers pointed out it was difficult to know how to fund it because ‘we are not sure what we need’. However, this should not be a reason to ask for nothing.

It may be that some of the discipline specific repositories will have to change their business models in the future to cope with larger data sets.

Michael said that it is worth thinking about data sharing at the project planning stage because different types of data have different requirements. Researchers might need to allow for the cost of getting the data in the right format and metadata. It is advisable to think about where the data will be published so the research team can prepare the data in the first instance.

Michael said that the data management plan should hopefully prompt researchers to consider how much data a research project will produce. It is advisable to consider the maximum amount of data the project may produce. The ideal situation will be to have an ongoing data management plan because in some ways it is useful at the end.

Longer term financial support

Raised in the meeting was the option of charging a flat fee up front regardless of the data being generated. The question arose about whether there was any danger in auditing with this approach. The problem with an up-front fee is that it becomes more difficult to track an output from a specific grant against what we put into the database. There is a directly incurred and directly allocated component to the cost.

Michael confirmed that any money allocated to data management won’t survive past the end of the grant. He noted this was something that he was ‘not sure how to unpick’. This raises the issue of the cost of longer term data sharing. The BBSRC provides funding to a certain point in time. There can be a secondary experiment funded by someone else and the works are published together. But the researcher can only share the data from the funded part. The BBSRC does not ask researchers to share data that they haven’t funded.

Auditing questions

Who is in charge here?

The academics raised the concern that there could be ‘mission creep’ where the funders expect people to do things that are a waste of time. They mentioned that an ideal situation would be where the research community decide what they want to share and what they don’t wish to share.

Michael noted that the BBSRC has to be guided by the community on their own community norms for data sharing, and this is why aspects of the data sharing policy are quite open. He noted that this meeting represented the first part of the process – where the funder comes together with communities to decide what is essential.

In addition, many journals are now requiring open data. It is the funders, the researchers and the journals who are asking for it. To some extent the BBSRC policy is guided by what the journals are asking for.

The policing process

The group expressed interest in how the BBSRC policy is policed and what would be the focus of that policing. Michael stated that BBSRC are investigating options for monitoring compliance, but that it does not currently appear feasible to check all of the submissions. BBSRC will monitor compliance, but will probably start with dipstick testing. They will look at historical projects and see where the process goes from there. In practice, this is likely to initially involve examining the degree of adherence to the submitted data management plans. If a researcher has acted reasonably and justified their mechanisms of data sharing, then it is unlikely that there would be any actions beyond noting where difficulties had occurred.

Note, however, that if a researcher has submitted a grant application with a data sharing statement there is a reasonable expectation to share the data.

Ultimately the data release will be policed. In areas where data sharing is prevalent, communities police themselves because researchers ask and expect the data to be available. In some cases you can’t publish without an accession number.

Michael noted there are places researchers can put information about published data into ResearchFish. ResearchFish is currently the only mechanism to capture information regarding post-award activities.

Where do we put the data?

The question arose about how other universities are managing the policy. Michael responded that many have started institutional repositories. The institutional response depends on where the majority of their research sits.

A possible solution for ensuring the data is discoverable would be a catalogue of what is stored in an institutional repository, with metadata about the data. That metadata would itself need to be discoverable. If the data is being held in a centralised repository it is possible to pay the cost upfront before the end of the grant.

The group noted there was a publishing preference for discipline specific repositories over institutional repositories because the community knows how to look after the work. These repositories are hosted by ‘people who know what they are doing’. They are discoverable, where the community can decide on the metadata and the required standards.

Michael agreed that the ideal was open discoverability. The question is what will be practically possible.

A way of considering the question is asking how would another researcher find the information? If the data is available from a researcher by request this should be noted in the paper. If it is available in a repository then the paper should state that. If the journal has told readers where the data is, then it should be self-evident.

Issues with obsolescence

Michael noted that there is an ongoing issue of obsolete data formats and disks. Given there are ideals and reality, it becomes a question of how to store and handle the information.

When data exists in a proprietary format, the researcher needs to think about how to access it in the longer term. What if the organisation goes out of business? Or the technology upgrades so you can’t get hold of the data in an earlier format? If data exists in a physical format then it is possible to go back and read it. However, if not then it is quite important to think about issues relating to long-term access. Lots of data will be obsolete.

There are some solutions for this issue. The Open Microscopy Environment is a joint project between universities, research establishments, industry and the software development community. It develops open-source software and data format standards for the storage and manipulation of biological microscopy data. This is a community-generated solution to a recognised problem. It has a database to which you can upload any file format.

Issues with data sharing in the biological sciences

The BBSRC allows a reasonable embargo until the researcher has exploited the data for publication. If the researcher is planning on releasing further publications then they should consider carefully when to release the data. Michael noted that this is ‘not a forever thing’. The BBSRC do say there are reasonable limits, and some journals will expect data to be released alongside publications.

Commercial partners

Data emerging from BBSRC funded research needs to be shared unless there is a reason why not – and commercial partners who need to protect their intellectual property can be a good reason to delay data sharing. However once the Intellectual Property is protected, it is protected. The BBSRC allows researchers to embargo the data.

Michael also noted there are things that can be done with data, for example releasing it under licence. For example, if a researcher is working with a commercial partner who is concerned about other commercial competitors, it would be possible to require people to sign non-disclosure agreements. There are ways to deal with commercial data, as you would with other intellectual products.

It was noted by the researchers in the meeting that this type of arrangement is likely to mean the company doesn’t want to go through the process and won’t collaborate.

Exceptions

If data was generated before the policy was in place then the researcher has not submitted a grant application that requires them to share their data. The BBSRC is not expecting people to go back into history. Those researchers who wish to share historical research are not discouraged but this is not covered by the policy. The policy came into force in April 2007, although realistically it started in 2008.

In addition there are reasonable grounds for not sharing clearly incorrect or poor quality data. Many disciplinary databases will contain an element of quality control. But Michael noted that the policy shouldn’t be a way for people to filter out inconvenient data, and that he would expect the community to be self-policing.

Future policy direction

Michael noted that this type of policy is becoming more prevalent, not less. Open science is one of the Horizon 2020 themes – see the 2013 Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Journals are getting involved as well. In the future sharing data will be more common – and driven by disciplinary norms. Anything that has been funded by RCUK will be required to share. It makes sense to government – the US National Institutes of Health and National Science Foundation have data sharing statements.

Continuing the dialogue

Michael indicated that he wants to talk to people about what the questions are so the BBSRC can refine issues in the policy.

Researchers who have questions about the policy can send them to the Research Data Service team at info@data.cam.ac.uk. If we are unable to answer them, we can ask BBSRC directly for clarification. We will then add the information to the University Research Data Management FAQ webpage.

Published 19 October 2015
Written by Dr Danny Kingsley, verified by Michael Ball, BBSRC
Creative Commons License

Half-life is half the story

This week the STM Frankfurt Conference was told that a shift away from gold Open Access towards green would mean some publishers would not be ‘viable’ according to a story in The Bookseller. The argument was that support for green OA in the US and China would mean some publishers will collapse and the community will ‘regret it’.

It is not surprising that the publishing industry is worried about a move away from gold OA policies. They have proved extraordinarily lucrative in the UK with Wiley and Elsevier each pocketing an extra £2 million thanks to the RCUK block grant funds to support the RCUK policy on Open Access.

But let’s get something straight. There is no evidence that permitting researchers to make a copy of their work available in a repository results in journal subscriptions being cancelled. None.

The September 2013 UK Business, Innovation and Skills Committee Fifth Report: Open Access stated “There is no available evidence base to indicate that short or even zero embargoes cause cancellation of subscriptions”. In 2012 the Committee for Economic Development Digital Connections Council in The Future of Taxpayer-Funded Research: Who Will Control Access to the Results? concluded that “No persuasive evidence exists that greater public access as provided by the NIH policy has substantially harmed subscription-supported STM publishers over the last four years or threatens the sustainability of their journals”.

I am the first to say that we should address questions about how the scholarly publishing landscape is shifting with systematic data gathering, analysis and discussion. We need to look at trends over time and establish what they mean for the ongoing stability of the scholarly literary corpus. But consistently evoking the ‘green open access equals cancellation so we should have longer embargoes’ argument is not the solution.

Let’s put this myth to bed once and for all.

The half life argument

Publishers have been trying to use the half-life argument for some time to justify extending their embargo periods on the author’s accepted manuscript. An embargo is the period after publication before the manuscript (the author’s Word or LaTeX document, usually saved as a pdf) can be made available in the author’s institutional repository or a subject-based repository.

The half-life of an article is the time it takes for the article to reach half its total number of downloads.
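To make the definition concrete, here is a minimal sketch in Python of how a usage half-life could be worked out from monthly download counts. The function name and the download figures are hypothetical and purely illustrative, and "total" here means the total observed over the period for which counts are available, which is an assumption rather than the method used in any study discussed in this post.

```python
from itertools import accumulate


def usage_half_life(monthly_downloads):
    """Return the month in which cumulative downloads first reach half the total.

    monthly_downloads: downloads per month since publication (illustrative data).
    Returns None when there are no downloads at all.
    """
    total = sum(monthly_downloads)
    if total == 0:
        return None
    # Walk through the running total month by month (months counted from 1).
    for month, running in enumerate(accumulate(monthly_downloads), start=1):
        if 2 * running >= total:
            return month
    return None


# Hypothetical article: heavy early use followed by a long tail.
example = [40, 20, 10, 10, 5, 5, 5, 5]
print(usage_half_life(example))
```

Note that the answer depends on how long the observation window is: the longer downloads are counted, the larger the total, and so the later the halfway point can fall.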

The argument goes along the lines of ‘if articles have a longer half-life then they should be kept under embargo for longer’ because, according to a blog published at the beginning of this year by Alice Meadows, ‘Open access at Elsevier 2014 in retrospect and a look at 2015’: “If an embargo period falls too far below the period it takes for a journal to recoup its costs, then the journal’s survival will be jeopardized.”

The problem with this argument is that there has been, and continues to be, no evidence that permitting authors to make work available in a repository leads to journal cancellations. It is ironic that the consistent line on this issue from the publishers has been that the half-life argument is helping ‘set evidence-based policy settings of embargo periods’.

The half-life spectre was raised again at this week’s STM meeting by Philip Carpenter, executive vice president of research at Wiley, who noted that only 20% of Wiley journal usage occurred in the first 12 months after publication and referred to a 12 month embargo offering only ‘limited protection’, according to The Bookseller.

Evidence for the green = cancellation argument

The need for longer embargoes – 1

The way the ‘evidence’ for this argument has been presented is telling. There is a particular paragraph in Meadows’ blog that is worth republishing in full:

How long those embargo periods should be before manuscripts become publicly accessible is a key issue. To help set evidence-based policy settings of embargo periods, we have contributed to growing industry data. Findings of a recent usage study demonstrated that there is variation in usage half-lives both within and between disciplines. This finding aligned with a study by the British Academy, which also found variation in half-lives between disciplines – and half-lives longer than those previously suggested.

Despite looking like links to two separate items (which gives the impression of more ‘evidence’), the first two links in the section above to ‘industry data’ and to a ‘recent usage study’ both lead to the SAME 25 November 2013 study by Phil Davis into journal usage half-lives that started the whole shebang off. The study looked at the usage patterns of over 2800 journals and found that only 3% of the journals had half-lives of 12 months or less. The smallest share of journals with this short half-life was in the Life Sciences (1%) and the largest in Engineering (6%).

While in no way criticising the findings of that study, it should be pointed out that the author clearly states that the study was funded by the Professional & Scholarly Publishing (PSP) division of the Association of American Publishers (AAP). The work has not been peer reviewed or published in the literature.

The British Academy report Open Access Journals in the Humanities and Social Sciences does not appear to be available online any longer.

Now, there is no dispute that there are differences in usage patterns of articles between disciplines. This is a reflection of differing communication norms and behaviours. But there is a huge logic jump to then conclude that therefore we need to increase embargo periods. Peter Suber went into some detail on 11 January 2014 (yes, we have been swinging around on this one for a while now) explaining the logical flaw in the argument. At the time Kevin Smith also noted in a blog “Half-lives, policies and embargoes” that “we should not accept anything that is presented as evidence just because it looks like data; some connection to the topic at hand must be proved”.

The need for longer embargoes – 2

Meadows’ blog went on to say:

There are real-world examples where embargo periods have been set too low and the journal has become unviable. For example, as published in The Scholarly Kitchen, the Journal of Clinical Investigation lost about 40 percent of its institutional subscriptions after adopting a 0-month embargo period in 1996, so it was forced to return to a subscription model in 2009. Similar patterns have been seen with other journals.

The issue referred to here has nothing to do with the half life of research papers that are being made available open access through a repository. This refers to a journal that went to a GOLD Open Access model in 1996 (publishing open access and relying on non-subscription revenue sources), but eventually decided they needed to impose a subscription again in 2009. Not only is this example entirely unrelated to the embargo issue for green Open Access, it happened six years ago. Note the blog does not link to other ‘similar patterns’. They do not exist.

Green policies mean cancellations

The half-life argument has replaced previous, even less substantial ‘evidence’ provided by the publishing industry in 2012, in the form of a survey-based study. That study was cited as evidence for the argument that “short embargo periods are likely to lead to significant cancellations” by Wiley in a 2013 blog post Open Access – Keeping it Real and by Springer in an interview published as Open Access – Springer tightens rules on self archiving.

The study was conducted by the Association of Learned and Professional Society Publishers (ALPSP). However, the study, which was written up and published online, had some major methodological issues. It consisted of a single poorly worded question:

“If the (majority of) content of research journals was freely available within 6 months of publication, would you continue to subscribe? Please give a separate answer for a) Scientific, Technical and Medical journals and b) Humanities, Arts and Social Sciences Journals if your library has holdings in both of these categories.”

An analysis of the study highlighted methodological criticisms. The work was not peer reviewed. But there are deeper questions about the motivation behind the survey. The researcher was the Chair of the ALPSP Research Committee and was on the steering committee for the Publishers Research Coalition, raising questions about her (and the study’s) objectivity. There are several other issues relating to the validity of the research.

What is the real problem?

There is no doubt that open access policies are causing disruption to publishers’ funding models. That is hardly surprising and in some cases may well be the intent of the policy. But presenting spurious arguments to try to maintain the status quo is not moving this discussion forward.

The point is we do need evidence. If green OA is causing cancellations then let’s collect some numbers and talk about the issues:

  • How does this affect the scholarly communication system?
  • What are the implications?
  • Does this mean publishers will fold (unlikely in the short term)?
  • Will some journals close (possibly)?
  • Is that a problem?
  • Perhaps we need to consider issues relating to the reward system and what is valued?

But I will give the last word to the person who caused me to write this blog in the first place – Philip Carpenter, executive vice-president of research at Wiley, who, according to The Bookseller, said at the STM meeting: “We’ll need to think hard about what factors influence library purchasing decisions; we don’t know enough [about that]”.

Hear, hear.

Published 16 October 2015
Written by Dr Danny Kingsley
Creative Commons License

Openness, integrity & supporting researchers

Universities need to open research to ensure academic integrity, adjust to support modern collaboration and scholarship tools, and begin rewarding people who have engaged in certain types of process rather than relying on traditional assessment schemes. This was the focus of Emeritus Professor Tom Cochrane’s* talk on ‘Open scholarship and links to academic integrity, reward & recognition’ given at Cambridge University on 7 October.

The slides from the presentation are available here: PRE_Cochrane_DisruptingDisincentives_V1_20151007

Benefits of an open access mandate

Tom began with a discussion about aspects of access to research and research data and why it should be as open as possible. Queensland University of Technology introduced an open access mandate 12 years or so ago. They have been able to observe a number of effects on bibliometric citation rates, such as the way authors show up in Scopus.

Another effect is the way consulting opportunities arise because someone’s research is exposed to reading audiences that do not have access to the toll-gated literature. A further benefit is the recruiting of HDR students.

Tom outlined six areas of advantage for institutions with a mandate, including researcher identity and exposure, and advantage to the institution. He noted that they can’t argue causation but can argue correlation with the university’s improvement in research performance. Many institutions have been able to gain some advantage from having an institutional repository that reflects the output of the institution.

However in terms of public policy, the funders have moved the game on anyway. This started with private funders like Wellcome Trust, but also the public funding research councils. This is the government taxpayer argument, which is happening in the US.

Tom noted that when he began working on open access policy he had excluded books because there are challenges with open access when there is a return to the author, but there has been a problem long term with publishing in the humanities and the social sciences. He said there was an argument that there has been a vicious downward spiral that oppresses the discipline, by making the quality scholarship susceptible to judgements about sales appeal for titles in the market, assessments which may be unrelated. Now there is a new model called Knowledge Unlatched which is attempting to break this cycle and improve the number of quality long form outputs in Humanities and Social Sciences.

Nightmare scenarios

Tom started by discussing the correlation between academic integrity and research fraud by discussing the disincentives in the system. What are potential ‘nightmare’ scenarios?

For early career researchers, nightmares include a PhD failing, a job or promotion application being rejected, a grant application failing, industry or consultancy protocols failing or a paper not being accepted.

However, a worse nightmare is when a published or otherwise proclaimed finding is found to be at fault, either through a mistake or through something more deliberate. This is a nightmare for the individual.

It is also very bad news for an institution to be front page news for this reason. This is very difficult to rectify.

Tom spoke about Jan Hendrik Schon’s deception. Schon was a physicist who qualified in Germany and went to work at Bell Labs in the US. He discovered ‘organic semiconductors’. The reviewers were unable to replicate the results because they didn’t have any access to the original data, with lab books destroyed and samples damaged beyond recovery. The time taken to investigate and the eventual withdrawal of the research was 12.5 years, and the effort involved was extraordinary.

Incentives for institutions and researchers

Academics work towards recognition and renown, respect and acclaim. This is based on a system of dissemination and publication, which in turn is based on peer review and co-authorship using understood processes. Financial reward is mostly indirect.

Tom then discussed what structures universities might have in place. Most will have some kind of code of conduct to advise people about research misconduct. There are questions about how well this advice is actually understood or implemented.

Universities also often provide teaching about authorship and the attribution of work – there are issues around the extent that student work gets acknowledged and published. Early career researchers are, or should be, advised about the requirements for attributing work, including not attributing work to others who have not contributed, and should be given a good understanding of plagiarism and ethical conduct.

How does openness help?

Tom noted that we are familiar with the idea of open data and open access. But another aspect is ‘open process’. Lab work books, for example, showing progress in thinking, approaches and experiments, can be made open, though there may be some variations in the timing of when this occurs.

The other pressing thing about this is that the nature of research itself is changing profoundly. This includes extraordinary dependence on data, and complexity requiring intermediate steps of data visualisation. In Australia this is called eResearch, in the UK it is called eScience. These eResearch techniques have been growing rapidly, and in a way that may not be understood or well led by senior administrators.

Using data

Tom described a couple of talks by early or mid career researchers at different universities. They said that when they started they were given access to the financial system, and IT and Library privileges. But they say ‘what we want to know is what data services we can get from the University?’. This is particularly acute in the Life Sciences. Where is the support for the tools? What is the University doing by way of scaffolding the support services that will make that more effective for me? What sort of help and training will you provide in new ways of disseminating findings and new publishing approaches?

Researchers are notoriously preoccupied with their own time – they consider they should be supported better with these emerging examples. We need more systematic leadership in understanding these tools, with deliberate attention by institutional leadership to overcoming inertia.

The more sustained argument about things being made open relates to questions about integrity and trust – where arguments are disputes about evidence. What’s true for the academy in terms of more robust approaches to prevent or reduce inaccuracy or fraud, is also true in terms of broader public policy needs for evidence based policy.

Suggestions for improvement

We need concerted action by people at certain levels – Vice Chancellors, heads of funding councils, senior government bureaucrats. Some suggested actions for institutions and research systems at national and international levels include concerted action to:

  • develop and support open frameworks
  • harmonise supporting IP regimes
  • reframe researcher induction
  • improve data and tools support services
  • reward data science methods and re-use techniques
  • rationalise research quality markers
  • foster impact tracking in diverse tools

Discussion

Friction around University tools

One comment noted that disincentives at Cambridge University manifest as frictions around the ways researchers use the University tools, given they don’t want to waste time.

Tom responded that creating a policy is half the trick. Implementing it in a way that makes sense to someone is the other half. What does a mandate actually mean in a University given they are places where one does not often successfully tell someone else what to do?

However research and support tools are getting more efficient. It is a matter of marshalling the right expertise in the right place. One of the things that is happening is we are getting diverse uptakes of new ideas. This is reliant on the talent of the leadership that might be in place or the team that is in place. It could get held back by a couple of reactionary or unresponsive senior leaders. Conversely the right leadership can make striking progress.

Openness and competition

Another comment asked how openness squares with researchers being worried about others finding out about what they are doing in a competitive environment.

Tom noted that depending on the field, there may indeed need to be decision points or “gating” that governs when the information is available. The important point is that it is available for review for the reasons of integrity explored earlier. Exceptions will always apply as in the case of contract research being done for a company by an institution that is essentially “black box”. There would always have to be decisions about openness which would be part of working out the agreement in the first place.

Salami slicing publication

A question arose about the habit of salami slicing research into small publications for the benefits of the Research Excellence Framework and how this matches with openness.

Tom agreed that research assessment schemes need to be structured to encourage or discourage certain types of scholarly output in practice. The precursor to this practice was the branching of journal titles in the 1970s – the opportunity for advantage at the time lay with research groups and publishers. There has to be a leadership view from institutional management on what kind of practical limits there can be on that behaviour.

This sparked a question about the complexity of changing the reward system because researchers are judged by the impact factor, regardless of what we say to them about tweets etc. How could the reward system be changed?

Tom said the change would need to start from recognising that basing reward only on research outputs is insufficient. Other research productivity needs reward. This has to be led. It can’t be a half-baked policy put out by a committee. It needs to be trusted by the research community.

Open access drivers

A question was asked about the extent to which the compliance agenda that has been taken by the funders has run its course. Is this agenda going to be taken up by the institutions?

Tom said that he has thought about this for a long time. He thought originally OA would be led by the disciplines because of the example of the High Energy Physics community which built a repository more than 20 years ago. Then there was considerable discussion, e.g. in the UK in the early 2000s, about aligning OA with institutional profile. But institutional take up was sporadic. In Australia in 2012 we only had six or seven universities with policies (which doesn’t necessarily mean there had been completely satisfactory take up in each of those).

Through that time the argument for a return on taxpayer investment has become the prevalent government one. Tom doesn’t think they will move away from that, even though there has been a level of complexity relating to the position that might not have been anticipated, with large publishers keen to be embedded in process.

This moved to a question of whether this offers an opportunity for the institution beyond the mandate?

Tom replied that he always thought there was an advantage at an institutional and individual level that you would be better off if you made work open. The main commercial reaction has been for the large publishers to seek to convert the value that exists in the subscription market into the same level of value in input fees, i.e. Article Processing Charges.

It should be understood finally that academic publishing and the quality certification for research does have a cost, with the question being what that level of cost should really be.

About the speaker

*Emeritus Professor Tom Cochrane was briefly visiting Cambridge from Queensland University of Technology in Australia. During his tenure as the Deputy Vice-Chancellor (Technology, Information and Learning Support), Professor Cochrane introduced the world’s first University-wide open access mandate, in January 2004. Amongst his many commitments Professor Cochrane serves on the Board of Knowledge Unlatched (UK), is a member of the Board of Enabling Open Scholarship (Europe) and was co-leader of the project to port Creative Commons into Australia.

Published 12 October 2015
Written by Dr Danny Kingsley
Creative Commons License

Archiving webpages – securing the digital discourse

We are having discussions around Cambridge about the research activity that occurs through social media. These digital conversations are the ephemera of the 21st century, the equivalent of the Darwin Manuscripts that the University has spent considerable energy preserving and digitising. However, to date we are not archiving or preserving this material.

As a starting point, we are sharing here some of the insights Dr Marta Teperek gained from attending the DPTP workshop on Web Archiving on 12 May 2015, led by Ed Pinsent and Peter Webster.

Digital dissemination

Increasingly researchers are realising that online resources are important to disseminate their findings – the subject of our recent blog ‘What is ‘research impact’ in an interconnected world?‘ It is common to use blogs and Twitter to share discoveries.

Some researchers even have dedicated websites to publish information about their research. In the era of Open Science webpages are also used to share research data, especially by programmers, who often use them as powerful tools for providing rich metadata description for their software. It is not uncommon to include a link to a webpage in publications as the source of additional information supporting a paper. In these cases, other researchers need to be able to cite the webpage as it was at the time of publication. This ensures the content is stable – be it information, a dataset, or a piece of software.

The question arises then about preventing ‘linkrot’ and preserving webpages – to ensure the content of a webpage is still going to be accessible (and unaltered) in several years’ time.

What does it mean to archive a webpage?

Archiving is preserving an exact copy of a webpage, as it is at a given moment in time. The most commonly used format for webpage archives is the .warc file. These files contain all the information about the page: its content, layout, structure, interactivity etc. They can be easily re-played to re-create the exact content of the archived webpage, as it was at the time of recording. These .warc files can be shared with colleagues or with the public by various means, for example, by preserving a copy in data repositories.
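To give a feel for what sits inside a .warc file, here is a minimal sketch in Python. It assumes a much simplified version of the WARC 1.0 record layout (a version line, named header fields, a blank line, then the captured content). The function names build_record and parse_record are hypothetical and written only for this illustration. Real archives hold many records, are often compressed, and are best handled with a dedicated WARC tool.

```python
from datetime import datetime, timezone
import uuid

CRLF = b"\r\n"


def build_record(url: str, payload: bytes) -> bytes:
    """Build one simplified WARC-style 'resource' record (illustrative only)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Target-URI", url),
        # The capture time is what lets a replay show the page "as it was".
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line separates headers from content; two CRLFs close the record.
    return head + CRLF + payload + CRLF + CRLF


def parse_record(record: bytes):
    """Split a single record back into version line, header fields and content."""
    head, _, rest = record.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    fields = {}
    for line in lines[1:]:
        name, _, value = line.decode("utf-8").partition(": ")
        fields[name] = value
    # Content-Length says how many bytes of captured content follow the headers.
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


record = build_record("http://example.org/", b"<html>hi</html>")
version, fields, body = parse_record(record)
print(version, fields["WARC-Target-URI"], fields["WARC-Date"], len(body))
```

The point of the sketch is that each record carries the address of the page and the date of capture alongside the content itself, which is what makes a later replay of the page possible.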

The right to archive

One of the most interesting topics emerging from almost every talk was who has the right to archive a webpage. The answer would seem simple – the webpage creator. However, webpages often contain information with reference to, or with input from various external resources. Most pages nowadays have feeds from Twitter, allow comments from external users, or have discussion fora. Does the website creator have the rights to archive all these?

In general, anyone can archive the page. Problems start if there are intentions to make the archive available to others – which is typically the driver for archiving the page in the first place. In theory, in order to disseminate the archived page, the archiver should ask all copyright owners of the content of that page for their consent. However, obtaining consent from all copyright owners might be impossible – imagine trying to approach authors of every single tweet on a given webpage.

The recommendation is that people should obtain consent for all elements of the webpage for which it is reasonably possible to get the consent. When making the archive available, there should also be a statement that the best effort was made to obtain consent from all copyright owners. It is good practice to ask any webpage contributors to sign a consent form for archiving and sharing of their contributed content.

Alternative approach to copyright

Some websites have decided to take an alternative approach to dealing with copyright. The Internet Archive simply archives everything, without worrying about copyright. Instead, they have a takedown policy if someone asks them to remove the shared archive. As a consequence of their approach, they are currently the biggest website archive in the world, which as of August 2014 used 50 PetaBytes of storage.

Anyone can archive their websites on the Internet Archive: create an account, enter the URL of the webpage to be archived and click a button to archive the page. The archive will then be created and shared.

The workshop inspired us at Cambridge to archive the data website, which is now available on Internet Archive. Snapshots from each of the archiving events can be easily replayed by simply clicking on them.
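For anyone wanting to cite a particular snapshot, here is a small sketch of how a link to it can be put together. It assumes the Wayback Machine's public address pattern of https://web.archive.org/web/ followed by a 14-digit timestamp and then the original URL. The function name snapshot_url and the example address are hypothetical, used only for illustration.

```python
from datetime import datetime

# Assumed public replay prefix of the Internet Archive's Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(page_url: str, when: datetime) -> str:
    """Return a link to the snapshot of page_url nearest to the given time."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


print(snapshot_url("http://www.example.org/", datetime(2015, 5, 12)))
# prints: https://web.archive.org/web/20150512000000/http://www.example.org/
```

Including the timestamp in a citation means readers are taken to the page as it was archived around that date, rather than to whatever the live page happens to say today.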

Can a non-specialist archive the website?

But what if you would like to archive a website yourself – store and share it on your conditions, perhaps using a data repository? Various options for website preservation were discussed during the workshop.

As a non-specialist, the best option is the one which does not require any specialist knowledge, or specialist software installation. A startup company called WebRecorder have created a website which allows anyone to easily archive any page. There is no need to create an account. The user can simply copy the URL of the page to be archived and press ‘record’. This will generate a .warc file of the website. The disadvantage is this needs to be done for every page of the website separately. WebRecorder allows free downloads of .warc files – the files can be downloaded and archived/shared however the user chooses.

If anybody wants to then re-run the website from a .warc file, there are plenty of free software options available to re-play the webpage. Again, an easy solution for non-specialists is to go to WebRecorder. WebRecorder allows one to upload a .warc file and will then easily replay the webpage with a single click on the ‘Replay’ button.

A bouquet for the DPTP workshop

This was an excellent and extremely efficient one-day workshop, due to its dynamic organisation. The workshop was broken down into six main parts, and each of these parts consisted of several very short (usually 10 mins long) presentations and case studies directly related to the subject (no time to draw away!). After every short talk there was time for questions. Furthermore, there were breaks between the main parts of the workshop to allow focused discussions on the subject. This dynamic organisation ensured that every question was addressed, and that all issues were thematically grouped – which in turn helped to deliver powerful take-home messages from each section.

Furthermore the speakers (who by the way had expert knowledge on the subject) did not recommend any particular solutions, but instead reviewed types of solutions available, discussing their major advantages and disadvantages. This provided the attendees with enough guidance for making informed decisions about solutions most appropriate to their particular situations.

What also greatly contributed to the success of the workshop was the diverse background of attendees: from librarians and other research data managers, to researchers, museum website curators, and European Union projects’ archivists. All these people had different approaches to, and different needs for, web archiving. Perhaps this is why the breakout sessions were so valuable and deeply insightful.

Published 3 October 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License

Joint response on the draft UK Concordat on Open Research Data

During August the Research Councils UK on behalf of the UK Open Research Data Forum released a draft Concordat on Open Research Data for which they have sought feedback.

The Universities of Bristol, Cambridge, Manchester, Nottingham and Oxford prepared a joint response which was sent to the RCUK on 28 September 2015. The response is reproduced below in full.

The initial main focus of the Concordat should be good data management, instead of openness.

The purpose of the Concordat is not entirely clear. Merely issuing it is unlikely to ensure that data is made openly available. If Universities and Research Institutes are expected to publicly state their commitment to the Principles then they risk the dissatisfaction of their researchers if insufficient funds are available to support the data curation that is described. As discussed in Comment #5 below, sharing research data in a manner that is useful and understandable requires putting research data management systems in place and having research data experts available from the beginning of the research process. Many researchers are only beginning to implement data management practices. It might be wiser to start with a Concordat on good data management before specifying expectations about open data. It would be preferable to first get to a point where researchers are comfortable with managing their data so that it is at least able to be citable and discoverable. Once that is more common practice, then the openness of data can be expected as the default position.

The scope of the Concordat needs to be more carefully defined if it is to apply to all fields of research.

The Introduction states that the Concordat “applies to all fields of research” but it is not clear how the first sentence of the Introduction translates for researchers in the Arts and Humanities (or in theoretical sciences, e.g. Mathematics). This sentence currently reads:

“Most researchers collect, measure, process and analyse data – in the form of sets of values of qualitative or quantitative variables – and use a wide range of hardware and software to assist them to do so as a core activity in the course of their research.”

The Arts and Humanities are mentioned in Principle #1, but this section also refers to benefits in terms of “progressing science”. We suggest that more input is sought specifically from academics in the Arts and Humanities, so that the wording throughout the Concordat is made more inclusive (or indeed exclusive, if appropriate).

The definition of research data in the Concordat needs to be relevant to all fields of research if the Concordat is to apply to all fields of research.

We suggest that the definition of data at the start of the document needs to be revised if it is to be inclusive of Arts and Humanities research (and theoretical sciences, e.g. Mathematics). The kinds of amendments that might be considered are indicated in italics:

Research Data can be defined as evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical forms). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the copyright may be externally held. The purpose of open research data is not only to provide the information necessary to support or validate a research project’s observations, findings or outputs, but also to enable the societal and economic benefits of data reuse. Data may include, for example, statistics, collections of digital images, software, sound recordings, transcripts of interviews, survey data and fieldwork observations with appropriate annotations, an interpretation, an artwork, archives, found objects, published texts or a manuscript.

The Concordat should include a definition of open research data.

To enable consistent understanding across Concordat stakeholders, we suggest that the definition of research data at the start of the document be followed by a definition of “openness” in relation to the reuse of data and content.

To illustrate, consider referencing The Open Definition which includes the full Open Definition, and presents the most succinct formulation as:

“Open data and content can be freely used, modified, and shared by anyone for any purpose”.

The Concordat refers to a process at the end of the research lifecycle, when what actually needs to be addressed is the support processes required before that point to allow it to occur.

Principle #9 states that “Support for the development of appropriate data skills is recognised as a responsibility for all stakeholders”. This refers to the requirement to develop skills and provision of specialised researcher training. These skills are almost non-existent and training does not yet exist in any organised form (as noted by Jisc in March this year). There is some research data management training for librarians provided by the Digital Curation Centre (DCC) but little specific training for data scientists. The level of researcher support and training required across all disciplines to fulfil expectations outlined in Principle #9 will require a significant increase in both the infrastructure and staffing.

The implementation of, and integration between research data management systems (including systems external to institutions) is a complex process, and is an area of ongoing development across the UK research sector and will also take time for institutions to establish. This is reflected by the final paragraphs of DCC reports on the DCC RDM 2014 Survey and discussions around gathering researcher requirements for RDM infrastructure at the IDCC15 conference of March this year. It is also illustrated by a draft list of basic RDM infrastructure components developed through a Jisc Research Data Spring pilot.

The Concordat must acknowledge the distance between where the Higher Education research sector currently stands and the expectation laid out. While initial good progress towards data sharing and openness has been made in the UK, it will require further substantial culture change to enact the responsibilities laid out in Principle #1 of the Concordat, and this should be recognised within the document. There will be a significant time lag before staff are in place to support the research data management process through the lifecycle of research, so that the information is in a state that it can be shared at the end of the process.

We suggest that the introduction to the Concordat should include text to reflect this, such as:

“Sharing research data in a manner that is useful and understandable requires putting integrated research data management systems in place and having research data experts available from the beginning of the research process. There is currently a deficit of knowledge and skills in the area of research data management across the research sector in the UK. This Concordat is intended to establish a set of expectations of good practice with the goal of establishing open research data as the desired position over the long term. It is recognised that this Concordat describes processes and principles that will take time to establish within institutions.”

The Concordat should state more clearly its scope in relation to publicly funded research data and that funded from alternative sources or unfunded.

While the Introduction to the Concordat makes clear reference to publicly-funded research data, Principle #1 states that ‘it is the linking of data from a wide range of public and commercial bodies alongside the data generated by academic researchers’ that is beneficial. In addition, the ‘funders of research’ responsibilities should state whether these responsibilities relate only to public bodies, or wider (Principle #1).

The Concordat should propose sustainable solutions to fund the costs of the long-term preservation and curation of data, and how these costs can be borne by different bodies.

It is welcome that the Concordat states that costs should not fall disproportionately on a single part of the research community. However, currently the majority of costs are placed on the Higher Education Institutions (HEIs), which is not a sustainable position. There should be some clarification of how these costs could be met from elsewhere, for example research funders. In addition, the Concordat should acknowledge that there will be a transition period where there may be little or no funding to support open data, which will make it very difficult for HEIs to meet responsibilities in the short to medium term. Furthermore, Principle #1 says that “Funders of Research will support open research data through the provision of appropriate resources as an acknowledged research cost.” It must be noted that several funders are at present reluctant or refusing to pay for the long-term preservation and curation of data.

The Concordat should propose solutions for paying for the cost of the long-term preservation and curation of data in cases where the ‘funders of research’ refuse to pay for this, or where research is unfunded. In the second paragraph of Principle #4 it is suggested that “…all parties should work together to identify the appropriate resource provider”. It would be useful to have some clarification about what the Working Group envisaged here. For example was it a shared national repository? Perhaps the RCUK (in collaboration with other UK funding bodies) could consider setting up a form of UK Data Service that meets the wider funding body audience for data of long-term value. This would also support the nature of collaboration and enable more re-use by increased data discoverability – data will not be stored at separate institutional repositories.

Additionally, there appears to be a contradiction between the statement in Principle #1 that “Funders of Research will support open research data through the provision of appropriate resources as an acknowledged research cost” and the statement in Principle #4: “…the capital costs for infrastructure may be incorporated into planned upgrades” which suggests that Universities or Research Institutes will need to fund infrastructure and services from capital and operational budgets.

The Concordat should clarify how an appropriate proportionality between costs and benefits might be assessed.

Principle #4 states that: “Such costs [of open research data] should be proportionate to real benefits.” This key relationship needs further amplification. How and at what stage can “real benefits” be determined in order to assess the proportionality of potential costs? The Concordat should state more clearly the ‘real and achievable’ benefits of open data with examples. What is the relationship between the costs and the benefits? Has this relationship been explored? The real benefits of sharing research data will only become clear over time. At the moment it is difficult to quantify the benefits without evidence from the open datasets. Moreover, there might be an amount of time after a project is finished before the real benefits are realised. Are public funders going to put in monetary support for such services?

Additionally, the Concordat should specify to what extent research data should be made easily re-usable by others. Currently Principle #3 mentions: “Open research data should also be prepared in such a manner that it is as widely useable as is reasonably possible…”. What is the definition of “reasonably possible”? Preparing data for use by others might be expensive, depending on the complexity of the data, and should be also taken into consideration when assessing the proportionality of potential costs of data sharing. Principle #4 states: “Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time.” These costs are indeed significant from the outset.

The Concordat (Principle #2) states: “A properly considered and appropriate research data management strategy should be in place before the research begins so that no data is lost or stored inappropriately. Wherever possible, project plans should specify whether, when and how data will be made openly available.” The Concordat should propose a process by which a proposal for data management and sharing in a particular research context is put forward for public funding. This proposal will need to include the cost-benefit analysis for deciding which data to keep and distribute (and how best to keep and distribute it).

In general, the Concordat must balance open data requirements with allowing researchers enough time and space to pursue innovation.

The Concordat should acknowledge the costs relating to undertaking regular reviews of progress towards open data.

Principle #4 refers to the following costs:

  • “necessary costs – for IT infrastructure and services, administrative and specialist support staff, and for researchers’ time – are significant”
  • “the additional and continuing revenue costs to sustain services – and rising volumes of data – for the long term are real and substantial”
  • “Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time”

However, there is no explicit reference to costs relating to Principle #10 regarding “Regular reviews of progress towards open access to research data should be undertaken”.

We suggest that Principle #4 should include text to reflect this, and the kind of amendment that might be considered is indicated in italics:

For research organisations such as universities or research institutes, these costs are likely to be a prime consideration in the early stages of the move to making research data open. Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time. Significant costs will also arise from Principle #10 regarding the undertaking of regular reviews of progress towards open access to research data.

The Concordat should explore the establishment of a central organisation to lead the transformation towards a cohesive UK research data environment.

Principle #3 states: “Data must be curated […] This can be achieved in a number of ways […] However, these methodologies may vary according to subject and disciplinary fields, types of data, and the circumstances of individual projects. Hence the exact choice of methodology should not be mandated”.

Realising the benefits of curation may have significant costs where curation extends over the long term, such as data relating to nuclear science which may need to be usable for at least 60 years. These benefits would be best achieved, and in a cost-effective manner, through the establishment of a central organisation that will lead the creation of a cohesive national collection of research resources and a richer data environment that will:

  • Make better use of the UK’s research outputs
  • Enable UK researchers to easily publish, discover, access and use data
  • Develop discipline-specific guidelines on data and metadata standards
  • Suggest discipline-specific curation and preservation policies
  • Develop protocols and processes for the access to restricted data
  • Enable new and more efficient research

In Australia this capacity is provided by the Australian National Data Service.

The Concordat should address the issues around sharing research data resulting from collaborations, especially international collaborations.

It has to be explicitly recognised that some researchers will be involved in international collaborations, with collaborators who are not publicly funded, or whose funders do not require research data sharing. Procedures (and possible exemptions) for sharing of research data in such circumstances should be discussed in the Concordat.

Additionally, the Concordat should suggest a sector-wide approach when considering the costs and complexities of research involving multiple institutions. Currently where multiple institutions are producing research data for one project there is a danger that it is deposited in multiple repositories which is neither pragmatic nor cost-effective.

Non-public funders need to be consulted about sharing of commercially-sponsored data, and the Concordat should acknowledge the possibility of restricting the access to research data resulting from commercial collaborations.

Since the Concordat makes recommendations with regard to making commercially-sponsored data accessible, significant conversations with non-public funders are needed. Otherwise, there is a risk that the expectations on industry will not be met. The current wording could damage industrial appetite to fund academic research if industry is pushed towards openness without major consultation.

We also suggest that in the second paragraph of Principle #5, the sentence: “There is therefore a need to develop protocols on when and how data that may be commercially sensitive should be made openly accessible, taking account of the weight and nature of contributions to the funding of collaborative research projects, and providing an appropriate balance between openness and commercial incentives.” is changed to “There is therefore a need to develop protocols on whether, when and how data that may be commercially sensitive should be made openly accessible, taking account of the weight and nature of contributions to the funding of collaborative research projects, and providing an appropriate balance between openness and commercial incentives.” The Concordat should also recognise that development and execution of these processes is an additional burden on institutional administrative staff which must not be underestimated.

The Concordat should more generally recognise the increasing economic value of data produced by researchers.

Where commercial benefits can be quantified (such as the return on investment of a research project) this should be recognised as a reason to embargo access to data until such things as patents can be successfully applied for. University bodies charged with the commercialisation of research should be entitled to assess the potential value of research before consenting to data openness.

The Concordat should allow the use of embargo periods to allow release of data to be delayed up to a certain time after publication, where this is appropriate and justifiable.

The Concordat expects research data underpinning publications to be made accessible by the publication date (Principles #6 and #8). This does not, however, take into account disciplinary norms, where sometimes access to research data is delayed until a specified time after publication. For example, in crystallography (Protein Data Bank) the community has agreed a maximum 12-month delay between publishing the first paper on a structure and making coordinates public for secondary use. Delays in making data accessible are accepted by funders. For example, the BBSRC allows exemptions for disciplinary norms, and where best practices do not exist BBSRC suggests release within three years of generation of the dataset; the STFC expects research data from which the scientific conclusions of a publication are derived to be made available within six months of the date of the relevant publication. Research data should be discoverable at the time of publication, but it may be justifiable to delay access to the data.

The Concordat should make mention of the difficulties involved with ethical issues of data sharing, including issues around data licensing, and data use by others.

Ethical issues surrounding release and use of research data are briefly mentioned in Principle #5 and Principle #7. We believe the Concordat could benefit from expansion on the ethical issues surrounding release and use of research data, and advice on how these can be addressed in data sharing agreements. This is a large and complex area that would benefit from a national framework of best practice guidelines and methods of monitoring.

Furthermore, the Concordat does not provide any recommendations about research data licensing. This should be discussed together with issues about associated expertise required, costs and time. It is mentioned briefly above in point 4.

The Concordat’s stated expectations regarding the use of non-proprietary formats should be realistic.

Principle #3 states that:

“Open research data should also be prepared in such a manner that it is as widely useable as is reasonably possible, at least for specialists in the same or linked fields, wherever they are in the world. Any requirement to use specialised software or obscure data manipulations should be avoided wherever possible. Data should be stored in non-proprietary formats wherever possible, or the most commonly used proprietary formats if no equivalent non-proprietary format exists.”

The last two sentences of this paragraph could be regarded as unreasonable, depending on the definition of what is ‘possible’. It might theoretically be possible to convert data for storage but not remotely cost-effective. Other formulations (e.g. from EPSRC) talk about the burden of retrieval from stored formats being on the requester not the originator of the data.

We suggest that this section should be rephrased in-line with EPSRC recommendations, for example:

“Wherever possible, researchers are encouraged to store research data in non-proprietary formats. If this is not possible (or not cost-efficient), researchers should indicate what proprietary software is needed to process research data. Those requesting access to data are responsible for re-formatting it to suit their own research needs and for obtaining access to proprietary third party software that may be necessary to process the data.”

The Concordat should encourage proper management of physical samples and non-digital research data.

The Concordat should also encourage proper management of physical samples, and other forms of non-digital research data. Physical samples such as fossils, core samples, zoological and botanical samples, and non-digital research data such as recordings, paper notes, etc. should also be properly managed. In some areas the management and sharing of these items is well constructed and understood – for example, palaeontology journals will not allow people to publish without the specimen numbers from a museum – but it is less rigid in other areas of research. It would be desirable for the Concordat to encourage development of discipline-specific guidelines for management of physical samples and other non-digital research data.

Principle #5 must recognise the culture change required to remove the decision to share data from an individual researcher.

Principle #5 states that:

“Decisions on withholding data should not generally be made by individual researchers but rather through a verifiable and transparent process at an appropriate institutional level.”

Whilst the reasoning behind this Principle is understandable, it must recognise that we are not yet in a mature culture of data sharing, and that a statement removing data sharing decisions from the researcher will require changes in workflows and, more importantly, in the culture and autonomy of researchers.

The idea that open research data should be formally acknowledged as a legitimate output of the research should form a separate principle.

The last paragraph of Principle #6 states that open research data should be acknowledged as a legitimate output of the research and that it “…should be accorded the same importance in the scholarly record as citations of other research objects, such as publications”. We strongly support this idea but recognise that this is a fundamental shift in working practices and policies. We are probably still several years off from seeing formal citation of datasets as an embedded practice for researchers and the development of products/services around the resulting metrics. This point is completely separate from the rest of Principle #6 and should form a principle in its own right.

Principle #2 must recognise that it may take significant resource for institutions to provide the infrastructure required for good data management.

While the focus of this Principle on good data management through the lifecycle, rather than the focus on open data sharing, is welcome, there are significant human, technical and sociotechnical developments required to meet this requirement; and also resources, in terms of people, time and infrastructure, that will be needed to shift to a mature position. These needs should be recognised in the Concordat.

The Concordat should clarify the reference to “other workers” in Principle #7

We would value some clarification on paragraph 3 of Principle #7 in relation to the reference to “other workers”: “Research organisations bear the primary responsibility for enforcing ethical guidelines and it is therefore vital that such guidelines are amended as necessary to make clear the obligations that are inherent in the use of data gathered by other workers.”

What is ‘research impact’ in an interconnected world?

Perhaps we should start this discussion with a definition of ‘impact’. The term impact is used by many different groups for different purposes, and much to the chagrin of many researchers it is increasingly a factor in the Higher Education Funding Council for England’s (HEFCE) Research Excellence Framework. HEFCE defined impact as:

‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’.

So we are talking about research that effects change beyond the ivory tower. What follows is a discussion about strengthening the chances of increasing the impact of research.

Is publishing communicating research?

Publishing a paper is not a good way of communicating work. There is some evidence that much published work is not read by anyone other than the reviewers. During an investigation of claims that huge numbers of papers were never cited, Dahlia Remler found that:

  • Medicine  – 12% of articles are not cited
  • Humanities – 82% of articles are not cited. Note, however, that prestigious humanities research is published in books, although many books are rarely cited too.
  • Natural Sciences – 27% of articles are never cited
  • Social Sciences – 32% of articles are never cited

Hirsch’s 2005 paper, An index to quantify an individual’s scientific research output, proposed the h index – defined as the number of papers with citation number ≥h. So an h index of 5 means the author has at least 5 papers with at least 5 citations each. Hirsch suggested this as a way to characterise the scientific output of researchers. He noted that after 20 years of scientific activity, an h index of 20 characterises a ‘successful scientist’. When you think about it, 20 other researchers are not that many people who found the work useful. And that ignores those people who are not ‘successful’ scientists who are, regardless, continuing to publish.
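To make the definition concrete, here is a minimal sketch of how an h index could be worked out from a list of citation counts. The function name and the sample numbers are purely illustrative assumptions, not taken from Hirsch’s paper or from any citation database.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each.

    `citations` is a list of per-paper citation counts (illustrative input only).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations to count.
            h = rank
        else:
            # Every later paper has even fewer citations, so stop here.
            break
    return h


# Hypothetical author with five papers.
print(h_index([10, 8, 5, 4, 3]))
```

In this made-up example four papers have at least four citations each, but there are not five papers with at least five, so the h index is 4.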

Making the work open access is not necessarily enough

Open access is the term used for making the contents of research papers publicly available – either by publishing them in an open access journal or by placing a copy of the work in a subject or institutional repository. There is more information about open access here.

I am a passionate supporter of open access. It breaks down cost barriers to people around the world, allowing a much greater exposure of publicly funded research. There is also considerable evidence showing that making work open access increases citations.

But is making the work open access enough? Is a 9.5MB PDF downloadable onto a telephone, or through a dial-up connection? If the download fails at 90% you get nothing. Some publishing endeavours have recognised this as an issue, such as the Journal of Humanitarian Engineering (JHE), which won the Australian Open Access Support Group’s 2013 Open Access Champion award for their approach to accessibility.

Language issues

The primary issue, however, is the problem of understandability. Scientific and academic papers have become increasingly impenetrable as time has progressed. It’s hard to believe now that at the turn of last century scientific articles had the same readability as the New York Times.

‘This bad writing is highly educated’ is a killer sentence from Michael Billig’s well researched and written book ‘Learn to Write Badly: How to Succeed in the Social Sciences’. This phenomenon is not restricted to the social sciences: specialisation and a need to pull together with other members of one’s ‘tribe’ mean that academics increasingly write in jargon and specialised language that bears little resemblance to the vernacular.

There are increasing arguments for scientific communication to the public being part of formal training. In a previous role I was involved in such a program through the Australian National Centre for the Public Awareness of Science. Certainly the opportunities for PhD students to share their work more openly have never been more plentiful. There are many three minute thesis competitions around the world. Earlier this year the British Library held a #ShareMyThesis competition where entrants were first asked to tweet why their PhD research is/was important using the hashtag #ShareMyThesis. The eight shortlisted entrants were asked to write a short article (up to 600 words) elaborating on their tweet and explaining why their PhD research is/was important in an engaging and jargon-free way.

Explaining work in understandable language is not ‘dumbing it down’. It is simply translating it into a different language. And students are not restricted to the written word. In November the eighth winner of the annual ‘Dance your PhD’ competition sponsored by Science, Highwire Press and the AAAS will be announced.

Other benefits

There is a flow-on effect from communicating research in understandable language. In September, the Times Higher Education published an article, ‘Top tips for finding a permanent academic job’, whose advice can be summarised as ‘communicate more’.

The Thinkable.org group’s aim is to widen the reach and impact of research projects using short videos (three minutes or less). The goal of the video is to engage a wide audience with the research. The Thinkable Open Innovation Award is a research grant that is open to all researchers in any field around the world and awarded openly by allowing Thinkable researchers and members to vote on their favourite idea. The winner of the award receives $5000 to help fund their research. This is specifically the antithesis of the usual research grant process where grants “are either restricted by geography or field, and selected via hidden panels behind closed doors”.

But the benefit is more than the prize money. This entry from a young University of Manchester biomedical PhD student did not win, but thousands of people engaged with her work in just a few weeks of voting.

Right. Got the message. So what do I need to do?

Researcher Mike Taylor pulled together a list of 20 things a researcher needs to do when they publish a paper.  On top of putting a copy of the paper in an institutional or subject repository, suggestions include using various general social media platforms such as Twitter and blogs, and also uploading to academic platforms.

The 101 Innovations in Scholarly Communication research project run from the University of Utrecht is attempting to determine scholarly use of  communication tools. They are analysing the different tools that researchers are using through the different phases of the research lifecycle – Discovery, Analysis, Writing, Publication, Outreach and Assessment through a worldwide survey of researchers. Cambridge scholars can use a dedicated link to the survey.

There is a plethora of scholarly peer networks which all work in slightly different ways and have slightly different foci. You can display your research in your Google Scholar or CarbonMade profile. You can collate the research you are finding in Mendeley or Zotero. You can also create an environment for academic discourse or job searching with Academia.edu, ResearchGate and LinkedIn. Other systems include Publons – a tool to register peer reviewing activity.

Publishing platforms include blogging (as evidenced here), Slideshare, Twitter, figshare, Buzzfeed. Remember, this is not about broadcasting. Successful communicators interact.

Managing an online presence

Kelli Marshall from DePaul University asks ‘how might academics—particularly those without tenure, published books, or established freelance gigs—avoid having their digital identities taken over by the negative or the uncharacteristic?’

She notes that as an academic or would-be academic, you need to take control of your public persona and then take steps to build and maintain it. If you do not have a clear online presence, you are allowing Google, Yahoo, and Bing to create your identity for you. There is a risk that the strongest ‘voices’ will be ones from websites such as Rate My Professors.

Digital footprint

Many researchers belong to an institution, a discipline and a profession. If these change, the online identity associated with them will also change. What is your long term strategy? One thing to consider is obtaining a persistent unique identifier such as an ORCID – which is linked to you and not your institution.

When you leave an institution, you not only lose access to the subscriptions the library has paid for, you also lose your email address. This can be a serious challenge when your online presence in academic social media sites like Academia.edu and ResearchGate is linked to that email address. What about content in a specific institutional repository? Brian Kelly discussed these issues at a recent conference.

We seem to have drifted a long way from impact?

The thing is that if it can be measured it will be. And digital activity is fairly easily measured. There are systems in place now to look at this kind of activity. Altmetrics.org moves beyond the traditional academic internal measures of peer review, Journal Impact Factor (JIF) and the H-index. There are many issues with the JIF, not least that it measures the vessel, not the contents. For these reasons there are now initiatives such as the San Francisco Declaration on Research Assessment (DORA), which calls for scrapping the use of the JIF to assess a researcher’s performance. Altmetrics.org measures the article itself, not where it is published. And it measures the activity of the articles beyond academic borders, where the impact is occurring.

So if you are serious about being a successful academic who wants to have high impact, managing your online presence is indeed a necessary ongoing commitment.

NOTE: On 26 September, Dr Danny Kingsley spoke on this topic to the Cambridge University Alumni festival. The slides are available in Slideshare. The Twitter discussion is here.

Published 25 September 2015
Written by Dr Danny Kingsley
Creative Commons License

It’s time for open access to leave the fringe

The Repository Fringe was held in Edinburgh on 3-4 August. With the theme of “Integrating repositories in the wider context of university, funder and external services”, the event brought together repository managers across the UK to discuss practice and policy. Dr Arthur Smith, Open Access Research Advisor at the University of Cambridge, attended the event and came away with the impression that more needs to be done to embed open access in scholarly processes.

In his keynote speech to Repository Fringe 2015, titled ‘Fulfilling their potential: is it time for institutional repositories to take centre stage?’  David Prosser, Executive Director of Research Libraries UK (RLUK) gave a concise overview of the history surrounding open access and the situation we currently find ourselves in, especially in the UK.

What’s become clear is that ‘we’ is a problematic term for the scholarly communications community. A lack of cohesion and vision between librarians, repository managers and administrators means ‘we’ have failed to engage with researchers to make the case for open access.

I feel this is due in part to the fragmented nature of repositories stemming from an institutional need for control. If national (and international) open access subject repositories had been created and exploited perhaps researcher uptake of open access in the UK and around the world would have been faster. For example, arXiv continues to be the one stop shop for physicists to publish their manuscripts precisely because it’s the repository for the entire physics community. That’s where you go if you’ve got a physics paper. To be fair, physics had a culture of sharing research papers that predates the internet.

Repositories are only as good as the content they hold, and without support from the academic community to fill repositories with content, there is a risk of side-lining green open access*. This will in turn increase the pressure to justify the cost of ineffective institutional repositories.

As David correctly identified, scholars will happily take the time to do things they feel are important. But for many researchers open access remains a low priority and something not worth investing their time in. Repositories are only capturing a fraction of their institution’s total publication output. At Cambridge we estimate that only 25-30% of articles are regularly deposited.

Providing value

The value of open access, whether it’s green or gold**, isn’t obvious to the authors producing the content. Yet juxtaposed with this is a report prepared by Nature Publishing Group on 13 August: Perceptions of open access publishing are changing for the better. This examined the changing perceptions of researchers to open access. While many researchers are still unaware of their funders’ open access requirements, the general perception of open access journals in the sciences has changed significantly, from 40% who were concerned about the quality of OA publication in 2014, to just 27% in 2015.

Clearly the trend is towards greater acceptance of open access within the academic community, but actual engagement remains low. If we don’t want to end up in a world of expensive gold open access journals, green repositories must be competitive with slick journal websites. Appearances matter. We need to attract the attention of the academics so that open access repositories are seen as viable places for disseminating research.

The scholarly communications community must find new ways of making open access (particularly green open access) appealing to researchers. One way forward is to augment the reward structure in academic publishing. Until open access is adopted more widely, academics should be rewarded for the effort involved in making their work openly available.

In the UK, failure to comply with the Higher Education Funding Council for England (HEFCE) and other funders’ policies could seriously affect future funding outcomes. It is the ever-present threat of funding cuts which drives authors to choose open access options, but this has changed open access into a policy compliance debacle.

Open access as a side effect of policy compliance is not enough; we need real support from academics to propel open access forward.

Measuring openness

As a researcher, the main things I look for when assessing other researchers and their publications are h-index, total and article level citations, and journal prestige (impact factor). I am not aware of any other methods which so simply define an author’s research.

While these types of metrics have their problems, they are nonetheless widely used within the academic community. An annual openness index, which is simply the ratio of open access articles to the total number of publications, would quickly reveal how open an academic’s research publications are. This index could be applied equally to established professors and early career researchers, as unlike the h-index, there is no historical weighting. It only depends on how you’re publishing now.
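As a rough illustration of the ratio described above, the sketch below computes an annual openness index for a list of publications. The `Publication` record, its field names and the sample data are all hypothetical, and the sketch assumes each publication is simply flagged as open access or not.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical record: publication year and whether the article is open access.
    year: int
    open_access: bool


def openness_index(publications, year):
    """Ratio of open access articles to all articles published in `year`.

    Returns None when the author published nothing that year, since the
    ratio is undefined in that case.
    """
    in_year = [p for p in publications if p.year == year]
    if not in_year:
        return None
    open_count = sum(1 for p in in_year if p.open_access)
    return open_count / len(in_year)


# Made-up publication list for one researcher.
papers = [
    Publication(2014, False),
    Publication(2015, True),
    Publication(2015, False),
    Publication(2015, True),
    Publication(2015, True),
]
print(openness_index(papers, 2015))
```

Because only the chosen year’s output is counted, earlier publications (such as the 2014 article here) have no effect on the result, which matches the point that the index carries no historical weighting.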

Developing such a metric would spur on open access from within academic circles by making open access publishing a competition between researchers. Perhaps the openness index could also be linked to university progression and grant reward processes. The more open access your work is, the better it is for you, and as a consequence, the community.

Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.

However, we must not be distracted from our main goal of engaging with researchers and academics to gather content for the open access repositories we have so lovingly built.

Glossary

*Green open access refers to making a copy of a published work available by placing it in a repository. This can be thought of as ‘secondary’ open access.

**Gold open access is where the research is published either in a fully open access journal – which sometimes incurs an article processing charge, or in a hybrid journal – which imposes an article processing charge to make that particular article available and also charges a subscription for the remainder of the articles in the journal. This can be thought of as ‘born’ open access.

Published 27 August 2015
Written by Dr Arthur Smith
Creative Commons License

Data sharing – build it and they will come

If a tree falls in the forest and no one was there to hear it, did it happen? You could ask the same philosophical question of research – if no-one can see the research results, what was the point in the first place?

Moving science forward and increasing the knowledge of the world around us implies exchange of findings. Society cannot benefit from research if there is no awareness of what has been done. Managing and sharing research data is a fundamentally important part of the research process. Yet researchers are often reluctant to share their data, and some are openly hostile to the idea.

This blog describes the research data services provided at Cambridge University, which aim to encourage and assist researchers to manage and share their data.

A tough start

The Data Management Facility project at Cambridge began operations in January 2015. At the time there was very little user support for data management in place.  There was no advocacy, no training and no centralised tools to support researchers in research data management.

There had been a substantial body of work undertaken in 2010-2012 as part of the ‘Incremental’ project into research data management, but once the project money ended, the resources remained available but were not updated.

One of the initial challenges was an out of date institutional repository. Cambridge University was one of the original test-bed institutions for DSpace in 2005. While there had been considerable effort invested in the establishment of the repository, it had in recent years been somewhat neglected. The lack of both awareness of the repository and support for researchers was reflected in the numbers: during the first decade of the repository, only 72 datasets had been deposited.

In addition, the Engineering and Physical Sciences Research Council (EPSRC) had compliance expectations for funded research coming into force in May 2015. This gave us five months to pull the Research Data Facility together. It was a tough start.

Understanding researchers’ needs

Tight deadlines often bring the temptation to create short-term solutions. But we did not want to take this path. Solutions created without a prior understanding of the need offer no guarantee of resolving the actual issues at hand.

So we started talking with researchers. We met and spoke with hundreds of researchers across all disciplines and fields of study – Principal Investigators, postdocs, students, and staff members. These were both group sessions and individual meetings. We told them about the importance of sharing research data, and in return we listened to what researchers told us about their worries and possible problems with data sharing.

To date, we have spoken with over 1000 researchers, and from each meeting we kept detailed notes of all the questions/comments received.

We have additionally conducted a questionnaire to better understand researchers’ needs for research data management support. Of the researchers surveyed, 83% indicated that it is ‘very useful’ for the University to provide both information about funders’ expectations for research data sharing and management, and support.

Solution 1 – Providing information

In March 2015 we launched the Research Data Management website which is a single location for solutions to all research data management needs. The website contains:

and much more.

The key idea behind the website is to provide an easy-to-navigate place with all the necessary information. The website is constantly updated, and new information is regularly added in response to feedback received from researchers.

Concurrently, we have been conducting tailored information sessions about funders’ requirements for sharing data and the support available at the University of Cambridge. We run these sessions at multiple locations across the University, and for audiences of various types. The sessions range from open sessions in central locations to dedicated sessions hosted at individual departments and talks with individual research groups. Slides from information sessions are always made available for attendees to download.

Solution 2 – Assistance with data management plans and supporting data management

In the survey 82% of researchers said it would be very helpful if there were someone at the University available to help with data management plans. To address this, we have:

  • Added tailored information about data management plans to our information sessions.
  • Linked the DMPonline tool from our data website. This allows researchers to prepare funder-specific data management plans.
  • Organised data management plan clinic sessions (one-to-one appointments on demand).
  • Prepared guidelines on how to fill in a data management plan.

Additionally, 63% of researchers indicated that it would be ‘very useful’, and a further 31% indicated that it would be ‘useful’, to have workshops on research data management. We have therefore prepared a 1.5 hour interactive introductory workshop on research data management, which is now offered in various departments across the University. We are also developing the skill sets of library staff across the institution so that they can deliver research data management training to researchers from their field.

Solution 3 – Providing an institutional repository

Finally, 79% of researchers indicated that it would make data sharing easier if the University maintained its own, easy-to-use data repository. We therefore had to do something about our repository, which had not been updated for a long time. We have rolled out a series of updates to the repository, taking it to Version 4.3, which will allow DOIs to be minted for datasets.

In the meantime, we also had to think of a strategy to make data sharing as easy as possible. The existing processes for uploading research data to the repository were very complicated and discouraging to researchers. We did not have any web-mediated facility that would allow researchers to easily get their data to us. In fact, most of the time we asked researchers to bring their data to us on external hard drives. This was not an acceptable solution in the 21st century!

Researchers like simple processes and Dropbox-like solutions, where one can easily drag and drop files. We have therefore created a simple webform, which asks researchers for the minimal necessary metadata and allows them to simply drag and drop their data files.

The outcomes

It turned out in the end that it was really worth the effort of understanding researchers’ needs before considering solutions. As of 24 August 2015, the Research Data Management website has been visited 10,992 times. Our training sessions on research data management and data planning have received extremely good feedback – 73% of respondents indicated that our workshops should be ‘essential’ to all PhD students.

And most importantly, since we launched our easy-to-upload website form for research data, we have received 122 research data submissions – in four months we have received more than 1.5 times as many submissions as in the ten years of our repository’s lifetime.

So our advice to anyone wishing to really support researchers is to truly listen to their needs, and address their problems. If you create useful services, there is no need to worry about the uptake.

This infographic demonstrates how successful the Research Data Facility has been. Prepared by Laura Waldoch from the University Library, it is available for download.

To know more about our activities, follow us on Twitter.

 

Published 24 August 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License