All posts by Office of Scholarly Communication

Joint response on the draft UK Concordat on Open Research Data

During August, Research Councils UK (RCUK), on behalf of the UK Open Research Data Forum, released a draft Concordat on Open Research Data and sought feedback on it.

The Universities of Bristol, Cambridge, Manchester, Nottingham and Oxford prepared a joint response which was sent to the RCUK on 28 September 2015. The response is reproduced below in full.

The initial main focus of the Concordat should be good data management, instead of openness.

The purpose of the Concordat is not entirely clear. Merely issuing it is unlikely to ensure that data is made openly available. If Universities and Research Institutes are expected to publicly state their commitment to the Principles, then they risk the dissatisfaction of their researchers if insufficient funds are available to support the data curation that is described. As discussed in Comment #5 below, sharing research data in a manner that is useful and understandable requires putting research data management systems in place and having research data experts available from the beginning of the research process. Many researchers are only beginning to implement data management practices. It might be wiser to start with a Concordat on good data management before specifying expectations about open data. It would be preferable to first get to a point where researchers are comfortable with managing their data so that it is at least citable and discoverable. Once that is more common practice, openness of data can be expected as the default position.

The scope of the Concordat needs to be more carefully defined if it is to apply to all fields of research.

The Introduction states that the Concordat “applies to all fields of research”, but it is not clear how the first sentence of the Introduction translates for researchers in the Arts and Humanities (or in theoretical sciences, e.g. Mathematics). This sentence currently reads:

“Most researchers collect, measure, process and analyse data – in the form of sets of values of qualitative or quantitative variables – and use a wide range of hardware and software to assist them to do so as a core activity in the course of their research.”

The Arts and Humanities are mentioned in Principle #1, but this section also refers to benefits in terms of “progressing science”. We suggest that more input is sought specifically from academics in the Arts and Humanities, so that the wording throughout the Concordat is made more inclusive (or indeed exclusive, if appropriate).

The definition of research data in the Concordat needs to be relevant to all fields of research if the Concordat is to apply to all fields of research.

We suggest that the definition of data at the start of the document needs to be revised if it is to be inclusive of Arts and Humanities research (and theoretical sciences, e.g. Mathematics). The kinds of amendments that might be considered are indicated in italics:

Research Data can be defined as evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical forms). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the copyright may be externally held. The purpose of open research data is not only to provide the information necessary to support or validate a research project’s observations, findings or outputs, but also to enable the societal and economic benefits of data reuse. Data may include, for example, statistics, collections of digital images, software, sound recordings, transcripts of interviews, survey data and fieldwork observations with appropriate annotations, an interpretation, an artwork, archives, found objects, published texts or a manuscript.

The Concordat should include a definition of open research data.

To enable consistent understanding across Concordat stakeholders, we suggest that the definition of research data at the start of the document be followed by a definition of “openness” in relation to the reuse of data and content.

To illustrate, consider referencing The Open Definition, which includes the full definition and presents its most succinct formulation as:

“Open data and content can be freely used, modified, and shared by anyone for any purpose”.

The Concordat refers to a process at the end of the research lifecycle, when what actually needs to be addressed are the support processes required before that point to allow it to occur.

Principle #9 states that “Support for the development of appropriate data skills is recognised as a responsibility for all stakeholders”. This refers to the requirement to develop skills and to provide specialised researcher training. These skills are almost non-existent and training does not yet exist in any organised form (as noted by Jisc in March this year). There is some research data management training for librarians provided by the Digital Curation Centre (DCC), but little specific training for data scientists. The level of researcher support and training required across all disciplines to fulfil the expectations outlined in Principle #9 will require a significant increase in both infrastructure and staffing.

The implementation of, and integration between, research data management systems (including systems external to institutions) is a complex process. It is an area of ongoing development across the UK research sector and will take time for institutions to establish. This is reflected in the final paragraphs of DCC reports on the DCC RDM 2014 Survey and in discussions around gathering researcher requirements for RDM infrastructure at the IDCC15 conference of March this year. It is also illustrated by a draft list of basic RDM infrastructure components developed through a Jisc Research Data Spring pilot.

The Concordat must acknowledge the distance between where the Higher Education research sector currently stands and the expectations it lays out. While good initial progress towards data sharing and openness has been made in the UK, substantial further culture change will be required to enact the responsibilities laid out in Principle #1 of the Concordat, and this should be recognised within the document. There will be a significant time lag before staff are in place to support the research data management process through the lifecycle of research, so that the information is in a state in which it can be shared at the end of the process.

We suggest that the introduction to the Concordat should include text to reflect this, such as:

“Sharing research data in a manner that is useful and understandable requires putting integrated research data management systems in place and having research data experts available from the beginning of the research process. There is currently a deficit of knowledge and skills in the area of research data management across the research sector in the UK. This Concordat is intended to establish a set of expectations of good practice with the goal of establishing open research data as the desired position over the long term. It is recognised that this Concordat describes processes and principles that will take time to establish within institutions.”

The Concordat should clarify its scope in relation to publicly funded research data as opposed to research data funded from alternative sources or unfunded.

While the Introduction to the Concordat makes clear reference to publicly funded research data, Principle #1 states that ‘it is the linking of data from a wide range of public and commercial bodies alongside the data generated by academic researchers’ that is beneficial. In addition, the ‘funders of research’ responsibilities should state whether these responsibilities relate only to public bodies or more widely (Principle #1).

The Concordat should propose sustainable solutions to fund the costs of the long-term preservation and curation of data, and how these costs can be borne by different bodies.

It is welcome that the Concordat states that costs should not fall disproportionately on a single part of the research community. However, the majority of costs currently fall on Higher Education Institutions (HEIs), which is not a sustainable position. There should be some clarification of how these costs could be met from elsewhere, for example by research funders. In addition, the Concordat should acknowledge that there will be a transition period during which there may be little or no funding to support open data, making it very difficult for HEIs to meet these responsibilities in the short to medium term. Furthermore, Principle #1 says that “Funders of Research will support open research data through the provision of appropriate resources as an acknowledged research cost.” It must be noted that several funders are at present reluctant or refusing to pay for the long-term preservation and curation of data.

The Concordat should propose solutions for paying for the cost of the long-term preservation and curation of data in cases where the ‘funders of research’ refuse to pay for this, or where research is unfunded. In the second paragraph of Principle #4 it is suggested that “…all parties should work together to identify the appropriate resource provider”. It would be useful to have some clarification about what the Working Group envisaged here. For example, was it a shared national repository? Perhaps the RCUK (in collaboration with other UK funding bodies) could consider setting up a form of UK Data Service that meets the wider funding body audience for data of long-term value. This would also support collaboration and enable more reuse through increased data discoverability, since data would not be stored in separate institutional repositories.

Additionally, there appears to be a contradiction between the statement in Principle #1 that “Funders of Research will support open research data through the provision of appropriate resources as an acknowledged research cost” and the statement in Principle #4: “…the capital costs for infrastructure may be incorporated into planned upgrades”, which suggests that Universities or Research Institutes will need to fund infrastructure and services from capital and operational budgets.

The Concordat should clarify how an appropriate proportionality between costs and benefits might be assessed.

Principle #4 states: “Such costs [of open research data] should be proportionate to real benefits.” This key relationship needs further amplification. How and at what stage can “real benefits” be determined in order to assess the proportionality of potential costs? The Concordat should state more clearly the ‘real and achievable’ benefits of open data, with examples. What is the relationship between the costs and the benefits? Has this relationship been explored? The real benefits of sharing research data will only become clear over time. At the moment it is difficult to quantify the benefits without evidence from open datasets. Moreover, a considerable period may pass after a project is finished before the real benefits are realised. Are public funders going to provide monetary support for such services?

Additionally, the Concordat should specify to what extent research data should be made easily re-usable by others. Currently Principle #3 states: “Open research data should also be prepared in such a manner that it is as widely useable as is reasonably possible…”. What is the definition of “reasonably possible”? Preparing data for use by others might be expensive, depending on the complexity of the data, and this should also be taken into consideration when assessing the proportionality of the potential costs of data sharing. Principle #4 states: “Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time.” These costs are indeed significant from the outset.

The Concordat (Principle #2) states: “A properly considered and appropriate research data management strategy should be in place before the research begins so that no data is lost or stored inappropriately. Wherever possible, project plans should specify whether, when and how data will be made openly available.” The Concordat should propose a process by which a proposal for data management and sharing in a particular research context is put forward for public funding. This proposal will need to include the cost-benefit analysis for deciding which data to keep and distribute (and how best to keep and distribute it).

In general, the Concordat must balance open data requirements with allowing researchers enough time and space to pursue innovation.

The Concordat should acknowledge the costs relating to undertaking regular reviews of progress towards open data.

Principle #4 refers to the following costs:

  • “necessary costs – for IT infrastructure and services, administrative and specialist support staff, and for researchers’ time – are significant”
  • “the additional and continuing revenue costs to sustain services – and rising volumes of data – for the long term are real and substantial”
  • “Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time”

However, there is no explicit reference to costs relating to Principle #10 regarding “Regular reviews of progress towards open access to research data should be undertaken”.

We suggest that Principle #4 should include text to reflect this, and the kind of amendment that might be considered is indicated in italics:

For research organisations such as universities or research institutes, these costs are likely to be a prime consideration in the early stages of the move to making research data open. Both IT infrastructure costs and the on-going costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time. Significant costs will also arise from Principle #10 regarding the undertaking of regular reviews of progress towards open access to research data.

The Concordat should explore the establishment of a central organisation to lead the transformation towards a cohesive UK research data environment.

Principle #3 states: “Data must be curated […] This can be achieved in a number of ways […] However, these methodologies may vary according to subject and disciplinary fields, types of data, and the circumstances of individual projects. Hence the exact choice of methodology should not be mandated”.

Realising the benefits of curation may involve significant costs where curation extends over the long term, for example data relating to nuclear science, which may need to be usable for at least 60 years. These benefits would be best achieved, and in a cost-effective manner, through the establishment of a central organisation to lead the creation of a cohesive national collection of research resources and a richer data environment that will:

  • Make better use of the UK’s research outputs
  • Enable UK researchers to easily publish, discover, access and use data
  • Develop discipline-specific guidelines on data and metadata standards
  • Suggest discipline-specific curation and preservation policies
  • Develop protocols and processes for the access to restricted data
  • Enable new and more efficient research

In Australia this capacity is provided by the Australian National Data Service.

The Concordat should address the issues around sharing research data resulting from collaborations, especially international collaborations.

It has to be explicitly recognised that some researchers will be involved in international collaborations with collaborators who are not publicly funded, or whose funders do not require research data sharing. Procedures (and possible exemptions) for sharing research data in such circumstances should be discussed in the Concordat.

Additionally, the Concordat should suggest a sector-wide approach when considering the costs and complexities of research involving multiple institutions. Currently, where multiple institutions produce research data for one project, there is a danger that the data will be deposited in multiple repositories, which is neither pragmatic nor cost-effective.

Non-public funders need to be consulted about the sharing of commercially-sponsored data, and the Concordat should acknowledge the possibility of restricting access to research data resulting from commercial collaborations.

Since the Concordat makes recommendations with regard to making commercially-sponsored data accessible, significant consultation with non-public funders is needed. Otherwise, there is a risk that the expectations placed on industry will not be met. The current wording could damage industrial appetite to fund academic research if industry is pushed towards openness without major consultation.

We also suggest that in the second paragraph of Principle #5, the sentence: “There is therefore a need to develop protocols on when and how data that may be commercially sensitive should be made openly accessible, taking account of the weight and nature of contributions to the funding of collaborative research projects, and providing an appropriate balance between openness and commercial incentives.” is changed to “There is therefore a need to develop protocols on whether, when and how data that may be commercially sensitive should be made openly accessible, taking account of the weight and nature of contributions to the funding of collaborative research projects, and providing an appropriate balance between openness and commercial incentives.” The Concordat should also recognise that development and execution of these processes is an additional burden on institutional administrative staff which must not be underestimated.

The Concordat should more generally recognise the increasing economic value of data produced by researchers.

Where commercial benefits can be quantified (such as the return on investment of a research project), this should be recognised as a reason to embargo access to data until such things as patents can be successfully applied for. University bodies charged with the commercialisation of research should be entitled to assess the potential value of research before consenting to data openness.

The Concordat should allow the use of embargo periods to allow release of data to be delayed up to a certain time after publication, where this is appropriate and justifiable.

The Concordat expects research data underpinning publications to be made accessible by the publication date (Principles #6 and #8). This does not, however, take into account disciplinary norms, where sometimes access to research data is delayed until a specified time after publication. For example, in crystallography (Protein Data Bank) the community has agreed a maximum 12-month delay between publishing the first paper on a structure and making coordinates public for secondary use. Delays in making data accessible are accepted by funders. For example, the BBSRC allows exemptions for disciplinary norms, and where best practices do not exist BBSRC suggests release within three years of generation of the dataset; the STFC expects research data from which the scientific conclusions of a publication are derived to be made available within six months of the date of the relevant publication. Research data should be discoverable at the time of publication, but it may be justifiable to delay access to the data.

The Concordat should make mention of the difficulties involved with ethical issues of data sharing, including issues around data licensing, and data use by others.

Ethical issues surrounding release and use of research data are briefly mentioned in Principle #5 and Principle #7. We believe the Concordat could benefit from expansion on the ethical issues surrounding release and use of research data, and advice on how these can be addressed in data sharing agreements. This is a large and complex area that would benefit from a national framework of best practice guidelines and methods of monitoring.

Furthermore, the Concordat does not provide any recommendations about research data licensing. This should be discussed, together with the associated expertise, costs and time required. It is mentioned briefly in point 4 above.

The Concordat’s stated expectations regarding the use of non-proprietary formats should be realistic.

Principle #3 states that:

“Open research data should also be prepared in such a manner that it is as widely useable as is reasonably possible, at least for specialists in the same or linked fields, wherever they are in the world. Any requirement to use specialised software or obscure data manipulations should be avoided wherever possible. Data should be stored in non-proprietary formats wherever possible, or the most commonly used proprietary formats if no equivalent non-proprietary format exists.”

The last two sentences of this paragraph could be regarded as unreasonable, depending on the definition of what is ‘possible’. It might theoretically be possible to convert data for storage, but not remotely cost-effective to do so. Other formulations (e.g. from the EPSRC) place the burden of retrieval from stored formats on the requester, not the originator of the data.

We suggest that this section should be rephrased in-line with EPSRC recommendations, for example:

“Wherever possible, researchers are encouraged to store research data in non-proprietary formats. If this is not possible (or not cost-efficient), researchers should indicate what proprietary software is needed to process research data. Those requesting access to data are responsible for re-formatting it to suit their own research needs and for obtaining access to proprietary third party software that may be necessary to process the data.”

The Concordat should encourage proper management of physical samples and non-digital research data.

The Concordat should also encourage proper management of physical samples and other forms of non-digital research data. Physical samples such as fossils, core samples, and zoological and botanical samples, and non-digital research data such as recordings, paper notes, etc., should also be properly managed. In some areas the management and sharing of these items is well constructed and understood – for example, palaeontology journals will not allow people to publish without the specimen numbers from a museum – but it is less rigid in other areas of research. It would be desirable if the Concordat encouraged the development of discipline-specific guidelines for the management of physical samples and other non-digital research data.

Principle #5 must recognise the culture change required to remove the decision to share data from an individual researcher.

Principle #5 states that:

“Decisions on withholding data should not generally be made by individual researchers but rather through a verifiable and transparent process at an appropriate institutional level.”

Whilst the reasoning behind this Principle is understandable, it must be recognised that we are not yet in a mature culture of data sharing, and a statement removing data sharing decisions from the researcher will require changes in workflows and, more importantly, in the culture and autonomy of researchers.

The idea that open research data should be formally acknowledged as a legitimate output of the research should form a separate principle.

The last paragraph of Principle #6 states that open research data should be acknowledged as a legitimate output of the research and that it “…should be accorded the same importance in the scholarly record as citations of other research objects, such as publications”. We strongly support this idea but recognise that this is a fundamental shift in working practices and policies. We are probably still several years off from seeing formal citation of datasets as an embedded practice for researchers and the development of products/services around the resulting metrics. This point is completely separate from the rest of Principle #6 and should form a principle in its own right.

Principle #2 must recognise that it may take significant resource for institutions to provide the infrastructure required for good data management.

While the focus of this Principle on good data management through the lifecycle, rather than on open data sharing, is welcome, significant human, technical and sociotechnical developments are required to meet this requirement, along with resources, in terms of people, time and infrastructure, to shift to a mature position. These needs should be recognised in the Concordat.

The Concordat should clarify the reference to “other workers” in Principle #7.

We would value some clarification on paragraph 3 of Principle #7 in relation to the reference to “other workers”: “Research organisations bear the primary responsibility for enforcing ethical guidelines and it is therefore vital that such guidelines are amended as necessary to make clear the obligations that are inherent in the use of data gathered by other workers.”

What is ‘research impact’ in an interconnected world?

Perhaps we should start this discussion with a definition of ‘impact’. The term is used by many different groups for different purposes, and, much to the chagrin of many researchers, it is increasingly a factor in the Higher Education Funding Council for England’s (HEFCE) Research Excellence Framework. HEFCE defined impact as:

‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’.

So we are talking about research that effects change beyond the ivory tower. What follows is a discussion about strengthening the chances of increasing the impact of research.

Is publishing communicating research?

Publishing a paper is not a good way of communicating work. There is some evidence that much published work is not read by anyone other than the reviewers. During an investigation of claims that huge numbers of papers were never cited, Dahlia Remler found that:

  • Medicine  – 12% of articles are not cited
  • Humanities – 82% of articles are not cited – note, however, that much of the field’s prestigious research is published in books, and many books are rarely cited too.
  • Natural Sciences – 27% of articles are never cited
  • Social Sciences – 32% of articles are never cited

Hirsch’s 2005 paper, ‘An index to quantify an individual’s scientific research output’, proposed the h-index, defined as the number of papers with citation number ≥ h. So an h-index of 5 means the author has at least 5 papers with at least 5 citations each. Hirsch suggested this as a way to characterise the scientific output of researchers. He noted that after 20 years of scientific activity, an h-index of 20 marks a ‘successful scientist’. When you think about it, 20 other researchers citing a paper is not that many people finding the work useful. And that ignores the people who are not ‘successful’ scientists but are, regardless, continuing to publish.
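To make the definition concrete, here is a minimal sketch in Python (the function name and the sample citation counts are purely illustrative, not taken from Hirsch's paper):

    def h_index(citation_counts):
        # The h-index is the largest h such that the author has at least
        # h papers with at least h citations each.
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, citations in enumerate(counts, start=1):
            if citations >= rank:
                h = rank
            else:
                break
        return h

    # Example: five papers with 10, 8, 5, 3 and 0 citations give an h-index of 3
    print(h_index([10, 8, 5, 3, 0]))  # prints 3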

Making the work open access is not necessarily enough

Open access is the term used for making the contents of research papers publicly available – either by publishing them in an open access journal or by placing a copy of the work in a subject or institutional repository. There is more information about open access here.

I am a passionate supporter of open access. It breaks down cost barriers to people around the world, allowing a much greater exposure of publicly funded research. There is also considerable evidence showing that making work open access increases citations.

But is making the work open access enough? Is a 9.5MB PDF downloadable onto a telephone, or over a dial-up connection? If the download fails at 90% you get nothing. Some publishing endeavours have recognised this as an issue, such as the Journal of Humanitarian Engineering (JHE), which won the Australian Open Access Support Group’s 2013 Open Access Champion award for its approach to accessibility.

Language issues

The primary issue, however, is the problem of understandability. Scientific and academic papers have become increasingly impenetrable as time has progressed. It’s hard to believe now that at the turn of the last century scientific articles had the same readability as the New York Times.

‘This bad writing is highly educated’ is a killer sentence from Michael Billig’s well-researched and well-written book ‘Learn to Write Badly: How to Succeed in the Social Sciences‘. This phenomenon is not restricted to the social sciences: specialisation and a need to pull together with other members of one’s ‘tribe‘ mean that academics increasingly write in jargon and specialised language that bears little resemblance to the vernacular.

There are increasing arguments for communicating science to the public to be part of formal training. In a previous role I was involved in such a program through the Australian National Centre for the Public Awareness of Science. Certainly the opportunities for PhD students to share their work more openly have never been more plentiful. There are many three-minute thesis competitions around the world. Earlier this year the British Library held a ‘Share My Thesis’ competition, where entrants were first asked to tweet why their PhD research is/was important using the hashtag #ShareMyThesis. The eight shortlisted entrants were asked to write a short article (up to 600 words) elaborating on their tweet and explaining why their PhD research is/was important in an engaging and jargon-free way.

Explaining work in understandable language is not ‘dumbing it down’. It is simply translating it into a different language. And students are not restricted to the written word. In November the eighth winner of the annual ‘Dance your PhD‘ competition, sponsored by Science, Highwire Press and the AAAS, will be announced.

Other benefits

There is a flow-on effect from communicating research in understandable language. In September, Times Higher Education published an article, ‘Top tips for finding a permanent academic job‘, whose advice can be summarised as ‘communicate more’.

The Thinkable.org group’s aim is to widen the reach and impact of research projects using short videos (three minutes or less). The goal of the video is to engage a wide audience with the research. The Thinkable Open Innovation Award is a research grant that is open to researchers in any field around the world and is awarded openly by allowing Thinkable researchers and members to vote on their favourite idea. The winner of the award receives $5,000 to help fund their research. This is specifically the antithesis of the usual research grant process, where grants “are either restricted by geography or field, and selected via hidden panels behind closed doors”.

But the benefit is more than the prize money. This entry from a young University of Manchester biomedical PhD student did not win, but thousands of people engaged with her work in just a few weeks of voting.

Right. Got the message. So what do I need to do?

Researcher Mike Taylor pulled together a list of 20 things a researcher needs to do when they publish a paper.  On top of putting a copy of the paper in an institutional or subject repository, suggestions include using various general social media platforms such as Twitter and blogs, and also uploading to academic platforms.

The 101 Innovations in Scholarly Communication research project, run from the University of Utrecht, is attempting to determine scholarly use of communication tools. Through a worldwide survey of researchers, it is analysing the different tools that researchers use through the different phases of the research lifecycle – Discovery, Analysis, Writing, Publication, Outreach and Assessment. Cambridge scholars can use a dedicated link to the survey.

There is a plethora of scholarly peer networks, all of which work in slightly different ways and have slightly different foci. You can display your research in your Google Scholar or CarbonMade profile. You can collate the research you are finding in Mendeley or Zotero. You can also create an environment for academic discourse or job searching with Academia.edu, ResearchGate and LinkedIn. Other systems include Publons – a tool to register peer reviewing activity.

Publishing platforms include blogging (as evidenced here), Slideshare, Twitter, figshare, Buzzfeed. Remember, this is not about broadcasting. Successful communicators interact.

Managing an online presence

Kelli Marshall from DePaul University asks ‘how might academics—particularly those without tenure, published books, or established freelance gigs—avoid having their digital identities taken over by the negative or the uncharacteristic?’

She notes that as an academic or would-be academic, you need to take control of your public persona and then take steps to build and maintain it. If you do not have a clear online presence, you are allowing Google, Yahoo, and Bing to create your identity for you. There is a risk that the strongest ‘voices’ will be ones from websites such as Rate My Professors.

Digital footprint

Many researchers belong to an institution, a discipline and a profession. If these change, the online identity associated with them will also change. What is your long-term strategy? One thing to consider is obtaining a persistent unique identifier such as an ORCID, which is linked to you and not your institution.

When you leave an institution, you not only lose access to the subscriptions the library has paid for, you also lose your email address. This can be a serious challenge when your online presence on academic social media sites like Academia.edu and ResearchGate is linked to that email address. What about content in a specific institutional repository? Brian Kelly discussed these issues at a recent conference.

We seem to have drifted a long way from impact?

The thing is that if it can be measured, it will be. And digital activity is fairly easily measured. There are systems in place now to look at this kind of activity. Altmetrics.org moves beyond the traditional academic internal measures of peer review, Journal Impact Factor (JIF) and the h-index. There are many issues with the JIF, not least that it measures the vessel, not the contents. For these reasons there are now initiatives such as the San Francisco Declaration on Research Assessment (DORA), which calls for the scrapping of the JIF for assessing a researcher’s performance. Altmetrics.org measures the article itself, not where it is published. And it measures the activity of the articles beyond academic borders – to where the impact is occurring.

So if you are serious about being a successful academic who wants to have high impact, managing your online presence is indeed a necessary ongoing commitment.

NOTE: On 26 September, Dr Danny Kingsley spoke on this topic to the Cambridge University Alumni festival. The slides are available in Slideshare. The Twitter discussion is here.

Published 25 September 2015
Written by Dr Danny Kingsley
Creative Commons License

It’s time for open access to leave the fringe

The Repository Fringe was held in Edinburgh on 3-4 August. With the theme of “Integrating repositories in the wider context of university, funder and external services”, the event brought together repository managers across the UK to discuss practice and policy. Dr Arthur Smith, Open Access Research Advisor at the University of Cambridge, attended the event and came away with the impression that more needs to be done to embed open access in scholarly processes.

In his keynote speech to Repository Fringe 2015, titled ‘Fulfilling their potential: is it time for institutional repositories to take centre stage?’, David Prosser, Executive Director of Research Libraries UK (RLUK), gave a concise overview of the history surrounding open access and the situation we currently find ourselves in, especially in the UK.

What’s become clear is that ‘we’ is a problematic term for the scholarly communications community. A lack of cohesion and vision between librarians, repository managers and administrators means ‘we’ have failed to engage with researchers to make the case for open access.

I feel this is due, in part, to the fragmented nature of repositories stemming from an institutional need for control. If national (and international) open access subject repositories had been created and exploited, perhaps researcher uptake of open access in the UK and around the world would have been faster. For example, arXiv continues to be the one-stop shop for physicists to publish their manuscripts precisely because it is the repository for the entire physics community. That’s where you go if you’ve got a physics paper. To be fair, physics had a culture of sharing research papers that predates the internet.

Repositories are only as good as the content they hold, and without support from the academic community to fill repositories with content, there is a risk of side-lining green open access*. This will in turn increase the pressure to justify the cost of ineffective institutional repositories.

As David correctly identified, scholars will happily take the time to do things they feel are important. But for many researchers open access remains a low priority and something not worth investing their time in. Repositories are only capturing a fraction of their institution’s total publication output. At Cambridge we estimate that only 25-30% of articles are regularly deposited.

Providing value

The value of open access, whether it’s green or gold**, isn’t obvious to the authors producing the content. Yet juxtaposed with this is a report published by Nature Publishing Group on 13 August, ‘Perceptions of open access publishing are changing for the better’, which examined the changing perceptions of researchers towards open access. While many researchers are still unaware of their funders’ open access requirements, the general perception of open access journals in the sciences has changed significantly: the proportion of researchers concerned about the quality of OA publication fell from 40% in 2014 to just 27% in 2015.

Clearly the trend is towards greater acceptance of open access within the academic community, but actual engagement remains low. If we don’t want to end up in a world of expensive gold open access journals, green repositories must be competitive with slick journal websites. Appearances matter. We need to attract the attention of the academics so that open access repositories are seen as viable places for disseminating research.

The scholarly communications community must find new ways of making open access (particularly green open access) appealing to researchers. One way forward is to augment the reward structure in academic publishing. Until open access is adopted more widely, academics should be rewarded for the effort involved in making their work openly available.

In the UK, failure to comply with the Higher Education Funding Council for England (HEFCE) and other funders’ policies could seriously affect future funding outcomes. It is the ever-present threat of funding cuts which drives authors to choose open access options, but this has changed open access into a policy compliance debacle.

Open access as a side effect of policy compliance is not enough; we need real support from academics to propel open access forward.

Measuring openness

As a researcher, the main things I look for when assessing other researchers and their publications are h-index, total and article level citations, and journal prestige (impact factor). I am not aware of any other methods which so simply define an author’s research.

While these types of metrics have their problems, they are nonetheless widely used within the academic community. An annual openness index, which is simply the ratio of open access articles to the total number of publications, would quickly reveal how open an academic’s research publications are. This index could be applied equally to established professors and early career researchers since, unlike the h-index, there is no historical weighting. It only depends on how you’re publishing now.
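As a minimal sketch of the idea (the function and figures below are purely illustrative, not part of any existing metric or service), the calculation would be straightforward:

    def openness_index(open_access_articles, total_articles):
        # Ratio of open access articles to total publications in a given year.
        if total_articles == 0:
            return 0.0
        return open_access_articles / total_articles

    # Example: 7 of the 12 articles published this year are open access
    print(round(openness_index(7, 12), 2))  # prints 0.58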

Developing such a metric would spur on open access from within academic circles by making open access publishing a competition between researchers. Perhaps the openness index could also be linked to university progression and grant reward processes. The more open access your work is, the better it is for you, and as a consequence, the community.

Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.

However, we must not be distracted from our main goal of engaging with researchers and academics to gather content for the open access repositories we have so lovingly built.

Glossary

*Green open access refers to making a copy of a published work available by placing it in a repository. This can be thought of as ‘secondary’ open access.

**Gold open access is where the research is published either in a fully open access journal – which sometimes incurs an article processing charge, or in a hybrid journal – which imposes an article processing charge to make that particular article available and also charges a subscription for the remainder of the articles in the journal. This can be thought of as ‘born’ open access.

Published 27 August 2015
Written by Dr Arthur Smith
Creative Commons License