
Consider yourself disrupted – notes from RLUK2016

The 2016 Research Libraries UK conference was held at the British Library from 9-11 March on the theme of disruptive innovation. This post pulls out some of the highlights I personally took from the conference:

  • If librarians are to be considered important, we as a community need a strong grasp of scholarly communication issues
  • We need to know the facts about our subscriptions to, usage of and contributions to scholarly publishing
  • We need high level support in institutions to back libraries in advocacy and negotiation with publishers
  • Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem
  • Society needs more open research to ensure reproducibility and robust research
  • The library of the future will have to be exponentially more customisable than the current offering
  • The information seeking behaviour of researchers is iterative and messy and does not match library search services
  • Libraries need to ‘create change to triumph’ – to be inventors rather than imitators
  • Management of open access issues needs to be shared across institutions, with positive outcomes when research offices and libraries collaborate.

I should note this is not a comprehensive overview of the conference, and I have blogged separately about my own contribution ‘The value of embracing unknown unknowns’. Some talks were looking at the broader picture, others specifically at library practice.

Stand your ground – tips for successful publisher negotiations

The opening keynote presentation was by Professor Gerard Meijer, President of Radboud University who conducted the recent Dutch negotiations with Elsevier.

The Dutch position has been articulated by Sander Dekker, the State Secretary of Education, who said that while the way forward was gold Open Access, the government would not provide any extra money. Meijer noted this was sensible, because every extra cent going into the system goes into the pockets of publishers – something that has been amply demonstrated in the UK.

All universities in the Netherlands are in the top 200 universities in the world. This means all Dutch research is of good quality – so even though it is only 2% of world output, the Netherlands has some clout.

Meijer gave some salient advice about these types of negotiations. This work, he said, needs to be undertaken at the highest level of the universities, for several reasons. He noted that 1.5 to 2 percent of a university's budget goes to subscriptions – a share that is growing as budgets are cut – so senior leadership in institutions should take an active position.

In addition, if you are not willing to completely opt out of licensing publishers' material then you cannot negotiate, and if you are going to opt out you will need the support of the researchers. To that end communication is crucial – during the negotiations, the team sent a regular newsletter to researchers letting them know how things were going.

Meijer also stressed the importance of knowing the facts, and the need to communicate and inform the researchers about these facts and the numbers. He noted that most researchers don’t know how much subscriptions cost. They do know however about article processing charges – creating a misconception that Open Access is more expensive.

Institutions in the Netherlands spent €9.2 million on Elsevier publications in 2009, rising to €11 million* in 2014. Meijer noted that he was ‘not allowed’ to tell us this information due to confidentiality clauses. He drolly observed “It will be an interesting court case to be sued for telling the taxpayers how their money is being spent”. He also noted that because Elsevier is a public company their finances are available, and while their revenue goes up, their costs stay the same.

Apparently Wiley and Springer are willing to enter into agreements. Elsevier, however, is arguing that a global business model does not match a local business requirement. The Netherlands has not yet signed the contract with Elsevier as they are working out the detail.

Broadly the deal is for three years, from 2016 to 2018. The plan is to grow Open Access output from nothing to 10% in 2016, 20% in 2017 and 30% in 2018, without having to pay APCs. To achieve this they have to identify journals to make Open Access, by defining domains in which all journals will be made open access.

Meijer concluded this was a big struggle – he would have liked to have seen more – but what we have is good for science. Dutch research will be open in fields where most Open Access is happening and researchers are paying APCs. Researchers can look at the long list of journals that are OA and then publish there.

*CORRECTION: Apologies for my mistyping. Thanks to @WvSchaik for pointing out this error on Twitter. The slide is captured in this tweet.

The future of the research library

Nancy Fried Foster from Ithaka S+R and Kornelia Tancheva from Cornell University Library spoke about research practices and the disruption of the research library. They started by noting that researchers work differently now, using different tools. The objective of their ‘A day in the life of a serious researcher’ work was exploring research practices to inform the vision of library of the future and identify improvements we could make now.

They developed a very fine-grained method of seeing what people really do in the workplace, using a participatory design approach. Participants (mainly postgraduates) were asked to map or log their movements over a single day in which at least some of their time was spent on research. The team then sat with each person the following day and asked them to narrate their day – and to talk about seeking, finding and using information. There was no distinction between academic and non-academic activity.

The team looked at the things that people were doing and the things that the library could do and could become. The analysis took a lot of time, organising the material into several big categories:

  • Seeking information
  • Academic activities
  • Library resources
  • Space
  • Self-management
  • Circum-academic activities – activities allied to the researcher’s academic line but not central to it.

They also coded for ‘obstacles’ and ‘brainwork’.

The participants described their information seeking as fluid and constant – ‘you can just assume I am kind of checking my email all the time’. They also distinguished between search and research. One quote was ‘I know the library science is very systematic and organised and human behaviour is not like that’.

Information seeking is an iterative process – constant and not systematic. The search process is highly idiosyncratic: the subjects had developed ways of searching for information that worked for them, efficient or not, and they are self-conscious that it is messy. ‘I feel like the librarians must be like “this is the worst thing I have ever heard”’.

Information evaluation is multi-tiered – e.g. ‘If an article is talking about people I have heard of it is worth reading’. Researchers often use a mash-up of systems that works for a given project; email, for example, is used as an information management tool.

Connectivity is important to researchers, it means you can work anywhere and switch rapidly between tasks. It has a big impact on collaboration – working with others was continuously mentioned in the context of writing. However sometimes researchers need to eliminate technology to focus.

Libraries have traditionally focused too much on search and not enough on brainwork – a potential role for libraries. References to the library occurred throughout the process. Libraries are often thought of as a place of refuge – especially for much-needed brainwork. Researchers also need self-management – the ability to manage their time and prioritise the demands on their attention – and their strategies depended on a complicated relationship with technology.

The major themes emerging from the work are that search is idiosyncratic and not important, research has no closure, experts rule, and research is collaboration. The implication is that the future library is a hub – not just a discovery system, but a means of connecting people with knowledge and technologies.

If we were building a library from scratch today, what would it look like? A huge amount of customisation will be needed to adjust tools to researchers' personal preferences. The library of the future will have to be exponentially more customisable than the current offering, and libraries will have to make their resources available on customisable platforms. We need to shift from non-interoperable tools to customisation.

So if the future were here today, the future library would be an academic hub (improving current library services) plus an application store. Libraries should take on even more of a social media aspect. Think of a virtual ‘app store’ on an open source platform that lets people suggest shortcuts, with developers employed to build these modules quickly. Libraries should take a leadership role in ensuring vendor platforms can be integrated, so that all library resources speak easily to the systems our users are using. We need to provide individualised services rather than one-size-fits-all.

Scientific Ecosystems and Research Reproducibility

The scientific reward structure determines the behaviour of researchers, and this has spawned the reproducibility crisis, according to Marcus Munafo from the University of Bristol.

Marcus started by talking about the P value, where the conventional threshold for statistical significance is p < 0.05 – that is, the chance of seeing a result this extreme if there were no real effect is less than five in 100. Generally, studies need to cross this threshold to get published, and there is evidence that original studies often suggest a large effect which, when replication is attempted, cannot be reproduced.
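
This selection effect can be illustrated with a toy simulation (my own sketch, not from the talk): if only studies crossing the p < 0.05 threshold get published, the published estimates of a small true effect are systematically inflated, even though every individual study is honestly analysed.

```python
import math
import random

random.seed(42)

def norm_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def run_study(true_effect, n, sigma=1.0):
    # Simulate a two-group study; return (estimated effect, p-value).
    a = [random.gauss(0, sigma) for _ in range(n)]
    b = [random.gauss(true_effect, sigma) for _ in range(n)]
    est = sum(b) / n - sum(a) / n
    se = sigma * math.sqrt(2 / n)      # known-variance z-test, for simplicity
    z = est / se
    p = 2 * (1 - norm_cdf(abs(z)))
    return est, p

true_effect = 0.2                      # a small but real effect
results = [run_study(true_effect, n=50) for _ in range(2000)]

# 'Publication': keep only the studies that reached significance.
published = [est for est, p in results if p < 0.05]
avg_published = sum(published) / len(published)

print(f"true effect: {true_effect}")
print(f"average effect in 'published' studies: {avg_published:.2f}")
```

With these (hypothetical) numbers, the average effect among the significant studies comes out at more than double the true effect, matching the pattern Marcus described.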

Scientists are supposed to be impartial observers, but in reality they need to get grants, and publish papers to get promoted to more ‘glamorous institutions’ (Marcus’ words). Scientists are rarely rewarded for being right, so the scientific record is being distorted by the scientific ecosystem.

Marcus noted it is common to overstate your data, or to ‘error check’ it only when the first analysis doesn't tell you what you are looking for. This ‘flexible analysis’ is quite commonplace if we look at the literature as a whole, and often there is not enough detail in a paper to allow the work to be reproduced. One sample contained nearly as many unique analysis pipelines as there were studies – flexibility that gets leveraged to get the result you want.
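
A rough sketch of why such flexibility matters (again my own illustration, not from the talk): even when there is no real effect at all, a researcher who tests many outcome measures and reports whichever comes out significant will find a ‘positive’ result much of the time.

```python
import math
import random

random.seed(7)

def norm_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value_of_noise(n):
    # One outcome measured on pure noise: both groups drawn from the same distribution.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    est = sum(b) / n - sum(a) / n
    z = est / math.sqrt(2 / n)         # known-variance z-test
    return 2 * (1 - norm_cdf(abs(z)))

def study_finds_something(n_outcomes, n=30):
    # A 'flexible' study: test many outcomes, claim success if any p < 0.05.
    return any(p_value_of_noise(n) < 0.05 for _ in range(n_outcomes))

trials = 1000
hits = sum(study_finds_something(n_outcomes=20) for _ in range(trials))
print(f"null studies still reporting p < 0.05: {hits / trials:.0%}")
```

Testing 20 independent outcomes at the 5% level gives roughly a 1 − 0.95²⁰ ≈ 64% chance of at least one spurious ‘finding’ per study, which is what the simulation shows.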

There is also evidence that journal impact factor is a very poor indicator of quality – indeed it is a stronger predictor of retraction than of quality. The idea is that science as a whole will self-correct, but it won't do so in a reasonable timeframe: if you look at the literature, replication is the exception rather than the norm.

One study showed that among 83 articles recommending effective interventions, 40 had not been replicated; of those that had, many showed stronger findings in the first paper than in the replication, and some were contradicted by the replication.

Your personal investment in the field shapes your position – an unconscious bias that affects all of us. Early career scientists get the impression that the field is more robust than it really is. There is hidden literature that is not citable, and only by looking at it do you get a balanced sense of how robust the published literature is. Many studies make a claim in the abstract that is not supported by a more impartial reading; others are merely ‘optimistic’ in the abstract. Articles that describe bad news receive far fewer citations than would be expected – people don't want to cite bad news. So is science self-correcting?

We can introduce measures to help science self-correct. In 2000, the requirement to register the outcomes of clinical trials began; once researchers had to pre-specify what the outcome measures would be, most findings were null. This is why it is a scientific ecosystem – the way we are incentivised has become distorted over the years.

Researchers are incentivised to produce a small number of eye-catching papers. It is understandable to focus on quality over quantity, but we could give more weight to confirmatory studies and move away from the focus on publishing only certain types of studies. We shouldn't be putting all our effort into high-risk, high-return work.

What do we do about this? There can be top-down measures, but individual groups can also change the way they work, for example by adopting open science practices. This is not trivial – we cannot, for instance, make data available without the consent of participants. Possible solutions include pre-registering all analysis plans, setting up studies so the data can be made open, and ensuring publications are gold OA. These measures serve as quality control, because everything gets checked when people know it is going to be made available. We come down hard on academics who make conscious mistakes – but we should be encouraging people to identify their own errors.

We need to build quality control methods implicitly into our daily practice, and open data is a very good step in that direction: there is evidence that researchers who know their data will be made open are more thorough in checking it. Maybe it is time for an update in the way we do science – we have statistical software that can run hundreds of analyses, and we can text and data mine large numbers of papers. We need to build in new processes and systems that refine science, and think about new ways of rewarding it.

Marcus noted that these are not new problems, quoting from Reflections on the Decline of Science in England written by Babbage in 1830.

Marcus referred to many different studies and articles in his talk, some of which I have linked to here.

Creating change to triumph: A view from Australia

The idea of creating change to triumph was the message of Jill Benn, the Librarian at the University of Western Australia. She discussed cambietics, the science of managing change – a theory developed by Barrett in 1985, with three stages:

  • Coping with change to survive
  • Capitalising on change
  • Creating change to triumph.

This last is the true challenge – to be an inventor rather than an imitator. Jill gave the Australian context: the country is 32 times bigger than the UK but has a third of the population, with 40 universities around the country. She noted that isolation is one of the reasons libraries in Australia have collaborated.

Research from Australia accounts for 4% of the world’s research output; it is the third largest export after energy and out-performs tourism. The political landscape really affects higher education – there has been a series of five prime ministers in five years.

Australia has invested heavily in research infrastructure – mostly telescopes and boats. The Australian National Data Service (ANDS) was created, and it has built the Research Data Australia interface – an amazing system full of data. Libraries have worked with researchers to populate repositories, and there has been a large amount of capacity building: ANDS worked with libraries on the ’23 things’ training programme. Registration is self-service – by 1 March, 840 people had signed up.

The most recent element of the government’s agenda has been innovation. Prime Minister Turnbull has said he wants to end the ‘publish or perish’ culture of research and increase its impact on the community. There is a national innovation and science agenda, and the government will no longer take publication counts alone into account in assessing research. It is likely the next ERA (Australia’s equivalent of the REF) will involve impact on the community. The latest call is that “innovation is the new black”.

There is financial pressure on the university sector, which pays for many resources in US dollars – a problem. The emphasis on efficiency means libraries have to show value and impact to the research sector.

Many well-developed services exist in university libraries to support research: Australian institutional repositories now hold over 650,000 full-text items, downloaded over 1 million times annually, and there are data librarians and scholarly communication librarians. One of the ways libraries have been asked to deliver capacity is through CAUL and its Research Advisory Committee, engaging with the government’s agenda. There are three pillars – capacity building, engagement and advocacy – to promote the work of libraries to bodies like Universities Australia.

Jill also mentioned the Australasian Open Access Strategy Group, which has taken a green rather than a gold approach. Australians are interested in open access, but it is not yet clear what the role of institutional repositories will be in an environment where the government wants research shared.

How can we benchmark the Australian context? It is difficult. One option is to look at our associations and at what data we might be able to share. Jill quoted Ross Wilkinson: yes, there are individuals, but the collective way Australia has managed data means it is better able to engage internationally. Even so, despite the investment into repositories in Australia, the UK outperforms Australia.

Australian libraries see themselves as genuine partners for research and we have a healthy self confidence (!). Libraries must demonstrate value and impact and provide leadership. Australian libraries have created change to triumph.

Open access mega-journals and the future of scholarly communication

This talk was given by Professor Stephen Pinfield from Sheffield University, who talked about the Open Access Mega Journals project he is working on, examining potentially disruptive open access journals (the Twitter handle is @oamj_project).

He began where it all began – with PLOS ONE, which is now the biggest journal in the world. Stephen noted that mega journals are full of controversy, listing comments ranging from them being the future of academic publishing, a disruptive innovation to the best possible future system.

However, critics see them variously as a dumping ground, as career suicide for early career researchers who publish in them, and as a cynical money-making venture. Pinfield noted, though, that despite considerable searching, what ‘people say’ is different from being able to find attributed negative statements about mega-journals.

The open access and wide-scope nature of mega-journals reverses the trend of recent years in which journals have been further specialising. They are identifiable by their approach to quality control – an emphasis on scientific soundness only, rather than subjective assessments of novelty – and by their post-publication metrics.

Pinfield noted that there are economies of scale for mega-journals – a single set of processes and technologies. This enables a tiered scholarly publishing system. Mega-journals potentially allow highly selective journals to go open access (such journals often argue that they reject so much they could not afford to go OA). Pinfield hypothesised a business model in which a layer of highly selective titles sits above a layer of moderately selective mega-journals: the moderately selective journals provide the financial subsidy, while the highly selective ones provide the reputational subsidy. PLOS is a good example of this symbiotic relationship.

The emphasis on ‘soundness’ in the quality control process reduces the subjectivity of judgements of novelty and importance and potentially shifts the role and the power of the gatekeepers. Traditionally the editors and editorial board members have been the arbiters of what is novel.

However this opens up some questions. If it is only a ‘soundness’ judgement, is power shifted for good or ill? Does the idea of ‘soundness’ translate to the humanities? There is also the problem of over-reliance on metrics: are the citation values of journals driven by their credibility or their visibility?

Pinfield emphasised the need for librarians to be informed and credible in their understanding of these topics. If librarians are to be considered important, we as a community need a strong grasp of these issues, and there is an ongoing need to keep up to date and remain credible.

Working together to encourage researcher engagement and support

There were several talks about how institutions have been engaging researchers, and many of them emphasised the need to federate the workload across the institution. Chris Awre from the University of Hull discussed work he has been doing with Valerie McCutcheon on the current interaction between the library and other parts of the institution in supporting OA, to understand how OA is and could be embedded.

The survey revealed a desire for the management of Open Access to be spread more widely across the institution in future. Libraries should be more involved in managing the research information system and the REF. Library involvement in getting Open Access into grant applications is lower – this is a research role, but it is worth asking how much it underpins subsequent activity.

As an aside Chris noted a way of demonstrating the value of something is to call it an ‘office’ – this is something the Americans do. (Indeed it is something Cambridge has done with the Office of Scholarly Communication).

Chris noted that if researchers don’t think about open access as part of the scholarly communications workflow then they won’t do it. Libraries play a key role in advocating and managing OA – so how can they work with other institutional stakeholders in supporting research?

Valerie later spoke about blurring and blending the borders between the Library and the Research Office. She noted that when she was working for Research and Enterprise (RSEO) she thought library people were nice, but was not sure what they actually did; when she transferred to the Library, the perception in the other direction was the same.

But the Research Office and the Library need to cooperate on shared strategic priorities. Both are looking out for changes in the policy landscape, so they need to share information and collaborate on policy development and dissemination. They also need better data quality in the research process, to find solutions and create agile systems that support researchers.

At Glasgow the Library and RSEO were a good match because they had similar end users and the same data. This began a close collaboration: the two offices worked together on the REF using Enlighten, and in 2010 linked their systems (Enlighten and the research system) so that users can browse the repository by funder name. Glasgow has had a publications policy, rather than an open access policy, since 2008.

Valerie also noted that it was crucial to have high-level support and showed a video of Glasgow’s PVC-R singing the praises of the work the Library was doing.

The Glasgow Open Access model has been ‘act on acceptance’ since 2013 – a simple message with minimal bureaucracy, delivered by a centralised service with ‘no fancy meetings’. Valerie also noted that their events are not billed as Library events, and sessions are subject-based, not department-based.

Torsten Reimer and Ruth Harrison discussed the support offered at Imperial College. Torsten said he was originally employed to develop the College’s OA mandate, but then the RCUK and HEFCE policies came into place and changed everything. At Imperial, scholarly communications is seen as an overall concern for the College rather than specifically a Library issue.

Torsten noted the Library already had a good relationship with the departments. The Research Office is seen by researchers as a distraction from their research, but the Library is seen as helping it. Because the two areas have approached everything with a single aim, open access and scholarly support happen across the institution, and the library has been able to expand.

Imperial have one workflow and one system for open access, all managed through Symplectic (there had been separate systems before). They have a simple workflow and form to fill in, then a ticketing-style customer workflow system plugged into Symplectic to pull information out at the back end. This system has replaced four workflows, lots of spreadsheets and much cutting and pasting.

Sally Rumsey talked about how Oxford have successfully managed to engage their research community with their recently launched ‘Act on Acceptance’ communication programme.

Summary

This is a rundown of a few of the presentations that spoke to me. There were also excellent speed presentations; Lord David Willetts, the former Minister for Universities and Science, spoke; we split up into workshops; and a panel of library organisations from around the world discussed working together.

The personal outcomes from the conference include:

  • An invitation to give a talk at Cornell University
  • An invitation to collaborate with some people at CILIP about ensuring scholarly communication is included in some of the training offered
  • Discussion about forming some kind of learned society for Scholarly Communication
  • Discussion about setting up a couple of webinars – ‘how to start up an office of scholarly communication’ and ‘successful library training programmes’
  • Also lots of ideas about what to do next – the issue of language, and the challenges language creates in scholarly communication, deserves some investigation.

I look forward to next year.

Published 14 March 2016
Written by Dr Danny Kingsley

 

The value of embracing unknown unknowns

This blog accompanies a talk Danny Kingsley gave to the RLUK Conference held at the British Library on 9-11 March 2016. The slides are available, and the Twitter hashtag for the event was #rluk16.

The talk centred around a debate piece written with my long-standing collaborator, Dr Mary Anne Kennan, published in August 2015: Open Access: The Whipping Boy for Problems in Scholarly Publishing. This original 10,000-word article was the starting point for a debate in which five people provided rebuttals to our position, and we were then given the opportunity to write a rejoinder. All the articles were published together.

I have included a précis of the article below as Annex 1, but that is not what the talk was about – what I wanted to discuss was the unexpected progression of the piece and what that revealed to us as authors working in Scholarly Communication.

After we submitted the original piece we sent through several suggestions (including names and contact details) to the Editor for people who might want to contribute. These primarily included practitioners in the Open Access space:

  • Funders
  • Library staff
  • Research managers
  • Editors
  • Publishers
  • Policy makers

There was considerable difficulty in locating people who were prepared to contribute. We are still unsure why this was the case – it may have been a time issue, the fact that this was an academic publication and we were asking administrative professionals, or that it was potentially politically sensitive. On the Editor’s suggestion we sent some personal requests to contacts to ask them to participate. However, in the end four of the five people who wrote rebuttals were researchers in the Information Systems field.

This process made the whole production very protracted. There was a two-year period between the first approach from the journal and publication. The production process from the start of the writing period was 18 months – the actual dates are listed as Annex 2 below.

Same old, same old – the responses

Reading the rebuttals from the four Information Systems researchers, two things become obvious. First, none of them actually addressed the propositions we had presented in our original debate piece – which, after all, was the point of the exercise.

Second, a theme began to emerge, demonstrated by these snippets:

  • “Before discussing that in detail we need to know what the current situation is regarding OA publishing in IS”
  • “We now discuss four fundamental points regarding scholarly communication. We begin by asking what constitutes the main building blocks of the scholarly communication system”
  • “Before examining the current state of scholarly publication, let us set some parameters for this discussion”
  • “I think the argument would benefit from more systematically analyzing the current system of scholarly publishing…”

In each case the authors chose to undertake their own analysis of scholarly publishing – sometimes apparently unaware that this is a long established area of research.

So what does this tell us?

Lesson 1 – ‘Engagement’ is not working

One thing that was striking about this process was that each contributor came to their own conclusion that Open Access is something we should aim towards. While this is a ‘good thing’ for Open Access advocacy, it is not scalable. If we wait for every researcher to come to their own personal epiphany about Open Access we will never have high levels of uptake.

There has been a long-standing belief and practice in Open Access that if the research community were only more aware of the issues in scholarly publishing then they would come on board with Open Access. I am entirely guilty of this myself. However, after a decade of trying, it is fairly safe to say that engagement has not worked.

One conclusion to take away from this experience is we must enable the academic community to disseminate their work openly. It must happen around them.

Lesson 2 – The research area of scholarly communication is not well recognised

The concept of an academic discipline is fairly slippery, but it is reasonably safe to say that two things define a discipline – the scholarly literature and language.

Academic ‘communities’ manifest in the form of journals or learned societies. But Scholarly Communication research is traditionally discussed either in a discipline-specific way in a disciplinary journal (for example as part of an editorial), or published in journals in the sociology of science, communication, librarianship or the information sciences.

There are two journals that do specifically look at Scholarly Communication – the Journal of Librarianship and Scholarly Communication and Scholarly and Research Communication. I should note that Publications also looks at many issues in this area too.

There are now Offices of Scholarly Communication in universities, especially in the US & increasingly in the UK – the Office of Scholarly Communication at Cambridge being a classic example. However there are no Faculties or Departments or Professorial Chairs of Scholarly Communication in existence – that I can find. I am happy to hear about them if they do exist.

And yet people do undertake research in this area. They publish articles, peer review each other’s work, present at conferences. This is academic work.

It might well be a problem of language. Michael Billig’s book ‘Learn to Write Badly: How to Succeed in the Social Sciences’ makes the argument that creating a language that is impenetrable to others is a way of boundary-stamping a discipline.

But in the area of Scholarly Communication, many of the words are vernacular – with common meanings that may differ from their specific meaning in the context of the research. A classic example is ‘publish’, which simply means ‘make public’, but within the academic context implies a process of review and revision, branding and attribution. Words like ‘repository’ and ‘mandate’ have caused me some professional grief.

We are also having some trouble with publishers over terminology in the Open Access space. One example is the conflation of ‘deposit’ with ‘make available’ – Wiley instructs authors that they cannot deposit until after the embargo. This is wrong. Authors can deposit whenever they like, as long as they do not make the work available until after the embargo. Green Open Access – which means making a copy of the work freely available – has been rather bizarrely interpreted by Elsevier in their Open Access pages as providing a link to the (subscription) article.

The reason there can be such a high level of inaccuracy around language is because it is not ‘officially’ defined anywhere. I should note that the Consortia Advancing Standards in Research Administration Information (CASRAI) may be doing some work in this area.

Problem 1 – Practice versus study

We concluded in our rebuttal that the practice of scholarly communication (as distinct from the study of it) is shared among all academic fields, librarians, publishers and administrators. Each of these groups brings its own level of understanding, perspective and involvement to the scholarly communication system.

This can create a problem because practitioners often think they have a good understanding of the issues surrounding the publication process. Yet according to a 2012 article in the Journal of Librarianship and Scholarly Communication, researchers generally have a low awareness of publishing issues and open access opportunities, and are confused about copyright.

This is a case of the ‘unknown unknowns’ – a phrase made famous (to much ridicule) by Donald Rumsfeld in 2002.

Regardless of where individuals sit, however, in all cases there needs to be a base level of competence in this area. Yes, I know – I have just said we should not try to engage academics in order to convert them to Open Access. What we should be doing, however, is ensuring they have at least a basic understanding of this area for their own professional wellbeing.

One of the conclusions of my 2008 PhD, The effect of scholarly communication practices on engagement with open access: An Australian study of three disciplines – for which I undertook in-depth interviews with 43 researchers about their publication and communication practices – was that the Master/Apprentice system is broken (see pp. 177–188). We are not equipping our researchers with the information they need to navigate the publication process successfully. This need for education was echoed in a 2014 paper about open access journal quality indicators (itself published in the Journal of Librarianship and Scholarly Communication – notice a pattern?)

Problem 2 – The library community also needs to know

But this is not just an issue for the research community. Librarians in the academic space also need to know about these issues. Last year the Association of College and Research Libraries (ACRL) released their (excellent) Scholarly Communication Toolkit. The introductory pages note that the “ACRL sees a need to vigorously re-orient all facets of library services and operations to the evolving technologies and models that are affecting the scholarly communication process.” The reason, they say, is because in order for academic libraries to continue to succeed we need to integrate our work into all aspects of the full cycle of scholarly communication.

The toolkit also notes that there is ‘wide variance’ in the levels of understanding of these issues within our community. If we consider the ‘four stages of competence’ as a rough tool:

  1. Unconsciously unskilled – we don’t know that we don’t have this skill, or that we need to learn it.
  2. Consciously unskilled – we know that we don’t have this skill.
  3. Consciously skilled – we know that we have this skill.
  4. Unconsciously skilled – we don’t know that we have this skill (it just seems easy).

Ideally our academic library community would be sitting at stages three and four. In reality many are at stage two, or even stage one.

But bringing everyone up to speed is a huge challenge. Our experiences in Australia have demonstrated it is extremely difficult to get issues related to scholarly communication into curricula for library training. Many of the skills in this area are learnt ‘on the job’.

There are almost no courses on repository management, as demonstrated in this 2012 study published in the (here it is again) Journal of Librarianship and Scholarly Communication. There is a now slightly out-of-date list of courses in scholarly communication here. Professor Stephen Pinfield did point out after my talk that he incorporates open access into his library courses. Discussions about open access are also included at Charles Sturt University in related subjects such as Foundations for Information Studies, Collections and Research Data Management, but there has been difficulty in securing a subject explicitly on Open Access, or even more broadly on scholarly communication.

Even professional training is limited – CILIP offers ‘Institutional repositories and metadata’ and ‘Digital copyright’ but nothing on publishing or open access. One of the positive outcomes of the conference has been an offer to discuss some of these needs with CILIP.

Solution?

So what is the solution? We must shift from managing the academic literature to participating in the generation of it. Librarians can begin by engaging with the academic literature in their area. Suggestions include:

  • Reading research that is being published (in your area of librarianship)
  • Writing an academic article
  • Presenting work at conferences
  • Offering your services as a peer reviewer
  • Serving on an editorial board
  • Collaborating with your academic community on a project and writing about it

When I suggested this at the conference there was some push-back from the audience, defending the benefits of learning on the job. Afterwards, I was approached by a participant who said she had recently published a paper and found the process incredibly instructive. Interestingly, the same thing happened when a speaker urged colleagues to publish an academic paper at LIBER last year. There was again push-back from the audience until one participant said they seconded her statement. He said he thought he knew all about journals because he worked with them but when he published something he realised ‘I didn’t really know anything about it’.

We might have some way to go.

Annex 1 – The original debate piece

In the original debate piece we provided a background to OA’s development and current state – we did not go into great detail because we were limited to 10,000 words and had made some assumptions about prior knowledge.

The piece examined some of the accusations levelled against OA and described why they were false and indeed indicative of a wider set of problems with scholarly communication:

  • that OA publishers are predatory,
  • that OA is too expensive,
  • that self-depositing papers in OA repositories will bring about the end of scholarly publishing.

We then proposed the discussions we considered we should be having about scholarly publishing in order to take advantage of social and technological innovations and move it into the 21st century: the monograph issue, the management of APCs, improving institutional repositories, the need to make scholarly publishing inclusive, and the reward system.

Annex 2 – The times involved in publication

Here are the dates involved in getting the full debate piece to ‘print’:

  • First approach from the journal – September 2013
  • Agreed to write the piece and first discussion – 10 February 2014
  • Submitted the first argument – 26 May 2014
  • Submitted amendment based on editor’s comments – 29 May 2014
  • Rebuttals sent to us – 18 November 2014
  • Deadline for rejoinder – 19 December 2014
  • Rejoinder sent (!) – 16 February 2015
  • “Publication is with the production editor and will be out ‘anytime’” email – 6 May 2015
  • Copy editor’s questions sent to us – 4 June 2015
  • Corrected pieces (original & rejoinder) sent to editors – 26 June 2015
  • Date of acceptance – 4 July 2015
  • Date of publication – 17 August 2015

Published 11 March 2016
Written by Dr Danny Kingsley
Creative Commons License

Forget compliance. Consider the bigger RDM picture

The Office of Scholarly Communication sent Dr Marta Teperek, our Research Data Facility Manager, to the International Digital Curation Conference held in Amsterdam on 22-25 February 2016. This is her report from the event.

Fantastic! This was my first IDCC meeting and already I can’t wait for next year. There was not only amazing content in high quality workshops and conference papers, but also a great opportunity to network with data professionals from across the globe. And it was so refreshing to set aside our UK problem of compliance with data sharing policies, to instead really focus on the bigger picture: why it is so important to manage and share research data and how to do it best.

Three useful workshops

The first day started really intensely – the plan was for one full-day or two half-day workshops, but I managed to squeeze three workshops into one day.

Context is key when it comes to data sharing

The morning workshop was entitled “A Context-driven Approach to Data Curation for Reuse” by Ixchel Faniel (OCLC), Elizabeth Yakel (University of Michigan), Kathleen Fear (University of Rochester) and Eric Kansa (Open Context). We were split into small groups and asked to decide what was the most important information about datasets from the re-user’s point of view. Would the re-user care about the objects themselves? Would s/he want to get hints about how to use the data?

We all had difficulty arranging the necessary information in order of usefulness. Subsequently, we were asked to re-order the information according to its importance from the point of view of repository managers. The take-home message was that, for all of the groups, the information about datasets required by the re-user was not the same as that required by the repository.

In addition, the presenters provided discipline-specific context based on interviews with researchers – depending on the research discipline, different information about datasets was considered the most important. For example, for zoologists, information about the specimen was very important, but it was of negligible importance to social scientists. So context is crucial for the collection of appropriate metadata. Insufficient contextual information renders data useless.

So what can institutional repositories do to address these issues? If research carried out within a given institution only covers certain disciplines, then institutional repositories could relatively easily contextualise the metadata being collected and presented for discovery. However, repositories hosting research from many different disciplines will find this much more difficult to address. For example, the Cambridge repository has to host research spanning particle physics, engineering, economics, archaeology, zoology, clinical medicine and many, many other fields. This makes it much more difficult (if not impossible) to contextualise the metadata.

It is not surprising that the information most important from the repository’s point of view differs from that most important to data re-users. In order to ensure that research data can be effectively shared and preserved in the long term, repositories need to collect a certain amount of administrative metadata: who deposited the data, what the file formats are, what the data access conditions are, and so on. Wherever possible, however, repositories should collect this administrative metadata in an automated way. For example, when a user logs in to deposit data, all the relevant information about that user should be automatically harvested from feeds from human resources systems.
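As a rough illustration of that idea, the sketch below shows how a deposit form might pre-fill administrative metadata from an institutional feed instead of asking the depositor for it. This is purely hypothetical – the `HR_FEED` dictionary, function name and field names are invented stand-ins for whatever HR/LDAP lookup and metadata schema a real repository would use.

```python
# Hypothetical sketch: auto-populating administrative metadata at deposit
# time from an institutional feed, so depositors are not asked for details
# the institution already holds. Names and fields are illustrative only.

import mimetypes

# Stand-in for a human resources feed; in practice this would be a lookup
# (e.g. LDAP) keyed on the authenticated user's institutional ID.
HR_FEED = {
    "jb123": {
        "name": "J. Bloggs",
        "department": "Zoology",
        "orcid": "0000-0002-1825-0097",  # ORCID's own example identifier
    },
}

def administrative_metadata(user_id, filenames):
    """Assemble administrative metadata automatically where possible."""
    person = HR_FEED.get(user_id, {})
    return {
        "depositor": person.get("name", "unknown"),
        "department": person.get("department", "unknown"),
        "orcid": person.get("orcid"),
        # File formats are detected from the uploads, not asked for.
        "formats": sorted(
            {mimetypes.guess_type(f)[0] or "unknown" for f in filenames}
        ),
    }

record = administrative_metadata("jb123", ["specimens.csv", "readme.txt"])
print(record["department"])  # Zoology
print(record["formats"])     # ['text/csv', 'text/plain']
```

The depositor would then only be asked for the contextual, discipline-specific metadata that cannot be derived automatically.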

EUDAT – Pan-European infrastructure for research data

The next workshop was about EUDAT – a collaborative pan-European infrastructure providing research data services, training and consultancy for researchers. EUDAT is an impressive project funded by a Horizon 2020 grant, and it offers five different services to researchers:

  • B2DROP – a secure and trusted data exchange service to keep research data synchronized, up-to-date and easy to exchange with other researchers;
  • B2SHARE – service for storing and sharing small-scale research data from diverse contexts;
  • B2SAFE – service to safely store research data by replicating it and depositing at multiple trusted repositories (additional data backups);
  • B2STAGE – service to transfer datasets between EUDAT storage resources and high-performance computing (HPC) workspaces;
  • B2FIND – discovery service harvesting metadata from research data collections from EUDAT data centres and other repositories.

The project has a wide range of services on offer and is currently looking for institutions to pilot these services with. I personally think these services (if successfully implemented) would be of great value to the pan-European research community.

However, I have two reservations about the project:

  • Researchers are being encouraged to use EUDAT’s platforms to collaborate on their research projects and to share their research data. However, the funding for the project runs out in 2018. The EUDAT team is now investigating options to ensure the sustainability and future funding of the project, but what will happen to researchers’ data if the funding is not secured?
  • If funding is limited, it would perhaps be more useful to focus the offering on services that are not provided elsewhere. For example, another EC-funded project, Zenodo, already offers a user-friendly repository for research data, and the Open Science Framework offers a platform for collaboration and easy exchange of research data. By contrast, a pan-European service harvesting metadata from various data repositories and enabling data discovery is clearly much needed and would be extremely useful to have.

Jisc Shared RDM Services for UK institutions

I then attended the second half of the Jisc workshop on shared Research Data Management services for UK institutions. The University of York and the University of Cambridge are two of the 13 institutions participating in the pilot. Jenny Mitcham from York and I gave presentations on our institutional perspectives on the pilot project: where we are at the moment and what our key expectations from the pilot are. Jenny gave an overview of the impressive work by her and her colleagues on addressing data preservation gaps at the University of York. Data preservation is one of the areas in which Cambridge hopes to get help from the Jisc RDM shared services project. Additionally, as we have described before, Cambridge would greatly benefit from solutions for big data and for personal/sensitive data. My presentation from the session is available here.

Presentations were followed by breakout group discussions, in which participants were asked to identify priority areas for the Jisc RDM pilot. The top priority identified by all the groups seemed to be solutions for personal/sensitive data and for effective data access management. This was very interesting to me, as at similar workshops held by Jisc in the UK the breakout groups prioritised interoperability with existing institutional systems and cost-effectiveness. This could be one of the unforeseen effects of the UK’s strict funder research data policies, which required institutions to provide local repositories for sharing research data.

As a result of these policies, many institutions were tasked with creating institutional data repositories from scratch in a very short time. Most UK universities now have institutional repositories which allow research data to be uploaded and shared, but very few have repositories that are well integrated with other institutional systems. The absence of policy pressure in non-UK countries perhaps allowed institutions to think more strategically about developing their RDM services and to ensure that those services are well embedded within the existing institutional infrastructure.

Conference papers and posters

The two following days were full of excellent talks. My main problem was deciding which sessions to attend: talking with other attendees, I became aware that the papers presented in parallel sessions were also extremely useful. If the budget allows, it would certainly be useful for more participants from each institution to attend the meeting and cover more of the parallel sessions.

Below are my main reflections from keynote talks.

Barend Mons – Open Science as a Social Machine

This was a truly inspirational talk, raising a lot of thought-provoking discussion. Barend started from the reflection that more and more brilliant brains, with more and more powerful computers and billions of smartphones, have created a single, interconnected social super-machine. This machine generates data – vast amounts of data – which is difficult to comprehend and work with unless proper tools are used.

Barend mentioned that, with the current speed at which new knowledge is generated and papers are published, it is simply impossible for human brains to assimilate the constantly expanding amount of new knowledge. Brilliant brains need powerful computers to process the growing amount of information. But in order for science to be accessible to computers, we need to move away from PDFs. Our research needs to be machine-readable. And perhaps, if publishers do not want to support machine-readability, we need to move away from the current publishing model.

Barend also stressed that if data is to be useful and correctly interpretable, it needs to be accessible not only to machines but also to humans, and that effort is needed to describe data well. Barend said that research data without proper metadata description is useless (if not harmful). And how do we make research data meaningful? Barend proposed a very compelling solution: no research grant should be awarded without 5% of the money being dedicated to data stewardship.

I could not agree more with everything that Barend said. I hope that research funders will also support Barend’s statement.

Andrew Sallans – nudging people to improve their RDM practice

Andrew started his talk with the reflection that, in order to improve our researchers’ RDM practice, we need to do better than talking about compliance and about making data open. How is a researcher supposed to make data accessible if the data was not properly managed in the first place? The Open Science Framework was created with three mission statements:

  • Technology to enable change;
  • Training to enact change;
  • Incentives to embrace change.

So what is the Open Science Framework (OSF)? It is an open source platform to support researchers through the entire research lifecycle: from the start of the project, through data creation, editing and sharing with collaborators, to data publication. What I find most compelling about the OSF is that it allows one to easily connect the various storage platforms and places where researchers collaborate on their data: researchers can plug in resources stored on Dropbox, Google Drive, GitHub and many others.

To incentivise behavioural change among researchers, the OSF team has also come up with other initiatives, including cash rewards for good practice.

Personally, I couldn’t agree more with Andrew that enabling good data management practice should be the starting point. We can’t expect researchers to share their research data if we have not provided them with the tools and support for good data management. However, I am not so sure about the idea of cash rewards.

In the end, researchers become researchers because they want to share the outcomes of their research with the community. This is the principle behind academic research – the only way to move ideas forward is to exchange findings with colleagues. Do researchers need to be paid extra to do the right thing? I personally do not think so, and I believe that whoever decides to pursue an academic career is prepared to share. It is our task to make data management and sharing as easy as possible, and the OSF will certainly be of great aid to the community.

Susan Halford – the challenge of big data and social research

The last keynote was from Susan Halford, and it was again very inspirational and thought-provoking. She talked about the growing excitement around big data and how trendy it has become – almost to the point of being perceived as a solution to every problem. However, Susan also pointed out the problems with big data: simply increasing computational power without fully comprehending the questions and the methodology used can lead to serious misinterpretation of results. Susan concluded that when doing big data research one has to be extremely careful to choose a proper methodology for data analysis, reflecting on both the type of data being collected and (inter)disciplinary norms.

Again – I could not agree more. Asking the right question and choosing the right methodology are key to drawing the right conclusions. But are these problems new to big data research? I think we are all quite familiar with these challenges – questions about the right experimental design and the right methodology have been with us for as long as the scientific method has been used.

Researchers have always needed to design studies carefully before commencing experiments: what will the methodology be, what are the necessary controls, what should the sample size be, what needs to happen for the study to be conclusive? To me this is not a problem of big data; it is a problem that needs to be addressed by every researcher from the very start of the project, regardless of the amount of data the project generates or analyses.

Birds of a Feather discussions

I had not experienced Birds of a Feather (BoF) discussions at a conference before, and I am absolutely amazed by the idea. Before the conference started, attendees were invited to propose ideas for discussions, keeping in mind that BoF sessions might have the following scope:

  • Bringing together a niche community of interest;
  • Exploring an idea for a project, a standard, a piece of software, a book, an event or anything similar.

I proposed a session about sharing of personal/sensitive data. Luckily, the topic was selected for a discussion and I co-chaired the discussion together with Fiona Nielsen from Repositive. We both thought that the discussion was great and our blog post from the session is available here.

Once again, I was very sorry to be the only attendee from Cambridge at the conference. There were four parallel discussions, and since I was chairing one of them I was unable to take part in the others. I would have liked to participate in the discussions on ‘Data visualisation’ and ‘Metadata Schemas’ as well.

Workshops: Appraisal, Quality Assurance and Risk Assessment

The last day was again devoted to workshops. I attended an excellent workshop from the Pericles project on appraisal, quality assurance and risk assessment in research data management. The project looked at how an institutional repository should conduct data audits when accepting deposits, and at how to measure the risk of datasets becoming obsolete.

These are extremely complex and therefore very difficult questions to address. Still, the project leaders realised the importance of addressing them systematically – and ideally in a (semi-)automated way, using specialised software to help repository managers make the right preservation decisions.

In a way I felt sorry for the presenters – the project’s progress and ambitions were so advanced that probably none of us attendees were able to contribute critically to it. We were all deeply impressed by the level of the questions asked, but our own experience with data preservation and policy automation was nowhere near the level demonstrated by the workshop leaders.

My take-home message from the workshop is that proper audit of ingested data is of crucial importance. Even if automated risk assessment is not possible, repository managers should at least collect information about the files being deposited so that they can assess the likelihood of their future obsolescence – or, at a minimum, identify key file formats and software types as preservation targets, ensuring that key datasets do not become obsolete. For me the workshop was a real highlight of the conference.

Networking and the positive energy

Lots of useful workshops, plenty of thought-provoking talks. But for me one of the most important parts of the conference was meeting great colleagues and having fascinating discussions about data management practice. I never thought I could spend an evening (night?) with people willing to talk about research data without the slightest sign of boredom. And the most joyful and refreshing part of the conference was that, because we came from across the globe, our discussions moved away from the compliance aspect of data policies. Free from policy, we were able to address how best to support research data management: how to help researchers, what our priority needs are, and what data managers should do first with our limited resources.

I am looking forward to catching up next year with all the colleagues I met in Amsterdam, to seeing what progress we have all made with our projects, and to discussing what our collective next moves should be.

To summarise, I came back with lots of new ideas, full of energy and ready to advocate for the bigger picture and the greater good. I came back exhausted, but I cannot imagine spending four days more productively and fruitfully than at IDCC.

Thanks so much to the organisers and to all the participants!

Published 8 March 2016
Written by Dr Marta Teperek

Creative Commons License