
FORCE2015 observations & notes

This blog first appeared on the FORCE2015 website on 14 January 2015.

First, a disclaimer: this blog is not an attempt to summarise everything that happened at FORCE2015 – I’ll leave that to others. The Twitter feed using #FORCE2015 contains an interesting side discussion, and the event was livestreamed, with recordings of the individual sessions available here in two weeks – so you can always check bits out for yourself.

So this is a blog about the things that I, as a researcher in scholarly communication working in university administration (with a nod to my previous life as a science communicator), found interesting. It is only a small sample of the whole.

This was my first FORCE event. These have been held annually since the first event, FORCE11, which happened in August 2011 after a “Beyond the PDF” workshop in January that year. It was nice to have such a broad group of attendees: researchers and innovators (often people were both), research funders, publishers, geeks and administrators, all sharing their ideas. Interestingly, there were only a few librarians – this in itself makes the conference stand out. Sarah Thomas, Vice President of the Harvard Library, observed this, noting she was shocked that there are usually only librarians at the table at these sorts of events.

To give an idea of the group: when the audience was asked who had received a grant from the various funding bodies, I was in a small minority in not putting up my hand. These are actively engaged researchers.

I am going to explore some of the themes of the conference here, including:

  • Library issues
  • The data challenge
  • New types of publishing
  • Wider scholarly communication issues, and
  • The impenetrability of scientific literature

Bigger questions about effecting change

Responsibility

Whose responsibility is it to effect change in the scholarly communication space? Funders say they are looking to the community for direction. Publishers say they are looking to authors and editors for direction. Authors say they are looking to find out what they are supposed to do. We are all responsible: change is not solely the domain of the funders, it is interdependent.

What is old is still old

The Journal Incubator team asked the editorial board members of the new journal “Culture of Community” to identify what they thought would attract people to their journal. None mentioned the modern and open technology behind its publishing practices. Everything they identified was traditional: peer review, good indexing, PDF formatting and so on. Take home message – authors are not interested in the back-end technology of a journal, they just want the thing to work. This underlines the need to ENABLE, not ENGAGE.

The way forward

The way forward is threefold, incorporating community, policy and infrastructure. Moving forward we will require initiatives focused on sustainability, collaboration and training.

Library issues

Future library

Sarah Thomas, the Vice President of the Harvard Library, spoke about “Libraries at Scale or Dinosaurs Disrupted”. She had some very interesting things to say about the role of the library in the future:

  • Traditional libraries are not sustainable. Acquisition, cataloguing and storage of publications do not scale.
  • We need to operate at scale and focus on this century’s challenges, not the last century’s, by developing new priorities, reallocating resources to them and using approaches that dramatically increase outputs.
  • There is very little outreach from libraries into the community – we are not engaging broadly, except in the mode of “we are the experts, you come to us and we will tell you what to do”.
  • We must let go of our outdated systems – such as insufficiently automated practices, redundant effort and ‘just in case’ coverage.
  • We must let go of our outdated practices – a competitive, proprietary approach – and engage collaborators to advance our goals.
  • Open up hidden collections and maximise access to what we have.
  • Start doing research into what we have and illuminate the contents in ways we never could in a manual world, using visualization and digital tools.

Future library workforce

There was also some discussion about the skills a future library workforce needs to have:

  • We need an agile workforce, with training in skills such as data science and social media, to help promote quality work – and this should be put into performance goals.
  • We need to invest in 21st-century skillsets. The workforce we should be hiring includes:
    • Metadata librarian
    • Online learning librarian
    • Bioinformatics librarian
    • GIS specialist
    • Visualization librarian
    • Copyright advisor
    • Senior data research specialist
    • Data curation expert
    • Scholarly communications librarian
    • Quantitative data specialist
    • Faculty technology specialist
    • Subject specialist
Possible solution?

The Council on Library and Information Resources offers postdoctoral fellowships: CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies and current research. The program offers recent PhD graduates the chance to help develop research tools, resources and services while exploring new career opportunities.

Possible opportunity to observe change?

In summing up the conference, Phil Bourne said there is an upcoming major opportunity point – both the European Bioinformatics Institute in the EU and the National Library of Medicine in the US will soon be under new leadership, and both are receiving recommendations on what the institution of the future should look like.

The library has a tradition of supporting the community, providing an infrastructure to maintain knowledge and, in the case of the National Library of Medicine, setting policy. If this institution is going to be reinvented, we need to watch what it will look like in the future.

The future library (or whatever it will be called) should curate, catalogue, preserve and disseminate the complete digital research lifecycle. This is something we need to move towards, and the fact that there is an institution that might move towards it is very exciting.

The data challenge

Data was discussed at many points during the conference, with some data solutions/innovations showcased:

  • Harvard has the Harvard Dataverse Network – a repository for sharing data – and, through its “Data Management at Harvard” guidelines and policies, is cranking up its investment in managing data LINK
  • The Resource Identification Initiative is designed to help researchers sufficiently cite the key resources used to produce the scientific findings reported in the biomedical literature.
  • bioCADDIE is trying to do for data what PubMed Central has done for publications, using a Data Discovery Index. The goal of this project is to engage a broad community of stakeholders in the development of a biomedical and healthCAre Data Discovery and Indexing Ecosystem (bioCADDIE).

The National Science Foundation data policy

Amy Friedlander spoke about “The Long View”. She posed some questions:

  • Must managing data be co-located with storing the data?
  • What gets access to what and when?
  • Who and what can I trust?
  • What do we store it in? Where do we put things, and where do they need to be?

The NSF does not make a policy for each institution; it makes one NSF Data Sharing Policy that works more or less well across all disciplines. There is a diversity of sciences with heterogeneous research results, a range of institutions, professional societies, stewardship institutions and publishers, and multiple funding streams.

There are two contact points – when a grant is awarded, and when investigators report. If we focus on publications, we can develop the architecture and then extend it to other kinds of research products, integrating the internal systems within the enterprise architecture to minimise the burden on investigators and program staff.

Take home message: the great future utopia (my word) is that we upload once to use many times, in an environment in which all publications are linked to the underlying evidence (data), analytical tools and software.

New types of publishing

There were several publishing innovations showcased.

Journal Incubator

The University of Lethbridge has a ‘journal incubator’, developed with the goal of sustaining scholarly communication and open, digital access. The incubator trains graduate students in the tasks of journal editorship so that the journal can be provided completely free of charge.

F1000 Research Ltd – ‘living figures’

Research is an ongoing activity, but you would not think so from the way we publish – it is still very much built around the static print object. F1000 LINK has the idea that the data should be embedded in the article, and has developed a tool that lets you see the data behind what is in the article.

Many figures don’t need to exist on their own – what you need is the underlying data. With ‘living figures’ in the paper, other research labs can submit their data directly on top of a figure, to show whether the result is reproducible or not. This opens up interesting opportunities – bringing static research figures to life as a “living collection”, with articles from different labs built around that data. The tools and technologies are already out there.

Collabra – giving back to the community

Collabra, the new University of California Press open access journal, will share a proportion of its article processing charges (APCs) with researchers and reviewers. Of the $875 APC, $250 goes into a fund; editors and reviewers earn money into the fund, and there is a payout to the research community, who can decide what to do with it. The choices are to:

  • Receive it electronically
  • Pay it forward to pay APCs in future
  • Pay it forward to institution’s OA fund.

This is a journal where reviewers get paid – or can elect to pay the money forward. The aim is to see whether everyone can benefit from the journal: no lock-in, and benefit through partnerships.
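
As a back-of-the-envelope illustration of the arithmetic above, here is a minimal sketch of the APC split. The $875 APC and the $250 fund contribution come from the presentation; the function name and the example of 100 articles are my own illustrative assumptions.

    # Sketch of the Collabra APC split described above (illustrative only).
    APC = 875          # article processing charge in USD
    FUND_SHARE = 250   # portion of each APC paid into the community fund
    PUBLISHER_SHARE = APC - FUND_SHARE  # $625 retained to run the journal

    def fund_balance(articles_published: int) -> int:
        """Total paid into the community fund after a number of articles."""
        return articles_published * FUND_SHARE

    # Example: 100 published articles put $25,000 into the fund, which
    # contributors can take electronically, pay forward to future APCs,
    # or pass on to their institution's OA fund.
    print(PUBLISHER_SHARE, fund_balance(100))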

Future publishing – a single XML file

Rather than replicating old publishing processes electronically, the dream is to have one single XML file in the cloud. There is role-based access to modify the work (by editors, reviewers and so on), and at the end that version is the one that gets published. Everything is in the XML, with automatic conversion at the end. References become completely structured XML at the click of a button, with the tags coloured. Changes can be tracked, and the journal editor gets a deep link to the thing they need to look at, which they can accept or reject. The XML can be converted to a PDF with high-level typography, fully automatically.
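
To make the single-source idea more concrete, here is a minimal sketch assuming a toy article structure, invented role names and a trivial permission table. This is not the system that was demonstrated, just an illustration of one shared XML document, edits gated by role, and a single automatic rendering step at the end.

    # Illustrative sketch only: one shared XML source, role-based edits,
    # then an automatic "render" step standing in for PDF/HTML conversion.
    # Element names, roles and permissions are assumptions for this example.
    import xml.etree.ElementTree as ET

    SOURCE = """<article>
      <title>Example article</title>
      <body><p id="p1">Original paragraph text.</p></body>
    </article>"""

    # Which roles may modify which parts of the single source file.
    PERMISSIONS = {
        "editor": {"title", "body"},
        "reviewer": set(),          # reviewers comment rather than edit
        "author": {"body"},
    }

    def apply_edit(tree, role, section, element_id, new_text):
        """Apply an edit to the shared XML source if the role permits it."""
        if section not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"{role} may not edit <{section}>")
        target = tree.find(f".//*[@id='{element_id}']")
        if target is None:
            raise KeyError(f"no element with id {element_id}")
        target.text = new_text

    def render(tree):
        """Stand-in for the automatic conversion step (PDF, HTML, etc.)."""
        return ET.tostring(tree.getroot(), encoding="unicode")

    tree = ET.ElementTree(ET.fromstring(SOURCE))
    apply_edit(tree, "author", "body", "p1", "Revised paragraph text.")
    print(render(tree))

The real systems being described would of course also handle comments, versioning and typography, but the shape is the same: one authoritative file, with many role-specific views of it.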

Wider scholarly communication issues

This year is the 350th anniversary of the first scientific journal*, Philosophical Transactions of the Royal Society. Oxford holds a couple of copies of this journal, and there was an excursion for those interested in seeing it.

It is a good time to look at the future.

Does reproducibility matter?

Something that was widely discussed was the question of whether research should be reproducible, which raised the following points:

  • The idea of a single well defined scientific method and thus an incremental, cumulative, scientific process is debatable.
  • Reproducibility and robustness are slightly different. Robustness of the data may be key.
  • There are no standards for computational results that can ensure we have comparable experiments.
Possible solution?

Later in the conference a new service that tries to address the diversity of existing lab software was showcased: Riffyn. It is a cloud-based software platform – a CAD for experiments. Researchers get a unified view of all their experimental processes and data, and can update it themselves rather than relying on IT staff.

Credit where credit is due

I found the reproducibility discussion very interesting, as I did the discussion about authorship and attribution, which posed the following:

  • If it is an acknowledgement system, everyone should be on it.
  • Authorship is a proxy for scientific responsibility. We are using the wrong word.
  • When crediting research we don’t make distinctions between contributions. Citation is not the problem, contribution is.
  • Which building blocks of a research project do we not give credit for? And which ones only get indirect credit? How many skills would we expect one person to have?
  • The problem with software credit is that we are not acknowledging the contributors, so we are breaking the reward mechanism.
  • Of researchers in research-intensive universities, 92% are using software, and of those, 69% say their work would be impossible without it. Yet 71% of researchers have no formal software development training. We need standard research computing training.
Possible solutions
  • The Open Science Framework provides documentation for the whole research process, which can then determine how credit should be apportioned.
  • Project CRediT has come up with a taxonomy of contribution terms. It proposes taking advantage of infrastructure that already exists, using Mozilla Open Badges – if you hear or see the word ‘badges’, think ‘digital credit’. (A rough sketch of the idea follows this list.)
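
To picture how a contribution taxonomy plus digital badges might work in practice, here is a minimal sketch. The role labels are drawn from the CRediT taxonomy (with hyphenation simplified), but the data structure, the names and the badge-awarding logic are my own illustrative assumptions, not Project CRediT or Open Badges code.

    # Toy record of per-person contributions using CRediT-style role names.
    # Everything here is illustrative; it is not part of any real system.
    CONTRIBUTIONS = {
        "A. Researcher": ["Conceptualization", "Writing - original draft"],
        "B. Developer": ["Software", "Data curation"],
        "C. Statistician": ["Formal analysis", "Visualization"],
    }

    def badges_for(person: str) -> list[str]:
        """One 'digital credit' badge per declared contribution role."""
        return [f"badge:{role}" for role in CONTRIBUTIONS.get(person, [])]

    for person in CONTRIBUTIONS:
        print(person, badges_for(person))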

The impenetrability of scientific literature

Astrophysicist Chris Lintott discussed citizen science, specifically the phenomenally successful program Galaxy Zoo, which taps into a massive group of interested amateur astronomers to help classify galaxies in terms of their shape. This is something that humans do better than machines.

What was interesting was the challenge that Chris identified: amateur astronomers quickly become ‘expert’ amateurs, and the system has built ways for them to communicate with each other and with scientists. The problem is that the astronomical literature is simply impenetrable to these (informed) citizens.

The scientific literature is the ‘threshold fear’ for the public. This raises some interesting questions about access – and the need for some form of moderator. One suggestion is some form of lay summary of the research paper – PLOS Medicine have an editor’s summary for papers. (Nature do this for some papers, and BMJ are pretty strong on this too).

Take home message – By designing a set of scholarly communication tools for citizen scientists we improve the communication for all scientists. This is an interesting way to think about how we want to browse scholarly papers as researchers ourselves.

*Yes, I know that the French Journal des sçavans was published before this, but it was broader in focus, hence the qualifier ‘first scientific journal’.
Published 18 March 2015
Written by Dr Danny Kingsley