Tag Archives: Open Research

Walking the talk- reflections on working ‘openly’

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Lauren Cadwallader discusses her experience of researching openly.

Earlier this year I was awarded the first Altmetric.com Annual Research grant to carry out a proof-of-concept study looking at using altmetrics as a way of identifying journal articles that eventually get included into a policy document. As part of the grant condition I am required to share this work openly. “No problem!” I thought, “My job is all about being open. I know exactly what to do.”

However, it’s been several years since I last carried out an academic research project and my previous work was carried out with no idea of the concept of open research (although I’m now sharing lots of it here!). Throughout my project I kept a diary documenting my reflections on being open (and researching in general) – mainly the mistakes I made along the way and the lessons I learnt. This blog summarises those lessons.

To begin at the beginning

I carried out a PhD at Cambridge not really aware of scholarly best practice. The Office of Scholarly Communication didn’t exist. There wasn’t anyone to tell me that I should share my data. My funder didn’t have any open research related policies. So I didn’t share because I didn’t know I could, or should, or why I would want to.

I recently attended The Data Dialogue conference and was inspired to hear many of the talks about open data but also realised that although I know some of the pitfalls researchers fall into I don’t quite feel equipped to carry out a project and have perfectly open and transparent methods and data at the end. Of course, if I’d been smart enough to attend an RDM workshop before starting my project I wouldn’t feel like this!

My PhD supervisor and the fieldwork I carried out had instilled in me some practices that are useful to carrying out open research:.

Lesson #1. Never touch your raw data files

This is something I learnt from my PhD and found easy to apply here. Altmetric.com sent me the data I requested for my project and I immediately saved it as the raw file and saved another version as my working file. That made it easy when I came to share my files in the repository as I could include the raw and edited data. Big tick for being open.

Getting dirty with the data

Lesson #2. Record everything you do

Another thing I was told to do during my PhD lab work was to record everything you do. And that is all well and good in the lab or the field but what about when you are playing with your data? I found I started cleaning up the spreadsheet Altmetric.com sent and I went from having 36 columns to just 12 but I hadn’t documented my reasons for excluding large swathes of data. So I took a step back and filled out my project notebook explaining my rationale. Documenting every decision at the time felt a little bit like overkill but if I need to articulate my decisions for excluding data from my analysis in the future (e.g. during peer review) then it would be helpful to know what I based my reasoning on.

Lesson #3. Date things. Actually, date everything

I’d been typing up my notes about why some data is excluded and others not so it informs my final data selection and I’d noticed that I’d been making decisions and notes as I go along but not recording when. If I’m trying to unpick my logic at a later date it is helpful if I know when I made a decision. Which decision came first? Did I have all my ‘bright ideas’ on the same day and now the reason they don’t look so bright is was because I was sleep deprived (or hungover in the case of my student days) and not thinking straight. Recording dates is actually another trick I learnt as a student – data errors can be picked up as lab or fieldwork errors if you can work back and see what you did when – but have forgotten to apply thus far. In fact, it was only at this point that I began dating my diary entries…

Lesson #4. A tidy desk(top) is a tidy mind

Screen Shot 2016-10-24 at 13.21.11I was working on this project just one day a week over the summer so every week I was having to refresh my mind as to where I stopped the week before and what my plans were that week. I was, of course, now making copious notes about my plans and dating decisions so this was relatively easy. However, upon returning from a week’s holiday, I opened my data files folder and was greeted by 10 different spreadsheets and a few other files. It took me a few moments to work out which files I needed to work on, which made me realise I needed to do some housekeeping.

Aside from making life easier now, it will make the final write up and sharing easier if I can find things and find the correct version. So I went from messy computer to tidy computer and could get back to concentrating on my analysis rather than worrying if I was looking at the right spreadsheet.

 

Lesson #5. Version control

One morning I had been working on my data adding in information from other sources and everything was going swimmingly when I realised that I hadn’t included all of my columns in my filters and now my data was all messed up. To avoid weeping in my office I went for a cup of tea and a biscuit.

Upon returning to my desk I crossed my fingers and managed to recover an earlier version of my spreadsheet using a handy tip I’d found online. Phew! I then repeated my morning’s work. Sigh. But at least my data was once again correct. Instead of relying on handy tips discovered by frantic Googling, just use version control. Archive your files periodically and start working on a new version. Tea and biscuits cannot solve everything.

Getting it into the Open

After a couple more weeks of problem free analysis it was time to present my work as a poster at the 3:AM Altmetrics conference. I’ve made posters before so that was easy. It then dawned on me at about 3pm the day I needed to finish the poster that perhaps I should share a link to my data. Cue a brief episode of swearing before realising I sit 15ft away from our Research Data Advisor and she would help me out! After filling out the data upload form for our institutional repository to get a placeholder record and therefore DOI for my data, I set to work making my spreadsheet presentable.

Lesson #6. Making your data presentable can be hard work if you are not prepared

I only have a small data set but it took me a lot longer than I thought it would to make it sharable. Part of me was tempted just to share the very basic data I was using (the raw file from Altmetric.com plus some extra information I had added) but that is not being open to reproducibility. People need to be able to see my workings so I persevered.

I’d labelled the individual sheets and the columns within those sheets in a way that was intelligible to me but not necessarily to other people so they all needed renaming. Then I had to tidy up all the little notes I’d made in cells and put those into a Read Me file to explain some things. And then I had to actually write the Read Me file and work out the best format for it (a neutral text file or pdf is best).

I thought I was finished but as our Research Data Advisor pointed out, my spreadsheets were returning a lot of errors because of the formula I was using (it was taking issue with me asking it to divide something by 0) and that I should share one file that included the formulae and one with just the numbers.

If I’d had time, I would have gone for a cup of tea and a biscuit to avoid weeping in the office but I didn’t have time for tea or weeping. Actually producing a spreadsheet without formulae turned out to be simple once I’d Googled how to do it and then my data files were complete. All I then needed to do was send them to the Data team and upload a pdf of my poster to the repository. Job done! Time to head to the airport for the conference!

Lesson #7. Making your work open is very satisfying.

Just over three weeks have passed since the conference and I’m amazed that already my poster has been viewed on the repository 84 times and my data has been viewed 153 times! Wowzers! That truly is very satisfying and makes me feel that all the effort and emergency cups of tea were worth it. As this was a proof-of-concept study I would be very happy for someone to use my work, although I am planning to keep working on it. Seeing the usage stats of my work and knowing that I have made it open to the best of my ability is really encouraging for the future of this type of research. And of course, when I write these results up with publication in mind it will be as an open access publication.

But first, it’s time for a nice relaxed cup of tea.

Published 25 October 2016
Written by Dr Lauren Cadwallader
Creative Commons License

Making the connection: research data network workshop

During International Data Week 2016, the Office of Scholarly Communication is celebrating with a series of blog posts about data. The first post was a summary of an event we held in July. This post looks at the challenges associated with financially supporting RDM training.

corpus-main-hallFollowing the success of hosting the Data Dialogue: Barriers to Sharing event  in July we were delighted to welcome the Research Data Management (RDM) community to Cambridge for the second Jisc research data network workshop. The event was held in Corpus Christi College with meals held in the historical dining room. (Image: Corpus Christi )

RDM services in the UK are maturing and efforts are increasingly focused on connecting disparate systems, standardising practices and making platforms more usable for researchers. This is also reflected in the recent Concordat on Research Data which links the existing statements from funders and government, providing a more unified message for researchers.

The practical work of connecting the different systems involved in RDM is being led by the Jisc Research Data Shared Services project which aims to share the cost of developing services across the UK Higher Education sector. As one of the pilot institutions we were keen to see what progress has been made and find out how the first test systems will work. On a personal note it was great to see that the pilot will attempt to address much of the functionality researchers request but that we are currently unable to fully provide, including detailed reporting on research data, links between the repository and other systems, and a more dynamic data display.

Context for these attempts to link, standardise and improve RDM systems was provided in the excellent keynote by Dr Danny Kingsley, head of the Office of Scholarly Communication at Cambridge, reminding us about the broader need to overhaul the reward systems in scholarly communications. Danny drew on the Open Research blogposts published over the summer to highlight some of the key problems in scholarly communications: hyperauthorship, peer review, flawed reward systems, and, most relevantly for data, replication and retraction. Sharing data will alleviate some of these issues but, as Danny pointed out, this will frequently not be possible unless data has been appropriately managed across the research lifecycle. So whilst trying to standardise metadata profiles may seem irrelevant to many researchers it is all part of this wider movement to reform scholarly communication.

Making metadata work

Metadata models will underpin any attempts to connect repositories, preservation systems, Current Research Information Systems (CRIS), and any other systems dealing with research data. Metadata presents a major challenge both in terms of capturing the wide variety of disciplinary models and needs, and in persuading researchers to provide enough metadata to make preservation possible without putting them off sharing their research data. Dom Fripp and Nicky Ferguson are working on developing a core metadata profile for the UK Research Data Discovery Service. They spoke about their work on developing a community-driven metadata standard to address these problems. For those interested (and Git-Hub literate) the project is available here.

They are drawing on national and international standards, such as the Portland Common Data Model, trying to build on existing work to create a standard which will work for the Shared Services model. The proposed standard will have gold, silver and bronze levels of metadata and will attempt to reward researchers for providing more metadata. This is particularly important as the evidence from Dom and Nicky’s discussion with researchers is that many researchers want others to provide lots of metadata but are reluctant to do the same themselves.

We have had some success with researchers filling in voluntary metadata fields for our repository, Apollo, but this seems to depend to a large extent on how aware researchers are of the role of metadata, something which chimes with Dom and Nicky’s findings. Those creating metadata are often unaware of the implications of how they fill in fields, so creating consistency across teams, let alone disciplines and institutions can be a struggle. Any Cambridge researchers who wish to contribute to this metadata standard can sign up to a workshop with Jisc in Cambridge on 3rd October.

Planning for the long-term

A shared metadata standard will assist with connecting systems and reducing researchers’ workload but if replicability, a key problem in scholarly communications, is going to be possible digital preservation of research data needs to be addressed. Jenny Mitcham from the University of York presented the work she has been undertaking alongside colleagues from the University of Hull on using Archivematica for preserving research data and linking it to pre-existing systems (more information can be found on their blog.)

Jenny highlighted the difficulties they encountered getting timely engagement from both internal stakeholders and external contractors, as well as linking multiple systems with different data models, again underlining the need for high quality and interoperable metadata. Despite these difficulties they have made progress on linking these systems and in the process have been able to look into the wide variety of file formats currently in use at York. This has lead to conversations with the National Archive about improving the coverage of research file formats in PRONOM (a registry of file formats for preservation purposes), work which will be extremely useful for the Shared Services pilot.

In many ways the project at York and Hull felt like a precursor to the Shared Services pilot; highlighting both the potential problems in working with a wide range of stakeholders and systems, as well as the massive benefits possible from pooling our collective knowledge and resources to tackle the technical challenges which remain in RDM.

Published 14 September 2016
Written by Rosie Higman
Creative Commons License

Could Open Research benefit Cambridge University researchers?

This blog is part of the recent series about Open Research and reports on a discussion with Cambridge researchers  held on 8 June 2016 in the Department of Engineering. Extended notes from the meeting and slides are available at the Cambridge University Research Repository. This report is written by  Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek (listed alphabetically by surname).

At the Office of Scholarly Communication we have been thinking for a while about Open Research ideas and about moving beyond mere compliance with funders’ policies on Open Access and research data sharing. We thought that the time has come to ask our researchers what they thought about opening up the research process and sharing more: not only publications and research data, but also protocols, methods, source code, theses and all the other elements of research. Would they consider this beneficial?

Working together with researchers – democratic approach to problem-solving

To get an initial idea of the expectations of the research community in Cambridge, we organised an open discussion hosted at the Department of Engineering. Anyone registering was asked three questions:

  • What frustrates you about the research process as it is?
  • Could you propose a solution that could solve that problem?
  • Would you be willing to speak about your ideas publicly?

20160608_163000Interestingly, around fifty people registered to take part in the discussion and almost all of them contributed very thought-provoking problems and appealing solutions. To our surprise, half of the people expressed their will to speak publicly about their ideas. This shaped our discussion on the day.

So what do researchers think about Open Research? Not surprisingly, we started from an animated discussion about unfair reward systems in academia.

Flawed metrics

A well-worn complaint: the only thing that counts in academia is publication in a high impact journal. As a result, early career researchers have no motivation to share their data and to publish their work in open access journals, which can sometimes have lower impact factors. Additionally, metrics based on the whole journal do not reflect the importance of the research described: what is needed is article-level impact measurements. But it is difficult to solve this systemic problem because any new journal which wishes to introduce a new metrics system has no journal-level impact factor to start with, and therefore researchers do not want to publish in it.

Reproducibility crisis: where quantity, not quality, matters

Researchers also complained that the volume of produced research is higher and higher in terms of quantity and science seems to have entered an ‘era of quantity’. They raised the concern that the quantity matters more than the quality of research. Only the fast and loud research gets rewarded (because it is published in high impact factor journals), and the slow and careful seems to be valued less. Additionally, researchers are under pressure to publish and they often report what they want to see, and not what the data really shows. This approach has led to the reproducibility crisis and lack of trust among researchers.

Funders should promote and reward reproducible research

The participants had some good ideas for how to solve these problems. One of the most compelling suggestions was that perhaps funding should go not only to novel research (as it seems to be at the moment), but also to people who want to reproduce existing research. Additionally, reproducible research itself should be rewarded. Funders could offer grant renewal schemes for researchers whose research is reproducible.

Institutions should hire academics committed to open research

Another suggestion was to incentivise reward systems other than journal impact factor metrics. Someone proposed that institutions should not only teach the next generation of researchers how to do reproducible research, but also embed reproducibility of research as an employment criteria. Commitment to Open Research could be an essential requirement in job description. Applicants could be asked at the recruitment stage how they achieve the goals of Open Research. LMU University in Munich had recently included such a statement in a job description for a professor of social psychology (see the original job description here and a commentary here).

Academia feeding money to exploitative publishers

Researchers were also frustrated by exploitative publishers. The big four publishers (Elsevier, Wiley, Springer and Informa) have a typical annual profit margin of 37%. Articles are donated to the publishers for free by the academics, and reviewed by other academics, also free of charge. Additionally, noted one of the participants, academics also act as journal editors, which they also do for free.

[*A comment about this statement was made on 15 August 2017 noting that some editors do get paid. While the participant’s comment stands as a record of what was said, we acknowledge that this is not an entirely accurate statement.]

In addition to this, publishers take away the copyright from the authors. As a possible solution to the latter, someone suggested that universities should adopt institutional licences on scholarly publishing (similar to the Harvard licence) which could protect the rights of their authors

Pre-print services – the future of publishing?

Could Open Research aid the publishing crisis? Novel and more open ways of publishing can certainly add value to the process. The researchers discussed the benefits of sharing pre-print papers on platforms like arXiv and bioRxiv. These services allow people to share manuscripts before publication (or acceptance by a journal). In physics, maths and computational sciences it is common to upload manuscripts even before submitting the manuscript to a journal in order to get feedback from the community and have the chance to improve the manuscript.

bioRxiv, the life sciences equivalent of arXiv, started relatively recently. One of our researchers mentioned that he was initially worried that uploading manuscripts into bioRxiv might jeopardise his career as a young researcher. However, he then saw a pre-print manuscript describing research similar to his published on bioRxiv. He was shocked when he saw how the community helped to change that manuscript and to improve it. He has since shared a lot of his manuscripts on bioRxiv and as his colleague pointed out, this has ‘never hurt him’. To the contrary, he suggested that using pre-print services promotes one’s research: it allows the author to get the work into the community very early and to get feedback. And peers will always value good quality research and the value and recognition among colleagues will come back to the author and pay back eventually.

Additionally, someone from the audience suggested that publishing work in pre-print services provides a time-stamp for researchers and helps to ensure that ideas will not be scooped by anyone – researchers are free to share their research whenever they wish and as fast they wish.

Publishers should invest money in improving science – wishful thinking?

It was also proposed that instead of exploiting academics, publishers could play an important role in improving the research process. One participant proposed a couple of simple mechanisms that could be implemented by publishers to improve the quality of research data shared:

  • Employment of in-house data experts: bioinfomaticians or data scientists, who could judge whether supporting data is of a good enough quality
  • Ensure that there is at least one bioinfomatician/data scientist on the reviewing panel for a paper
  • Ask for the data to be deposited in a public, discipline-specific repository, which would ensure quality control of the data and adherence to data standards.
  • Ask for the source code and detailed methods to be made available as well.

Quick win: minimum requirements for making shared data useful

A requirement that, as a minimum, three key elements should be made available with publications – the raw data, the source code and the methods – seems to be a quick win solution to make research data more re-usable. Raw data is necessary as it allows users to check if the data is of a good quality overall, while publishing code is important to re-run the analysis and methods need to be detailed enough to allow other researchers to understand all the processes involved in data processing. An excellent case study example comes from Daniel MacArthur who has described how to reproduce all the figures in his paper and has shared the supporting code as well.

It was also suggested that the Office of Scholarly Communication could implement some simple quality control measures to ensure that research data supporting publications is shared. As a minimum the Office could check the following:

  • Is there a data statement in the publication?
  • If there is a statement – is there a link to the data?
  • Does the link work?

This is definitely a very useful suggestion from our research community and in fact we have already taken this feedback aboard and started checking for data citations in Cambridge publications.

Shortage of skills: effective data sharing is not easy

The discussion about the importance of data sharing led to reflections that effective data sharing is not always easy. A bioinformatician complained that datasets that she had tried to re-use did not satisfy the criteria of reproducibility, nor re-usability. Most of the time there was not enough metadata available to successfully use the data. There is some data shared, there is the publication, but the description is insufficient to understand the whole research process: the miracle, or the big discovery, happens somewhere in the middle.

Open Research in practice: training required

Attendees agreed that it requires effort and skills to make research open, re-usable and discoverable by others. More training is needed to ensure that researchers are equipped with skills to allow them to properly use the internet to disseminate their research, as well as with skills allowing them to effectively manage their research data. It is clear that discipline-specific training and guidance around how to manage research data effectively and how to practise open research is desired by Cambridge researchers.

Nudging researchers towards better data management practice

Many researchers have heard or experienced first-hand horror stories of having to follow up on somebody else’s project, where it was not possible to make any sense of the research data due to lack of documentation and processes. This leads to a lot of time wasted in every research group. Research data need to be properly documented and maintained to ensure research integrity and research continuity. One easy solution is to nudge researchers towards better research data management practice could be formalised data management requirements. Perhaps as a minimum, every researchers should have a lab book to document research procedures.

The time is now: stop hypocrisy

Finally, there was a suggestion that everyone should take the lead in encouraging Open Research. The simplest way to start is to stop being what has been described as a hypocrite and submit articles to journals which are fully Open Access. This should be accompanied by making one’s reviews openly available whenever possible. All publications should be accompanied by supporting research data and researchers should ensure that they evaluate individual research papers and that their judgement is not biased by the impact factor of the journal.

Need for greater awareness and interest in publishing

One of the Open Access advocates present at the meeting stated that most researchers are completely unaware of who are the exploitative and ethical publishers and the differences between them. Researchers typically do not directly pay the exploitative publishers and are therefore not interested in looking at the bigger picture of sustainability of scholarly publishing. This is clearly an area when more training and advocacy can help and the Office of Scholarly Communication is actively involved in raising awareness in Open Access. However, while it is nice to preach in a room of converts, how do we get other researchers involved in Open Access? How should we reach out to those who can’t be bothered to come to a discussion like the one we had? This is the area where anyone who understands the benefits Open Access has a job to do.

Next steps

We are extremely grateful to everyone who came to the event and shared their frustrations and ideas on how to solve some problems. We noted all the ideas on post it notes – the number of notes at the end of the discussion was impressive, an indication of how creative the participants were in just 90 minutes. It was a very productive meeting and we wish to thank all the participants for their time and effort.

20160608_160721

We think that by acting collaboratively and supporting good ideas we can achieve a lot. As an inspiration, McGill University’s Montreal Neurological Institute and Hospital (the Neuro) in Canada have recently adopted a policy on Open Research: over the next five years all results, publications and data will be free to access by everyone.

Follow up

If you would like to host similar discussions directly in your departments/institutes, please get in touch with us at info@osc.cam.ac.uk – we would be delighted to come over and hear from researchers in your discipline.

In the meantime, if you have any additional ideas that you wish to contribute, please send them to us. Everyone who is interested in being informed about the progress here is encouraged to sign up for a mailing distribution list here.

Extended notes from the meeting and slides are available at the Cambridge University Research Repository. We are particularly grateful to Avazeh Ghanbarian, Corina Logan, Ralitsa Madsen, Jenny Molloy, Ross Mounce and Alasdair Russell (listed alphabetically by surname) for agreeing to publicly speak at the event.

Published 3 August 2016
Written by Lauren Cadwallader, Joanna Jasiewicz and Marta Teperek
Creative Commons License