Tag Archives: data

Methods getting their chance to shine – Apollo wants your methods!

November 7, 2023 | Supporting Open Research | data, methods | admin

By Dr. Kim Clugston, Research Data Co-ordinator, Office of Scholarly Communication

Underlying all research data is an effective and working method, and this applies across all disciplines from STEMM to the Arts, Humanities and Social Sciences. Methods are a detailed description of the tools that are used in research and can come in many forms depending on the type of research. Methods are often overlooked rather than being seen as an integral research output in their own right. Traditionally, journal articles include a materials and methods section, but word limits often reduce it to a summary, making it difficult for other researchers to reproduce the results or replicate the study. There can sometimes be an option to submit the method as “supplementary material”, but this is not always the case. Some journals specialise in publishing methods, and these may be peer-reviewed, but not all are open access, so their content can be hidden behind a paywall. The last decade has seen the creation of “protocol” repositories, some with the ability to comment, adapt and even insert videos. Researchers at the University of Cambridge, from all disciplines – arts, humanities, social sciences and STEMM fields – can now publish their methods openly in Apollo, our institutional repository. In this blog, we discuss why it is important to publish methods openly and how the University’s researchers and students can do this in Apollo.

The protocol sharing repository, Protocols.io, was founded in 2012. Protocols can be uploaded to the platform or created within it; they can be shared privately with others or made public. The protocols can be dynamic and interactive (rather than a static document) and can be annotated, which is ideal for highlighting information that could be key to an experiment’s success. Collaboration, adaptation and reuse are possible by creating a fork (an editable clone of a version) that can be compared with any existing versions of the same protocol. Protocols.io currently hosts nearly 16,000 public protocols, showing that there is support for this type of platform. In July this year it was announced that Protocols.io had been acquired by Springer Nature. The press statement aims to reassure users that the mission and vision of Protocols.io will not change with the acquisition, even though Springer Nature already hosts the world’s largest collection of published protocols in the form of SpringerProtocols, along with its own free and open repository, Protocol Exchange. This raises the question of whether a major commercial publisher is monopolising the protocol space and, if so, whether this is or will become a problem. At the moment there do not appear to be any restrictions on exporting or transferring protocols from Protocols.io, and hopefully this will continue. Restricted portability is a problem often faced by researchers using proprietary Electronic Research Notebooks (ERNs), where it can be difficult to disengage from one platform and laborious to transfer notebooks to another, all while ensuring that data integrity is maintained. Because of this, researchers may feel locked into using a particular product. Time will tell how the partnership between Protocols.io and Springer Nature develops and whether the original mission and vision of Protocols.io will remain. Currently, the Protocols.io Open Research plan enables researchers to make an unlimited number of protocols public, with the number of private protocols limited to two (paid plans offer more options and features).

Bio-protocol exchange (under the umbrella of Bio-protocol Journal) is a platform for researchers to find, share and discuss life science protocols, with protocol search and webinars. Protocols can be submitted either to Bio-protocol or as a preprint, and researchers can fork a protocol to modify and share it while crediting the original author. The platform also has an interesting ‘Request a Protocol’ (RaP) service that searches more than 6 million published research papers for protocols, or allows you to request one if you are unable to find what you are looking for. A useful feature is that you can ask the community or the original authors of the protocol any question you may have about it. Bio-protocol exchange had published all protocols free of charge to authors since its launch in 2011, with substantial financial backing from its founders. Unfortunately, it was announced that protocol articles submitted to Bio-protocol after March 1, 2023 will be charged an Article Processing Charge (APC) of $1200. Researchers who do not want to pay the APC can still post a protocol for free in the Bio-protocol Preprint Repository, where it will receive a DOI but will not have gone through the journal’s peer review process.

As methods are integral to successful research, it is a positive move to see the creation and growth of platforms supporting protocol development and sharing. Currently, these tend to cater for research in the sciences, and serve the important role of supporting research reproducibility. Yet, methods exist across all disciplines – arts, humanities, social sciences as well as STEMM – and we see the term ‘method’ rather than ‘protocol’ as more inclusive of all areas of research.

Apollo (Cambridge University’s repository) has now joined the research community’s growing recognition of the importance of detailing and sharing methodologies. Researchers at the University can now use their Symplectic Elements account to deposit a method into Apollo. Not only does this value the method as an output in its own right, it also provides the researcher with a DOI and a publication that can be automatically added to their ORCID profile (if ORCID is linked to their Elements account). In May this year, Apollo was awarded CoreTrustSeal certification, which reinforces the University’s commitment to preserving research outputs in the long term and should give researchers confidence that they are depositing their work in a trustworthy digital repository.

The first method to be deposited into Apollo in this way was authored by Professor John Suckling and colleagues. Professor Suckling is Director of Research in Psychiatric Neuroimaging in the Department of Psychiatry. His published method relates to an interesting project combining art and science to create artwork that aims to represent hallucinatory experiences in individuals with diagnosed psychotic or neurodegenerative disorders. He is no stranger to depositing in Apollo; in fact, he has one of the most downloaded datasets in Apollo, having deposited the Mammographic Image Analysis Society database in 2015. This record contains the images of 322 digital mammograms from a database compiled in 1992. Professor Suckling is an advocate of open research and was a speaker at the Open Research at Cambridge conference in 2021.

An interesting and exciting new platform which aims to change research culture and the way researchers are recognised is Octopus. Founded by University of Cambridge researcher Dr Alexandra Freeman, Octopus is free for all to use, is funded by UKRI and is developed by Jisc. Researchers can instantly publish all research outputs without the word limit constraints that can often stifle the details. Research outputs are not restricted to articles but also include, for example, code, methods, data, videos and even ideas or short pieces of work. This serves to recognise the importance of all research outputs. Octopus aims to level out the current skew toward publishing more sensationalist work and encourages publishing all work, including negative findings, which are often of equal value to science but get shelved in what is termed the ‘file drawer’ problem. A collaborative research community is encouraged to work together on pieces of a puzzle, with credit given to individual researchers rather than a long list of authors. The platform supports reproducibility, transparency and accountability, and aims to give research the best chance to advance more quickly. Through Octopus, authors retain copyright and apply a Creative Commons licence to their work; the only requirement is that published work is open access and allows derivatives. It is a breath of fresh air in the current rigid publishing structure.

Clear and transparent methods underpin research and are fundamental to the reliability, integrity and advancement of research. Is the research landscape beginning to change to allow open methods, freely published, to take centre stage and for methods to be duly recognised and rewarded as a standalone research output? We certainly hope so. The University of Cambridge is committed to supporting open research, and past and present members who have conducted research at the University can share these outputs openly in Apollo. If you would like to publish a method in Apollo, please submit it here or if you have any queries email us at info@data.cam.ac.uk.

There will be an Octopus workshop at the Open Research for Inclusion: Spotlighting Different Voices in Open Research at Cambridge on Friday 17th November 2023 at Downing College.

The Data Picture

October 18, 2023 | Library and training matters, Uncategorized | data | Niamh Malin

I was recently named one of “the next generation of [library] leaders” as part of the CILIP 125, having been recognised as an individual who contributes energy and knowledge to improving and impacting their organisation. My area of expertise, and thus recognition, lies with the use of data within libraries. As a data analyst for the Office of Scholarly Communication at Cambridge University Library, my role focuses on empowering decisions with data-driven understanding, such as supporting the Springer Nature negotiations. To further develop my understanding of data and its role within a wider organisation, I engage with data beyond the library, for example through the Big Data London conference and the Carruthers and Jackson Data Leaders’ Summer School. Reflecting on the use of data in the wider world, what can be expected of the library and data?

The summer school provided practical advice, proven methodologies, and guidance that could apply across a variety of businesses. The course is designed to provide insight on the workflow of data officers, and their role within an organisation – no matter its stage of data maturity and literacy. Over the course of the ten weeks, leading experts discussed the role of a chief data officer (CDO), both as a business development opportunity, and as a career path for individuals. It explored the risk and governance of data within an organisation, and the final weeks focused strongly on the role of people and teams associated with data.

Peter Jackson and Caroline Carruthers addressed the differing types of CDO and described a pendulum between ‘risk aversion’ and ‘value added’. This means understanding the balance between secure and proper data governance (GDPR, for example) and providing value through data (such as setting up automation). The pendulum of risk to reward is relevant to many roles, including those within the library. Understanding the need to divide time and energy between creating policies and getting decision-making results is just as relevant to my role as it is to that of a chief data officer. In my role I have supported decision-making staff through data production but, equally, to instil a culture of data, time and energy must be dedicated to risk aversion, through tasks such as researching data management, preparing training sessions for data storage, and supporting staff in data preparation.

Another important concept introduced was the DIKW pyramid – Data, Information, Knowledge, Wisdom – for understanding the value created from data. The base of the pyramid is (raw) Data, which can be processed into (useful) Information. This Information is data with meaning and a purpose and can be organised into (insightful) Knowledge. Knowledge combines experiences, values, insights, and contextual information, which can then transcend to (integral) Wisdom. Wisdom is considered a deeper understanding with ethical implications and the ability to define ‘why’. The DIKW pyramid provided a frame of thought for presenting and approaching future data projects: it helps in understanding whether data, information or knowledge is required to better support a decision-making team.

To develop communication skills, expert Scott Taylor, known as The Data Whisperer, spoke about the three V’s for data storytelling: Vocabulary, Voice and Vision. Combining an accessible vocabulary with a common voice will illuminate the business vision and why it is important. This overarching concept for an organisation’s data approach can be scaled down to support individual data workers in providing value, which should either grow, improve or protect the business case. Understanding how to communicate the data is a key skill as “Hardware comes and goes, software comes and goes, but data remains”. And the data that remains should be used to grow, improve or protect the business, which means the data gathered should be usable data!

At Big Data London, the organisation Women in Data hosted conversations about nurturing a culture of learning within data teams. Drawing on their experiences from minority backgrounds, the speakers highlighted the power of upskilling, sharing skills across teams and being an advocate for oneself and one’s skills. As for what to upskill, data literacy was a hot topic across the conference. Data literacy, also called data fluency and data confidence, is the combination of ability, skills and confidence surrounding data and its uses. Data literacy enables more efficient work, and raises the question: what is the base level of data literacy / confidence across the library? Librarians use data daily: checking in/out material, answering students’ queries, or tracking the use of space. But are all librarians confident in using that data? This is an area I hope to explore further at the CUL, to ensure staff can use the data they have to support decisions.

Engaging with the world of data provides a big picture of the possibilities within the library. Conversations about AI (Artificial Intelligence), data policies and maturity, and shiny new databases, software, and services demonstrate the growing adoption of data, and libraries should follow suit. Actively taking snippets of larger conversations, developing ideas within the library space, and exploring the possibilities with data will help libraries thrive in this world of technological growth.

Lessons learned from Jisc Research Data Champions

October 2, 2018 | Uncategorized | data, data champions, open access | Office of Scholarly Communication

In 2017 four Cambridge researchers received grants from Jisc to develop and share their research data management (RDM) practices. In this blog, the four awardees each highlight one aspect of their work as a Jisc Data Champion.

The project

All four Champions embarked on a range of activities throughout the year including creating local communities interested in RDM practices, delivering training, running surveys to understand their department better, creating ‘how-to’ guides for would-be RDM mentors and testing Samvera as part of the Jisc Research Data Shared Service (RDSS). They were excited by the freedom that the grant gave them to try out whatever RDM-related activities they wanted, which meant they could develop their skills, see ideas come to fruition and make them reusable for others. For example, Annemarie Eckes developed a questionnaire on RDM practices for PhD students and Sergio Martínez Cuesta has posted his training courses on GitHub.

However, throughout the duration of the award they also found some aspects of championing good RDM disconcerting. Whilst some sessions proved popular, others had very low attendee figures, even when a previous iteration of the session was well attended. They all shared the sense of frustration often felt by central RDM services that getting people to initially engage and turn up to a session is the hard part. However, when people did come they found the sessions very useful, particularly because the Champions were able to tailor them specifically to the audience and discipline. The similar background of all the attendees also provided an extra opportunity for exchanging the advice and ideas that were most relevant.

The Champions tried out many different things. The Jisc Research Data Champions were expected to document and publicise their research data management (RDM) experiences and practices and contribute to the Jisc Research Data Shared Service (RDSS) development. Here the Champions each highlight one thing they tried out, which we hope will help others with their RDM engagement.

BYOD (Bring your own data)

Champion: Annemarie Eckes, PhD student, Department of Geography

The “Bring your own data” workshop was intended for anyone who thought their project data needed sorting, who needed better documentation, or who even needed to find out who owned or was in charge of certain data. I set it up to give attendees time and space to do any kind of data management task: clean up their data, tidy up their computer or email inbox, etc. The workshop was, really, for everyone, whether they were at the planning stage at the start of a project or in the middle of a project and had neglected their data management to some extent.

For the workshop the participants needed a laptop or a login for the local computers to access their data, and a project to tidy up or prepare that could be dealt with within two hours. I provided examples of file naming conventions and folder structures, as well as instructions on how to write good READMEs (messages to your future self) and a data audit framework to give participants some structure for their organisation. After a brief introductory presentation about the aims and the example materials I provided, people would spend the rest of the time tidying up their data or in discussions with the other participants.
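To make the idea of a file naming convention concrete, here is a minimal sketch in Python. The workshop handouts are not reproduced in this post, so the pattern used here (`YYYY-MM-DD_project_description_vNN.ext`) and the helper names `slugify`, `build_name` and `follows_convention` are illustrative assumptions rather than the convention that was actually taught.

```python
import re
from datetime import date

# Hypothetical convention, chosen for illustration only:
#   YYYY-MM-DD_project_description_vNN.ext
# Lower case, hyphens inside a part, underscores between parts.
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}"            # ISO date, so files sort chronologically
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"     # project
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"     # short description
    r"_v\d{2,}"                      # zero-padded version number
    r"\.[a-z0-9]+$"                  # file extension
)


def slugify(text: str) -> str:
    """Lower-case the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_name(day: date, project: str, description: str,
               version: int, ext: str) -> str:
    """Assemble a file name that follows the hypothetical convention."""
    extension = ext.lower().lstrip(".")
    return (f"{day.isoformat()}_{slugify(project)}_"
            f"{slugify(description)}_v{version:02d}.{extension}")


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the hypothetical convention."""
    return NAME_PATTERN.match(filename) is not None
```

For example, `build_name(date(2018, 3, 5), "Soil Moisture", "Field Site A", 2, "csv")` gives `2018-03-05_soil-moisture_field-site-a_v02.csv`, whereas a name such as `final data NEW (2).csv` fails the `follows_convention` check. Any convention that is documented in the README and applied consistently would serve the same purpose.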

While this was an opportunity for the participants to sit down and sort out their digital files, I also wanted participants to talk to each other about their data organisation issues and data exchange solutions. Once I got everyone talking, we soon discovered that we had similar issues and were able to exchange information on very specific solutions.

1-on-1 RDM Mentoring

Champion: Andrew Thwaites, postdoc, Department of Psychology

I decided to trial 1-on-1 RDM mentoring as a way to customise RDM support for individual researchers in my department. The aim was that by the end of the 1-on-1 session, the mentee should understand how to a) share their data appropriately at the end of their project, and b) improve on their day-to-day research data management practice.

Before the meeting, I encouraged the mentee to compile a list of their funders and those funders’ data sharing requirements. During the meeting, the mentee and I would make a list of the data in the mentee’s project that they were aiming to share. I would then help them to choose a repository (or multiple repositories) to share this data on, and would also assist in designing the supporting documentation to accompany it. During the sessions I also had conversations with the mentee about GDPR, anonymising data, internal documentation and day-to-day practices (file naming conventions, file backups etc.).

As far as possible, I provided non-prescriptive advice, with the aim being to help the mentee make an informed decision, rather than forcing them into doing what I thought was best.

Embedding RDM

Champion: Sergio Martinez Cuesta, research associate, CRUK-CI and Department of Chemistry

I came to realise early in the Jisc project that stand-alone training sessions focused exclusively on RDM concepts were not successful, as students and researchers found them too abstract, uninteresting or detached from their day-to-day research or learning activities. I think the aerial view of the concept of 1-on-1 mentoring and BYOD sessions is beautiful. However, in my opinion, both strategies may face challenges, with the necessary number of mentors/trainers increasing unsustainably as the number of researchers needing assistance grows and the research background of the audience becomes more diverse.

To facilitate take-up, I tapped into the University’s lists of oversubscribed computational courses and found that many researchers and students already shared interests in learning programming languages, data analysis skills and visualisation in Python and R. I explored how best to modify some of the already-available courses with an aim of extending the offer after having added some RDM concepts to them. The new courses were prepared and delivered during 2017-2018. Some of the observations I made were:

Learning programming naturally begs for proper data management as research datasets and tables need to be constantly accessed and newly created. It was helpful to embed RDM concepts (e.g. appropriate file naming and directory structure) just before showing students how to open files within a programming language.
The training of version control using git required separate sessions. Here students and researchers also discover how to use GitHub, which later helps them to make their code and analyses more reproducible, create their own personal research websites …
Gaining confidence in programming, structuring data / directories and version control in general helps students to acknowledge that research is more robust when it is open and can be checked by other researchers. Learning how researchers can identify themselves in a connected world with initiatives such as ORCID was also useful.
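As an illustration of the first observation, the following is a minimal sketch in Python of how directory structure can be introduced just before opening a file. The course materials are not reproduced in this post, so the folder layout in `SUBDIRS` and the helper names `create_project_skeleton` and `read_raw_table` are hypothetical choices made for this example.

```python
import csv
from pathlib import Path

# Hypothetical project layout, for illustration only; not taken from the courses.
SUBDIRS = ("data/raw", "data/processed", "scripts", "results")


def create_project_skeleton(root: Path) -> Path:
    """Create the project folders and a README stub; return the README path."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite documentation that already exists
        readme.write_text(
            "# Project title\n\n"
            "What the data are, who collected them, "
            "and how the folders are organised.\n",
            encoding="utf-8",
        )
    return readme


def read_raw_table(root: Path, filename: str) -> list[list[str]]:
    """Open a CSV file from data/raw via a path built from the project root."""
    path = root / "data" / "raw" / filename
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Building every path from a single project root means the same script works on a collaborator's machine, and keeping untouched inputs in `data/raw` separate from `data/processed` makes it clearer which files can be regenerated from code.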

Brown Bag Lunch Seminar Series: The Productive Researcher

Champion: Melissa Scarpate, postdoc, Faculty of Education

I created the Productive Researcher seminar series to provide data management and Open Access information and resources to researchers at the Faculty of Education (FoE). The aim of the brown bag lunch format was to create an informal session where questions, answers and time for discussion could be incorporated. I structured the seminars so they covered 1) a presentation and discussion of data management and storage; 2) a presentation about Open Access journals and writing publications; 3) a presentation on grant writing where Open Access was highlighted.

While the format of the series was designed to increase attendance, the average was four attendees per session. The majority of attendees were doctoral students and postdocs who had a keen interest in properly managing their data for their theses or projects. However, I suspect it may be the case that those attending already understood data management processes and resources.

In conclusion, I think that whilst the individuals who attended these seminars found the content helpful (per their feedback), the impact of the seminars was extremely limited. Therefore, my recommendation would be to have all doctoral students take a mandatory training class on data management and Open Access topics as part of their methodological training. Furthermore, I think the most helpful way of reaching postdocs and more senior researchers may be to have mandated meetings with a data manager to discuss their data management and Open Access plans prior to submitting any grant proposals. Due to new laws and policies on data (GDPR) this seems a necessary step to ensure compliance and excellence in research.

Published 2 October 2018
Compiled and edited by Dr Lauren Cadwallader from contributions by Annemarie Eckes, Dr Andrew Thwaites, Dr Sergio Martinez Cuesta, Dr Melissa Scarpate