
From data curators to intellectual entrepreneurs: observations from IFLA

In this blog post, Clair Castle, Librarian at the Department of Chemistry, University of Cambridge, reflects on her experience at the IFLA Satellite Meeting 2017 in Warsaw, Poland.

Earlier this year I was invited by the Office of Scholarly Communication (OSC) at the University of Cambridge to present a paper on Data Curator’s Roles and Responsibilities: International and Interdisciplinary Perspectives. This was my first time writing a paper for a conference and presenting it; it was slightly daunting but exciting too!

IFLA is the International Federation of Library Associations and Institutions, the international body that represents the interests of library and information services and their users. It celebrates its 90th birthday in 2017. This conference was a pre-Congress Satellite Conference, taking place just before the IFLA World Library and Information Congress held in Wrocław, Poland, from 19–25 August.

There were three sessions of four presentations in the programme, which includes links to every presentation; most of the papers that were presented are also available online. The main conference hashtag on Twitter was #wlic2017 (learn more about the 2017 and upcoming 2018 congress by following @iflawic).

Conference focus

Data curation has emerged as a new area of responsibility for researchers, librarians, and information professionals in the digital environment. The huge variety and amount of data that needs to be processed, preserved, and disseminated is creating new roles, responsibilities and challenges for researchers and the library and information professionals who support them. The primary goal of the conference was to engage the international scholarly community in a conversation that led to a better understanding of these challenges, and to discuss the main trends in data curation and Research Data Management (RDM) practices and education.

To ‘curate’ means to ‘take care of’. What resonated with me the most from the conference was the fact that while we are curating data we are curating people as well. We are doing this by changing research culture, evolving the profession, changing research (and research support) practices, doing outreach and advocacy work, and liaising with related university support services. The conference presentations returned to this theme again and again.

I won’t discuss every presentation here; instead, I will collate and relate the ideas that I found most thought-provoking.

Intellectual entrepreneurship

This term was introduced to me by Nitecki and Davis’ presentation ‘Expanding librarians’ roles in the research life cycle’. The definition I have since found that explains this the best is from Charles J. Chumas at Stony Brook University:

“Take … the textbook definition of entrepreneur: A person who organizes and manages any enterprise, especially a business, usually with considerable initiative and risk. Now, switch out the words “enterprise” and “business” with words such as “research” or “education”. This is the concept of intellectual entrepreneurship. It is the concept of taking risk, seizing opportunity, discovering and creating knowledge and employing one’s own innovation and strategies, with the ultimate goal of solving problems in corporate, societal or governmental environments. An intellectual entrepreneur … actively seeks out their own education … The philosophy of IE embodies four core values: vision and discovery, ownership and accountability, integrative thinking and action, and collaboration and teamwork”.

I feel that this describes the role of data curators exactly: researchers and the people supporting them are planning data curation strategically and innovatively, acquiring the necessary knowledge and skills to develop it in their institution, and working to bring systems, services and people together to achieve their overall goal of managing data effectively.

Zhang’s presentation ‘Data curators: A glimpse at their roles at the academic libraries in the United States’ mentioned the Association of Research Libraries’ Strategic Thinking and Design Initiative: Extended and Updated Report (2016), which estimates that by 2033 the research librarian will have shifted from knowledge service provider to collaborative partner within the research ecosystem. In one example of this shift, librarians have moved from providing service support to working with researchers to further open science: the FOSTER portal is an e-learning platform that brings together the best training resources for those who need to know more about Open Science, or who need to develop strategies and skills for implementing Open Science practices in their daily workflows. It provides training materials for many different users – from early-career researchers to data managers, librarians, research administrators and graduate schools. This reflects the self-education aspect of intellectual entrepreneurship.

Upskilling librarians

Many library science curricula around the world do not (yet) include an RDM module, so experienced librarians may not have the necessary knowledge or skills to support RDM. Many advertisements for data curation posts require leadership, partnership, outreach and collaborative responsibilities, but not a professional library qualification. Data curation posts have been repurposed from experienced librarian posts and taken up by new graduates, contractors or PhDs, or are sometimes joint appointments with other academic units. A review of the library profession with regard to RDM skills and knowledge is needed to inform future education and training.

Peters’ presentation ‘Reskilling academic librarians for data management services’ highlighted Lewis’ research data management pyramid for libraries (p.16). Areas of early engagement with RDM are situated at the bottom of the pyramid, and as you get to the top you can take on the world!

Role of IT in data curation

Several speakers touched upon this: after all, IT underpins everything, and IT support staff are often closer to researchers than librarians are. However, there may be a perception that data curation is not, per se, an IT role. In another example of intellectual entrepreneurship, IT and data librarians can work together to provide research data support services: IT can bring UX (user experience) skills such as systems design and project management, while data librarians can bring their expertise in repository infrastructures, digital preservation, and discovery and indexing methods.

The definition of data curation is evolving

The IFLA Library Theory and Research Panel Data Curation Project identified the roles and responsibilities of data curators in an international context. The methodology included a review of the literature and of the vocabulary describing data curation roles (using a cool keyphrase digger tool!), and a content analysis of job advertisements in 35 countries. The project found varying terms used to describe data curation (e.g. data stewardship, digital preservation, data science, and RDM, which was the preferred term). Outreach and advocacy to researchers was found to be an important aspect of these roles, which again relates back to the theme of intellectual entrepreneurship.

Central vs. discipline-specific RDM activities at the University of Cambridge

As I have mentioned, I presented my paper on behalf of the OSC. Since its establishment in 2015 the OSC has developed many services to support RDM at the University, including a central website, RDM training and support, and a data repository. It communicates with researchers and support staff, including librarians and administrators, across the University using a variety of methods. There is therefore a considerable amount of outreach into the departments and faculties where research takes place. However, its resources are limited: it is not possible, for example, for it to deliver RDM training in every department or faculty in the University, especially on a discipline-specific basis.

Most departments and faculties in the University have an embedded library service, which is discipline-specific. Librarians are in a key position to collaborate with the OSC and their own researchers in developing and implementing RDM services locally. My paper presents a case study of how centralised RDM services have been rolled out in the Department of Chemistry, adapting the central RDM messages to discipline-specific needs. I describe how customising the centralised RDM training for all new graduate students in the Department, being a member of the University’s RDM Project Group, and being involved in the OSC’s Data Champions programme have benefited both the OSC and the Department.

Identity crisis?

The conference taught me that the identity of data curators is constantly evolving. Does it even matter what we call ourselves? Whatever the term used to describe us, we have similar roles and goals, and need to equip ourselves for future challenges. The concept of intellectual entrepreneurship is worth exploring further as a way of empowering ourselves.

The conference gave me a great opportunity to share and learn about RDM best practice from practitioners across the world. It reinforced for me the fact that we are all in it together, facing the same challenges and working together to come up with solutions.

Observations

The conference took place at the very impressive University of Warsaw Library, which is centrally located beside the Old Town in Warsaw, right next to the Vistula River. Around 40 delegates attended from all over the world.

Warsaw itself is a lively city with a rich, if at times tragic, history. After the conference dinner (a BBQ outside on a very warm evening!) we were treated to an entertaining evening bus tour around the city. We passed the amazing POLIN Museum of the History of Polish Jews, travelled through the area where the Warsaw Ghetto had been, and took in examples of communist-era architecture (in particular the imposing Palace of Culture and Science).


Published 15 December 2017
Written by Clair Castle @chemlibcam

Engaging Researchers with Good Data Management: Perspectives from Engaged Individuals

We need to recognise good practice, engage researchers with research data management early in their careers, and use peers to talk to those who are not yet ‘on board’. These were the take-home messages for five attendees at the Engaging Researchers in Good Data Management conference held on 15 November.

The Data Champions and Research Support Ambassadors programmes are designed to increase confidence in supporting researchers with data management and with scholarly communication as a whole, respectively. Thanks to the generous support of the Arcadia Foundation, five places were made available to attend this event. In this blog post the three Data Champions and two Research Support Ambassadors who were awarded the places give us the low-down on what they got out of the conference and how they might put what they heard into practice.

Recordings of the talks from the event can be found on the Cambridge University Library YouTube channel.

Financial recognition is the key

Dr Laurent Gatto, Senior Research Associate, Department of Biochemistry, University of Cambridge and Data Champion

As a researcher who cherishes good and reproducible data analysis, I naturally view good data management as essential. I have been involved in research data management activities for a long time, acting as a local data champion and participating in open research and open data events. I was interested in participating in this conference because it gathered data champions, stewards and the like from various British and European institutions (Cambridge, Lancaster, Delft), and I was curious to see what approaches were implemented and what issues were addressed across institutions. Another aspect of data championship/stewardship I am interested in is the recognition these efforts offer (this post touches on this a bit).

Focusing on the presentations from Lancaster, Cambridge and Delft, it is clear that direct engagement from active researchers is essential to promote healthy data management. There needs to be an enthusiastic researcher, or somebody with some experience of research, to engage with the research community about open data, reproducibility, transparency and security; a blunt top-down approach leads to limited engagement. This is also important given the plurality of what researchers across disciplines consider to be data. Informal settings, ideally driven by researchers or run in collaboration with librarians, and focused on conversations, use cases and interviews (to quote some successful activities cited during the conference), have been the most successful, and have sometimes also led to new collaborations.

Despite the apparent relative success of these various data championing efforts, and the support the data champions get from their local libraries, these activities remain voluntary and come with little academic reward. Being a data champion is certainly an enriching activity for young researchers who value data, but it comes with relatively little credit and without any reward or recognition, suggesting that there is probably room for a more professional approach to data stewardship.

With this in mind, I was very interested to hear about the approach currently in place at TU Delft, where data stewards hold a joint position at the Centre for Research Data and at their respective faculty. This establishes research data stewardship as an official activity, allows the stewards to pursue their own research, and explicitly links research data to research and researchers.

I wonder whether this model could be implemented more broadly to provide financial recognition to data stewards/champions, offer incentives (in particular for early-career researchers) to approach research data management professionally and seriously, make data management a more explicit activity that is part of research itself, and move towards a professionalisation of data management posts.

Inspiration and ideas

Angela Talbot, Research Governance Officer, MRC Biostatistics Unit and Data Champion

Tasked with improving and updating best practice in the MRC Biostatistics Unit, I went along to this workshop not really knowing what to expect but hopeful and eager to learn.

Good data management can meet with resistance: while it is viewed as an altruistic and noble thing to do, many researchers worry that making their research open and reproducible exposes them to criticism and to the theft of ideas and future plans. What I wanted to learn were ways to overcome this.

And boy did this workshop live up to my expectations! From the insightful opening comments to the thought-provoking closing remarks I was hooked. The whole audience was engaged in a common purpose: to share their successes and strategies for overcoming the barriers, so that good data management becomes best practice.

Three successful schemes were talked through: the Data Conversations at Lancaster, the Data Champion scheme at the University of Cambridge and the data stewards at TU Delft. All of these successful schemes had one thing in common: they combine a cross-department/faculty approach with local expertise.

Further excellent examples were provided by the lightning talks, and for me it was certainly helpful to hear of successes in engaging researchers at a departmental level.

The highlight for me was the focus groups. I was in Laurent Gatto’s group, discussing how to encourage more good data management by highlighting what is in it for the researchers who participate, but I really wish I could have been in them all, as the feedback indicated they had all produced useful insights and tips.

All in all I came away from the day buzzing with ideas. I spent the next morning jotting down ideas for events and schemes that could work within my own unique department, eager to share what I had learnt. Who knows, maybe next time I’ll be up there sharing my successes!

We need to speak to the non-converted

Dr Stephen Eglen, Reader in Computational Neuroscience, Department of Applied Mathematics & Theoretical Physics, University of Cambridge and Data Champion

The one-day meeting on Engaging Researchers in Good Data Management served as a good reminder of the benefits of managing and sharing data, and of our responsibilities to do so. On the positive side, I was impressed to see the diversity of approaches led by groups around the UK and beyond. It is heartening to see that many universities now have teams to help manage and share data.

However, and more critically, I am concerned that meetings like this tend to focus on showcasing good examples to an audience that is already mostly convinced of the benefits of sharing. Although it is important to build the community and make new contacts with like-minded souls, I think we need to spend as much time engaging with the wider academic community. In particular, it is only when our efforts are aligned with those of funding agencies and scholarly publishers that we can start to build a system that gives due credit to those who do a good job of managing, and then sharing, their data. I look forward to future meetings with broader engagement of data managers, researchers, funders and publishers.

I am grateful to the organisers for giving me the opportunity to speak about our code review pilot in Neuroscience. I particularly enjoyed the questions. Perhaps the most intriguing came in the break, when Dr Petra ten Hoopen asked me what happens if, during code review, a mistake is found that invalidates the findings of the paper. To which I answered: (a) the code review is supposed to verify that the code can regenerate a particular finding; (b) this is an interesting question and it would probably depend on the severity of the problem unearthed; (c) we will cross that bridge when we come to it. Dr ten Hoopen noted that this is similar to finding errors in data published alongside papers. These are indeed difficult questions, but I hope that in these relatively early days of data and code sharing we err on the side of rewarding researchers who share.

Teach RDM early and often

Kirsten Elliott, Library Assistant, Sidney Sussex College, University of Cambridge and Research Support Ambassador

Prior to this conference, my experience with Research Data Management (RDM) was limited to some training through the Office of Scholarly Communication and the Research Support Ambassadors programme. This, however, really sparked my interest, so I leapt at the opportunity to learn more about RDM by attending this event. Although at times I felt slightly out of my depth, it was fascinating to be surrounded by such experts on the topic.

The introductory remarks from Nicole Janz were a fascinating overview of the reproducibility crisis and how it relates to RDM, including strategies for what could be done, for example setting the reproduction of published studies as assignments when teaching statistics. This clarified for me the relationship between RDM, open data and transparency in research.

There were many examples throughout the day of best practice in promoting good RDM, from the “Data Conversations” held at Lancaster University to international efforts from SPARC Europe and even some from Cambridge itself! Common ground across all of them included the necessity of using engaged researchers to spread messages to other researchers, the importance of understanding discipline-specific issues with data, and an expansive conception of what counts as “data”.

I am based in a college library and predominantly work supporting undergraduate students, particularly first years. In a way this makes presenting RDM practices quite a challenge, as many of the issues are most obviously relevant to those undertaking research. However, I think there is a strong argument for teaching RDM from very early in the academic career to ingrain good habits, so I will be thinking about how to incorporate RDM into our information literacy training and how to signpost students to existing RDM projects in Cambridge.

Use peers to spread the RDM message

Laura Jeffrey, Information Skills Librarian, Wolfson College, University of Cambridge and Research Support Ambassador

This inspirational conference was organised and presented by people who are passionate about communicating the value of open data and replicability in research. It was valuable to hear from a number of speakers (including Rosie Higman from the University of Manchester, Marta Busse-Wicher from the University of Cambridge and Marta Teperek from TU Delft) about the changing role of support staff, moving away from delivering training towards coordination. Peers are seen to be far more effective in encouraging deeper engagement, communicating personal rather than prescriptive messages (as evidenced by the Data Conversations at Lancaster University). A member of the audience commented that where attendance at their courses is low, their institution creates videos of researcher-led activities that can be delivered at the point of need.

I was struck by two key areas of activity that I could act on with immediate effect:

Inclusivity – Beth Montagu Hellen (Bishop Grosseteste) highlighted the pressing need for open data to be made relevant to all disciplines. Cambridge promotes a deliberately broad definition of data for this reason. Yet more could be done to facilitate this; I’ll be following @OpenHumSocSci to monitor developments. We’re fortunate to have a Data Science Group at Wolfson promoting examples of best practice, and I’m keen to meet with them to discuss how their activities and the language they use could be made more attractive to all disciplines.

Communication – Significant evidence was presented by Nicole Janz, Stephen Eglen and others that persuading researchers of the benefits of open data leads to higher levels of engagement than compulsion on the grounds of funder requirements. This will have a direct impact on the tone and content of our support. A complementary approach was proposed: targeted campaigns timed to coincide with international events, in conjunction with frequent, small-scale messages. We’ll be tapping into Love Data Week in 2018, with more regular exposure in email communication and @WolfsonLibrary.

As a result of attending this conference, I’ll be blogging about open data on the Wolfson Information Skills blog and providing pointers to resources on our college LibGuide. I’ll also be working closely with colleagues across the college to timetable face-to-face training sessions.

Published 15 December 2017
Written by Dr Laurent Gatto, Angela Talbot, Dr Stephen Eglen, Kirsten Elliott and Laura Jeffrey

Benchmarking RDM Training

This blog post reports on the progress of an international project to benchmark Research Data Management training across institutions. It is a collaboration between Cambridge Research Data Facility staff and international colleagues – a full list is at the bottom of the post. This is a reblog; the original appeared on 6 October 2017.

How effective is your RDM training?

When developing new training programmes, one often asks oneself about the quality of the training. Is it good? How good is it? Trainers often develop feedback questionnaires and ask participants to evaluate their training. However, feedback gathered from participants attending a course does not answer the question of how good that training was compared with other training on similar topics available elsewhere. As a result, improvement and innovation become difficult. So how can the quality of training be assessed objectively?

In this blog post we describe how, by working collaboratively, we created tools for objective assessment of RDM training quality.

Crowdsourcing

In order to assess something objectively, objective measures need to exist. Being unaware of any objective measures for benchmarking a training programme, we asked Jisc’s Research Data Management mailing list for help. It turned out that a lot of resources with useful advice and guidance on creating informative feedback forms were readily available, and we gathered all the information received in a single document. However, none of the answers provided the information we were looking for. On the contrary, several people said they would be interested in such metrics. This meant that objective metrics for assessing the quality of RDM training either did not exist, or the community was not aware of them. Therefore, we decided to create RDM training evaluation metrics ourselves.

Cross-institutional and cross-national collaboration

For metrics to be objective, and to allow benchmarking and comparison of various RDM courses, they need to be developed collaboratively by the community that will use them. Therefore, the next question we asked Jisc’s Research Data Management mailing list was whether people would be willing to work together to develop and agree on a joint set of RDM training assessment metrics, and on a system that would allow cross-comparisons and training improvements. Thankfully, the RDM community tends to be very collaborative, and this was the case here too – more than 40 people were willing to take part in the exercise, and a dedicated mailing list was created to facilitate collaborative working.

Agreeing on the objectives

To ensure effective working, we first needed to agree on common goals and objectives. We agreed that the purpose of creating a minimal set of benchmarking questions is to identify what works best in RDM training. We worked with the idea that this was for ‘basic’ face-to-face RDM training for researchers or support staff, but that it could be extended to other types and formats of training session. We reasoned that the same set of questions used in feedback forms across institutions, combined with the sharing of training materials and contextual information about sessions, should facilitate the exchange of good practice and ideas. As an end result, this should allow constant improvement and innovation in RDM training. We therefore had joint objectives, but how could we achieve them in practice?

Methodology

Deciding on common questions to be asked in RDM training feedback forms

In order to establish joint metrics, we first had to decide on a joint set of questions that we would all agree to use in our participant feedback forms. To do this we organised a joint catch-up call, during which we discussed the various questions we were asking in our feedback forms and why we thought each was important and should be mandatory in the agreed metrics. There were lots of good ideas and valuable suggestions. However, by the end of the call, and after eliminating all the non-mandatory questions, we still had a list of thirteen questions that we all thought important. This was too many to ask participants to fill in, especially as many institutions would need to add their own institution-specific feedback questions.

To bring down the number of mandatory questions, a short survey was created and sent to all collaborators, asking respondents to judge how important it was for each question to be mandatory (on a scale of 1–5, 1 being ‘not important at all’ and 5 being ‘this should definitely be mandatory’). Twenty people participated in the survey. The total score across all respondents was calculated for each question, and the six questions with the highest totals were selected to be made mandatory.
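To make the selection step concrete, here is a minimal Python sketch of the tally: sum each question’s 1–5 importance ratings across respondents and keep the six highest-scoring questions. The question texts and scores are hypothetical placeholders (the actual survey had thirteen candidate questions and twenty respondents); this illustrates the procedure, not the tool we used.

    # Hypothetical ratings: one entry per candidate question, with one
    # 1-5 importance score per respondent. Placeholder data only.
    ratings = {
        "How useful was the session overall?":        [5, 4, 5, 4],
        "Will the session change your RDM practice?": [4, 4, 3, 5],
        "How clear were the learning objectives?":    [3, 2, 4, 3],
        # ...ten further questions in the actual exercise
    }

    # Total score per question across all respondents.
    totals = {question: sum(scores) for question, scores in ratings.items()}

    # The six highest-scoring questions become the mandatory set.
    mandatory = sorted(totals, key=totals.get, reverse=True)[:6]

    for question in mandatory:
        print(f"{totals[question]:3d}  {question}")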

Ways of sharing responses and training materials

We next had to decide how to share the feedback responses from our courses, and the training materials themselves. We unanimously agreed that the Open Science Framework (OSF) supports the goals of openness, transparency and sharing, and allows collaborative working, and that it was therefore a good home for the project. We created a dedicated space for the project on the OSF, with separate components for the joint resources we developed, for sharing training materials, and for sharing anonymised feedback responses.

Next steps

With the benchmarking questions agreed and the space created for sharing anonymised feedback and training materials, we were ready to start collecting the first feedback for the collective training assessment. This was also a good opportunity to restate our short-, mid- and long-term goals.

Short-term goals

Our short-term goal is to revise our existing training materials and incorporate the agreed feedback questions into RDM training courses starting in autumn 2017. This will allow us to obtain the first comparative metrics at the beginning of 2018, and to evaluate whether the methodology and tools we designed are working and fit for purpose. It will also allow us to iterate on our materials and methods as needed.

Mid-term goals

Our mid-term goal is to see whether the metrics, combined with shared training materials, can help us identify the parts of RDM training that work best and collectively improve the quality of our training as a whole. This should be possible in mid-to-late 2018, allowing time to adapt training materials in response to the comparative feedback gathered at the beginning of 2018, and then to assess whether the adaptations resulted in better participant feedback.

Long-term goals

Our long-term goal is to collaboratively investigate and develop metrics that could allow us to measure and monitor the long-term effects of our training. Feedback forms and satisfaction surveys immediately after training are useful and help to assess the overall quality of the sessions delivered. However, the ultimate goal of any RDM training should be the improvement of researchers’ day-to-day RDM practice. Is our training really having any effect on this? To assess that, different kinds of metrics are needed, coupled with long-term follow-up with participants. We decided that any ideas on how best to address this will also be gathered on the OSF, and we have created a dedicated space for this work in progress.

Reflections

Reflecting on the work we did together, we all agreed that we were quite efficient. We started in June 2017, and it took us two joint catch-up calls and a couple of email exchanges to develop and agree on joint metrics for the assessment of RDM training. Time will tell whether the resources we have created will help us meet our goals, but we all felt that we had already learned a lot from each other during the process by sharing good practice and experience. Collaboration turned out to be an excellent solution for us. Our discussions are open to everyone, so if you are reading this blog post and would like to collaborate with us (or simply follow our conversations), just sign up to the mailing list.


Published 9 October 2017
Written by (in alphabetical order by surname): Busse-Wicher Marta, Cadwallader Lauren, Higman Rosie, Lawler Heather, Neish Peter, Peters Wayne, Schwamm Hardy, Teperek Marta, Verbakel Ellen, Williamson Laurian