
In conversation with Michael Ball from BBSRC

The Biotechnology and Biological Sciences Research Council (BBSRC) Data Sharing Policy states that research data supporting publications must be stored for 10 years, and that adherence to data management plans will be monitored and built into the Final Report score, which may be taken into account for future proposals.

Recently Michael Ball, the Strategy and Policy Manager at BBSRC, accepted an invitation to Cambridge University to discuss the BBSRC policy on opening up access to data. Senior members of the University, the School of Biological Sciences, the Research Office and the Office of Scholarly Communications attended. These notes have been verified by Michael as an accurate reflection of the discussion.

The take home messages from the meeting were the importance of:

  • Disciplines themselves establishing ways of dealing with data
  • Thinking about how to deal with data from the beginning of a research project

The meeting began with a discussion about the support we provide Cambridge University researchers through the Research Data Service, the resources provided on the data website and the enthusiastic uptake of the service since the beginning of the year.

The conversation then moved into issues around the policy, focusing on several aspects – clarification of what needs to be shared, how this will be supported financially, questions about auditing, a discussion about the best place to keep the data and issues with data sharing in the biological sciences.

What data are we expected to share?

What is ‘supporting data’ in the biological sciences?

One of the biggest concerns biological researchers have about data sharing is what is meant by ‘data’. Biology produces the most diverse range of data of any field, which makes it hard to generalise because the issues are project and problem specific.

Michael confirmed the policy broadly refers to all data ‘but the devil is in the detail, there are lots of caveats’. He echoed Ben Ryan’s answer to a similar question about the EPSRC policy by saying the key points are:

  • What would you expect to see?
  • What do you think is important?

The interpretation of the BBSRC policy depends heavily on the types of data being produced.  Much is dependent on the expected norms, what a researcher would expect to see if they were trying to interpret the paper. What are the underlying supporting data for the paper?

The biological sciences throw up a particular challenge in the range and disparity of disciplinary norms. For example, a great deal of data arises from genomics, and that community made the decision to share some time ago, including deciding what to share and what not to share. However, there are vast areas of experimental science where the paper itself is the data.

The policy is going one step further back from the published paper towards the lab. In the future these data policies might go further back, if there was greater automation of the research process.

Michael confirmed that if the BBSRC has funded a PhD student they would expect them to make supporting data available.

What do we need to share in the Biological Sciences?

There is no expectation to share lab books unless they are the only place the data exists. Michael noted that when the BBSRC wrote the policy it excluded lab books and organisms.

However there is an expectation to share instrument output, with the caveat that if the output goes through some sort of amendment then you don’t need to share the original.

An example: a researcher is counting bacteria on a plate and scrupulously making notes in lab books before entering this information into a computer spreadsheet to crunch the numbers. The expectation would be to share the spreadsheet, not the lab book.

Some research requires the construction of a piece of technology where there might not be a great deal of associated data around it. In these instances it is the process of construction or the protocol or the methodology that is important to share.

Michael noted that in some disciplines, given the same materials, input parameters and instruments, the output data will be the same each time. In these circumstances it is most sensible to share or describe the inputs so the experiments can be repeated. The question is about what would be most useful to share.

Show me the money

A stitch in time

Michael confirmed that researchers can ask for the money they need (and can justify) for research data management in grant applications. He did say however that the BBSRC does not ‘generally see a lot of these requests’. He noted that this is because often people haven’t thought about the data they will generate at the start of the project. One of the researchers pointed out it was difficult to know how to fund it because ‘we are not sure what we need’. However, this should not be a reason to ask for nothing.

It may be that some of the discipline specific repositories will have to change their business models in the future to cope with larger data sets.

Michael said that it is worth thinking about data sharing at the project planning stage because different types of data have different requirements. Researchers might need to allow for the cost of getting the data in the right format and metadata. It is advisable to think about where the data will be published so the research team can prepare the data in the first instance.

Michael said that the data management plan should prompt researchers to consider how much data a project will produce; it is advisable to allow for the maximum amount. The ideal is an ongoing data management plan that is kept up to date, because it is also useful at the end of the project.

Longer term financial support

Raised in the meeting was the option of charging a flat fee up front regardless of the data being generated, and whether there was any danger in auditing with this approach. The problem with an up-front fee is that it becomes more difficult to track an output from a specific grant against what was put into the database. There is a directly incurred and a directly allocated component to the cost.

Michael confirmed that any money allocated to data management won’t survive past the end of the grant. He noted this was something that he was ‘not sure how to unpick’. This raises the issue of the cost of longer term data sharing. The BBSRC provides funding to a certain point in time. There can be a secondary experiment funded by someone else and the works are published together. But the researcher can only share the data from the funded part. The BBSRC does not ask researchers to share data that they haven’t funded.

Auditing questions

Who is in charge here?

The academics raised the concern that there could be ‘mission creep’ where the funders expect people to do things that are a waste of time. They mentioned that an ideal situation would be where the research community decide what they want to share and what they don’t wish to share.

Michael noted that the BBSRC has to be guided by the community on their own community norms for data sharing, and this is why aspects of the data sharing policy are quite open. He noted that this meeting represented the first part of the process – where the funder comes together with communities to decide what is essential.

In addition, many journals are now requiring open data. It is the funders, the researchers and the journals who are asking for it. To some extent the BBSRC policy is guided by what the journals are asking for.

The policing process

The group expressed interest in how the BBSRC policy is policed and what would be the focus of that policing. Michael stated that BBSRC are investigating options for how to monitor compliance, but that it does not currently appear feasible to check all of the submissions. BBSRC will monitor compliance, but will probably start with dipstick testing. They will look at historical projects and see where the process goes from there. In practice, this is likely to initially involve examining the degree of adherence to the submitted data management plans. If a researcher has acted reasonably and justified their mechanisms of data sharing, then it is unlikely that there would be any actions beyond noting where difficulties had occurred.

Note, however, that if a researcher has submitted a grant application with a data sharing statement, there is a reasonable expectation that they will share the data.

Ultimately the data release will be policed. In areas where data sharing is prevalent, communities police themselves because researchers ask and expect the data to be available. In some cases you can’t publish without an accession number.

Michael noted there are places researchers can put information about published data into ResearchFish. ResearchFish is currently the only mechanism to capture information regarding post-award activities.

Where do we put the data?

The question arose about how other universities are managing the policy. Michael responded that many have started institutional repositories. The institutional response depends on where the majority of their research sits.

A possible solution for ensuring the data is discoverable would be a catalogue of what is stored in an institutional repository, with metadata about the data. That metadata would itself need to be discoverable. If the data is being held in a centralised repository it is possible to pay the cost upfront before the end of the grant.
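To illustrate what such catalogue metadata might contain, here is a minimal sketch of a machine-readable record, loosely modelled on DataCite-style fields. The field names, function name and example values are illustrative assumptions, not a BBSRC or University requirement; a real catalogue would follow its repository’s own schema.

```python
import json


def make_record(title, creators, year, repository_url, description):
    """Build a minimal, DataCite-style metadata record for a dataset.

    All field names here are illustrative (hypothetical), chosen to show
    the kind of information a catalogue entry would need to make the
    underlying data discoverable.
    """
    return {
        "title": title,
        "creators": creators,          # list of "Family, Given" strings
        "publicationYear": year,
        "resourceType": "Dataset",
        "url": repository_url,         # where the data itself is held
        "description": description,
    }


# Hypothetical example entry for a catalogue of supporting data
record = make_record(
    title="Bacterial plate counts (example)",
    creators=["Doe, Jane"],
    year=2015,
    repository_url="https://repository.example.org/dataset/123",
    description="Spreadsheet of colony counts supporting a published paper.",
)

# Serialise so the record itself can be indexed and harvested
print(json.dumps(record, indent=2))
```

The point is that the catalogue entry, not the data file, is what search services index – so even data held behind an institutional repository can be found if a record like this is exposed.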

The group noted there was a publishing preference for discipline specific repositories over institutional repositories because the community knows how to look after the work. These repositories are hosted by ‘people who know what they are doing’. They are discoverable, where the community can decide on the metadata and the required standards.

Michael agreed that the ideal was open discoverability. The question is what will be practically possible.

A way of considering the question is asking how would another researcher find the information? If the data is available from a researcher by request this should be noted in the paper. If it is available in a repository then the paper should state that. If the journal has told readers where the data is, then it should be self-evident.

Issues with obsolescence

Michael noted that there is an ongoing issue of obsolete data formats and disks. Given the gap between ideals and reality, it becomes a question of how to store and handle the information.

When data exists in a proprietary format, the researcher needs to think about how to access it in the longer term. What if the organisation goes out of business? Or the technology upgrades so you can’t get hold of the data in an earlier format? If data exists in a physical format then it is possible to go back and read it. If not, it is quite important to think about issues relating to long-term access, because otherwise lots of data will become obsolete.

There are some solutions for this issue. The Open Microscopy Environment is a joint project between universities, research establishments, industry and the software development community. It develops open-source software and data format standards for the storage and manipulation of biological microscopy data. This is a community-generated solution to a recognised problem, and it includes a database to which you can upload any file format.

Issues with data sharing in the biological sciences

The BBSRC allows a reasonable embargo until the researcher has exploited the data for publication. If the researcher is planning on releasing further publications then they should consider carefully when to release the data. However, Michael noted, this is ‘not a forever thing’. The BBSRC do say there are reasonable limits, and some journals will expect data to be released alongside publications.

Commercial partners

Data emerging from BBSRC funded research needs to be shared unless there is a reason why not – and commercial partners who need to protect their intellectual property can be a good reason to delay data sharing. However once the Intellectual Property is protected, it is protected. The BBSRC allows researchers to embargo the data.

Michael also noted there are things that can be done with data, for example releasing it under license. An example is, if a researcher is working with a commercial partner who is concerned about other commercial competitors, it would be possible to require people to sign non-disclosure agreements. There are ways to deal with commercial data, as you would with other intellectual products.

It was noted by the researchers in the meeting that this type of arrangement is likely to mean the company doesn’t want to go through the process and won’t collaborate.

Exceptions

If data was generated before the policy was in place, then the researcher has not submitted a grant application that requires them to share it, and the BBSRC is not expecting people to go back into history. Researchers who wish to share historical research are not discouraged, but this is not covered by the policy. The policy came into force in April 2007, although realistically it took effect from 2008.

In addition, there are reasonable grounds for not sharing clearly incorrect or poor quality data, and many disciplinary databases will contain an element of quality control. But Michael noted that the policy shouldn’t be a way for people to filter out inconvenient data, and he would expect the community to be self-policing.

Future policy direction

Michael noted that this type of policy is becoming more prevalent, not less. Open science is one of the Horizon 2020 themes – see the 2013 Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Journals are getting involved as well. In the future, sharing data will be more common – and driven by disciplinary norms. Anyone funded by RCUK will be required to share their data. It also makes sense to government – the US National Institutes of Health and National Science Foundation have data sharing statements.

Continuing the dialogue

Michael indicated that he wants to talk to people about what the questions are so the BBSRC can refine issues in the policy.

Researchers who have questions about the policy can send them through to the Research Data Service team info@data.cam.ac.uk. If we are unable to answer them, we can ask BBSRC directly for clarification. We will then add the information to the University Research Data Management FAQ webpage.

Published 19 October 2015
Written by Dr Danny Kingsley, verified by Michael Ball, BBSRC
Creative Commons License

Openness, integrity & supporting researchers

Universities need to open up research to ensure academic integrity, adjust to support modern collaboration and scholarship tools, and begin rewarding people who have engaged in certain types of process rather than relying on traditional assessment schemes. This was the focus of Emeritus Professor Tom Cochrane’s* talk on ‘Open scholarship and links to academic integrity, reward & recognition’ given at Cambridge University on 7 October.

The slides from the presentation are available here: PRE_Cochrane_DisruptingDisincentives_V1_20151007

Benefits of an open access mandate

Tom began with a discussion about aspects of access to research and research data and why it should be as open as possible. Queensland University of Technology introduced an open access mandate some 12 years ago and has since been able to observe a number of effects, such as improved bibliometric citation rates and the way authors show up in Scopus.

Another is the way consulting opportunities arise because someone’s research is exposed to reading audiences that do not have access to the toll-gated literature. A further benefit is the recruitment of higher degree research (HDR) students.

Tom outlined six areas of advantage for institutions with a mandate, including researcher identity and exposure and advantage to the institution. He noted that they can’t argue causation but can argue correlation with the university’s improvement in research performance. Many institutions have been able to get some advantage from having an institutional repository that reflects the output of the institution.

However in terms of public policy, the funders have moved the game on anyway. This started with private funders like Wellcome Trust, but also the public funding research councils. This is the government taxpayer argument, which is happening in the US.

Tom noted that when he began working on open access policy he had excluded books, because open access is challenging when there is a return to the author, but there has long been a problem with publishing in the humanities and the social sciences. He said there was an argument that a vicious downward spiral oppresses these disciplines by making quality scholarship susceptible to judgements about the sales appeal of titles in the market, assessments which may be unrelated to scholarly merit. Now a new model called Knowledge Unlatched is attempting to break this cycle and improve the number of quality long-form outputs in the humanities and social sciences.

Nightmare scenarios

Tom then turned to the relationship between academic integrity and research fraud by examining the disincentives in the system. What are the potential ‘nightmare’ scenarios?

For early career researchers, nightmares include the PhD failing, a job or promotion application being rejected, a grant application failing, industry or consultancy protocols failing, or a paper not being accepted.

However, a worse nightmare is a published or otherwise proclaimed finding being found to be at fault – whether through an honest mistake or something more deliberate. This is a nightmare for the individual.

It is also very bad news for an institution to end up on the front page over research misconduct. This is very difficult to rectify.

Tom spoke about Jan Hendrik Schön’s deception. Schön was a physicist who qualified in Germany and went to work at Bell Labs in the US, where he claimed breakthrough results with ‘organic semiconductors’. Reviewers were unable to replicate the results, and investigators had no access to the original data: the lab books had been destroyed and the samples were damaged beyond recovery. The investigation and eventual withdrawal of the research took 12.5 years, and the effort involved was extraordinary.

Incentives for institutions and researchers

Academics work towards recognition and renown, respect and acclaim. This is based on a system of dissemination and publication, which in turn is based on peer review and co-authorship using understood processes. Financial reward is mostly indirect.

Tom then discussed what structures universities might have in place. Most will have some kind of code of conduct to advise people about research misconduct, though there are questions about how well understood or implemented this advice actually is.

Universities also often provide teaching about authorship and the attribution of work – there are issues around the extent to which student work gets acknowledged and published. Early career researchers are, or should be, advised about the requirements of attribution, including not crediting people who have not contributed, as well as given a good understanding of plagiarism and ethical conduct.

How does openness help?

Tom noted that we are familiar with the ideas of open data and open access, but another aspect is ‘open process’. Lab books, for example, showing progress in thinking, approaches and experiments, can be made open, though there may be some variation in the timing of when this occurs.

The other pressing thing about this is that the nature of research itself is changing profoundly. This includes extraordinary dependence on data, and complexity requiring intermediate steps of data visualisation. In Australia this is called eResearch, in the UK it is called eScience. These eResearch techniques have been growing rapidly, and in a way that may not be understood or well led by senior administrators.

Using data

Tom described a couple of talks by early or mid career researchers at different universities. They said that when they started they were given access to the financial system and to IT and Library privileges, but what they wanted to know was: what are the data services I can get from the University? This is particularly acute in the life sciences. Where is the support for the tools? What is the University doing by way of scaffolding the support services that will make my research more effective? What sort of help and training will you provide in new ways of disseminating findings and new publishing approaches?

Researchers are notoriously protective of their own time – they consider they should be better supported with these emerging tools. We need more systematic leadership in understanding them, with deliberate attention by institutional leadership to overcoming inertia.

The more sustained argument for openness relates to questions of integrity and trust – where arguments are disputes about evidence. What is true for the academy in terms of more robust approaches to preventing or reducing inaccuracy or fraud is also true for broader public policy needs for evidence-based policy.

Suggestions for improvement

We need concerted action by people at certain levels – Vice Chancellors, heads of funding councils, senior government bureaucrats. Some suggested actions for institutions and research systems at national and international levels include concerted action to:

  • develop and support open frameworks
  • harmonise supporting IP regimes
  • reframe researcher induction
  • improve data and tools support services
  • reward data science methods and re-use techniques
  • rationalise research quality markers
  • foster impact tracking in diverse tools

Discussion

Friction around University tools

One comment noted that disincentives at Cambridge University manifest as frictions around the way researchers use University tools – given they don’t want to waste time.

Tom responded that creating a policy is half the trick. Implementing it in a way that makes sense to someone is the other half. What does a mandate actually mean in a University given they are places where one does not often successfully tell someone else what to do?

However research and support tools are getting more efficient. It is a matter of marshalling the right expertise in the right place. One of the things that is happening is we are getting diverse uptakes of new ideas. This is reliant on the talent of the leadership that might be in place or the team that is in place. It could get held back by a couple of reactionary or unresponsive senior leaders. Conversely the right leadership can make striking progress.

Openness and competition

Another comment asked how openness squares with researchers being worried about others finding out what they are doing in a competitive environment.

Tom noted that depending on the field, there may indeed need to be decision points or “gating” that governs when the information is available. The important point is that it is available for review for the reasons of integrity explored earlier. Exceptions will always apply as in the case of contract research being done for a company by an institution that is essentially “black box”. There would always have to be decisions about openness which would be part of working out the agreement in the first place.

Salami slicing publication

A question arose about the habit of salami slicing research into small publications for the benefits of the Research Excellence Framework and how this matches with openness.

Tom agreed that research assessment schemes need to be structured to encourage or discourage certain types of scholarly output in practice. The precursor to this practice was the branching of journal titles in the 1970s, when the opportunity for advantage lay with research groups and publishers. There has to be a leadership view from institutional management on what kind of practical limits there can be on that behaviour.

This sparked a question about the complexity of changing the reward system because researchers are judged by the impact factor, regardless of what we say to them about tweets etc. How could the reward system be changed?

Tom said the change would need to be a recognition that rewarding only research outputs is insufficient; other research productivity needs reward. This has to be led. It can’t be a half-baked policy put out by a committee – it needs to be trusted by the research community.

Open access drivers

A question was asked about the extent to which the compliance agenda taken up by the funders has run its course, and whether this agenda will be taken up by the institutions.

Tom said that he has thought about this for a long time. He thought originally OA would be led by the disciplines, following the example of the High Energy Physics community, which built a repository more than 20 years ago. Then there was considerable discussion, e.g. in the UK in the early 2000s, about aligning OA with institutional profile. But institutional take-up was sporadic. In Australia in 2012 we only had six or seven universities with policies (which doesn’t necessarily mean there had been completely satisfactory take-up in each of those).

Through that time the argument for a return on tax payer investment has become the prevalent government one. Tom doesn’t think they will move away from that, even though there has been a level of complexity relating to the position that might not have been anticipated, with large publishers keen to be embedded in process.

This moved to a question of whether this offers an opportunity for the institution beyond the mandate?

Tom replied that he always thought there was an advantage at an institutional and individual level – that you would be better off if you made work open. The main commercial reaction has been for the large publishers to seek to convert the value that exists in the subscription market into the same level of value in input fees, i.e. Article Processing Charges.

It should be understood finally that academic publishing and the quality certification for research does have a cost, with the question being what that level of cost should really be.

About the speaker

*Emeritus Professor Tom Cochrane was briefly visiting Cambridge from Queensland University of Technology in Australia. During his tenure as the Deputy Vice-Chancellor (Technology, Information and Learning Support), Professor Cochrane introduced the world’s first University-wide open access mandate, in January 2004. Amongst his many commitments, Professor Cochrane serves on the Board of Knowledge Unlatched (UK), is a member of the Board of Enabling Open Scholarship (Europe) and was co-leader of the project to port Creative Commons into Australia.

Published 12 October 2015
Written by Dr Danny Kingsley
Creative Commons License