Tag Archives: repository

Request a copy: process and implementation

This blog post looks at a recent feature implemented in our repository called ‘Request a copy’ and discusses the process and management of the service. There is a related blog post which discusses the uptake and reaction to the facility.

As part of our recent upgrade to the University’s institutional repository (now renamed ‘Apollo‘), we implemented a new feature called ‘Request a copy’. ‘Request a copy’ operates on the principle of peer-to-peer sharing – if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway – peer to peer sharing.

The main advantage of the ‘Request a copy’ feature is to open up the University’s most current research to a wider audience. Many of our users do not necessarily come from an academic background, or may be based within another discipline, or an institution where journal subscriptions are more limited. The repository is often their first port of call to find new research as it ranks highly in Google search results. We hope that these users will benefit from ‘Request a copy’ by being able to access new outputs early, at researchers’ discretion. Additionally, this may provide an added benefit to researchers by introducing new contacts and potential collaborations.

How it works

Screen Shot 2016-10-06 at 13.53.30Items in Apollo that are not yet accessible to the wider public are indicated by a padlock symbol that appears on the thumbnail image and filename link which users can usually click to download the file.

Reasons why the file may not yet be publicly available include:

  • Some publishers require that articles in repositories cannot be made available until they are published, or until a specified time after publication
  • We hold a number of digitised theses in the repository, and for some we have been unable to contact the author to secure permission to make their thesis available
  • Authors may choose to make their dataset available only once the related article is published

When a user clicks on a thumbnail or filename link containing a padlock, they are directed to the ‘Request a copy’ form. Here, they provide their name, email address and a message to the author. On clicking ‘Request copy’, an email is sent to the person who submitted the article, containing the user’s details. The recipient of this email then has the option to approve or deny the user’s request, to contact the user for more information, or (if they are not the author) to forward the request to the author.

How it really works

In practice, the process is slightly more complicated. For most of the content in the repository, the person who submitted an item will be a member of repository staff, rather than the item’s author. This means that for the most part, emails generated by the ‘Request a copy’ form were initially sent to members of the Office of Scholarly Communication team. In some cases, these requests were sent to people who have left the University, and we have had to query the system to retrieve these emails. As an interim measure, we have now directed all emails to support@repository.cam.ac.uk. These still need manual processing.

Theses

For theses where we have not received permission from the author to make them available, we forward requests to the University Library’s Digital Content Unit, who have traditionally provided digitised copies of theses at a charge of £65. We have  found however, that once information about this charge is communicated to the requester, very few (approximately 1%) actually complete the process of ordering a thesis copy.

We have been working with the Digital Content Unit on a trial where thesis copies were offered at £30, then £15. However, even at these cheaper prices, uptake remained low (it increased to 10%, but due to the small size of the sample, this only equated to two and three requests at each price point, and therefore may not be statistically significant). This indicates that the objection was to being charged at all, rather than to the particular amount. Work in this area remains ongoing to try and offer thesis copies as cheaply as possible to requesters, while allowing the Digital Content Unit to cover their costs.

Articles

If the request is for an article, we first need to check whether the article has actually been published and is already available Open Access. Although we endeavour to keep all our repository records up to date, unless we are informed that an article has been published, repository staff need to check each article for which publication is pending. This is a time-consuming manual process, and when we have a large backlog, sometimes it can take a while before an article is updated following publication.

If we found that the article has indeed been published and can be made Open Access, we amend the record, make the article available and email the requester to let them know they can now download the file directly from the repository.

On the other hand, if the article is still not published, or if it is under an embargo, we need to forward the request to the corresponding author(s). Sometimes their name(s) and email address(es) will be included within the article itself, and sometimes we have a record of who submitted the article via the Open Access upload form. However, if it is not clear from the article who the corresponding author is, or if their contact details are not included, and if the article was submitted by an administrator rather than one of the authors, we then need to search via the University’s Lookup service for the email addresses of any Cambridge authors, and search the internet for email addresses of any non-Cambridge authors, before we can forward on the request.

As a result, it can take repository staff up to 30 minutes to process an individual request. This is quicker if the article has been requested previously and the author’s contact details are already stored, but can take longer when we need to search. Sometimes, there is also repeat correspondence if the author has any queries, which adds to the total time in processing each request.

Amending our processes

Since introducing ‘Request a copy’, we have started collecting the email addresses of corresponding authors when an article is submitted, and we have commissioned a repository development company to ensure that ‘Request a copy’ emails can be sent directly to those authors for whom we have an email address – a feature that we are hoping to implement in the next few weeks.

However, if the author moves institution, their university email address will no longer be valid, and any requests for their work will again need to come via repository staff. One way to solve this would be to ask for an external (non-university) email address for the corresponding author at the point where they upload the article to the repository. However, this would introduce an extra step to an already onerous process and may act as a further barrier to authors submitting articles in the first place.

Generally, ‘Request a copy’ is a great idea and provides many benefits to the research community and beyond. But the implementation of this service has been challenging. The amount of time taken by each request has meant that some staff members have been redeployed from their usual jobs to facilitate these requests, which also has an impact on the backlog of articles in the repository that need to be checked in case they have since been published. If an article is published but still in the backlog (and therefore not publicly available in the repository), unnecessary requests for it could result in a reputational issue for the Office of Scholarly Communication and the University.

We will continue to look at our processes over the coming academic year, to see how we can improve our current workflows, and identify and resolve any issues, as well as determining where best to focus any further development work. In the related blog post on ‘Request a copy’, I’ll be talking about usage statistics for the service so far, some more unexpected use cases we have encountered, and feedback from our users that will help us to shape the service into the future.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Request a copy: uptake and user experience

This post looks at the University of Cambridge repository  ‘Request a copy’ service from the user’s perspective in terms of uptake so far, feedback we have received, and reasons why people might request a copy of a document in our repository. You may be interested in the related blog post on our ‘Request a copy’ service, which discusses the concept behind ‘Request a copy’, the process by which files are requested, and how this has been implemented at Cambridge

Usage Statistics

The Request a Copy button has been much more successful than we anticipated, particularly because there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a copy’), we had received 1120 requests (approximately 280 requests per month), the vast majority of which were for articles (68%) and theses (28%). The remaining 4% of requests were for datasets or other types of resource. We are aware that this is a particularly quiet time in the UK academic year, and expect that the number of requests will increase now term has started again.

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button. However, these figures could be misleading as a number of authors have also advised us that they have entered into correspondence with the requester to ask them for further information about who they are and why they are interested in this research. Eventually, this correspondence may result in the author emailing a copy of the paper to the requester, but as this happens outside the repository, it does not appear in our fulfilment statistics. Therefore, we suspect the figure for accepted requests is in actual fact slightly higher.

Of the articles requested during this period, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal. The large number of requests made prior to publication indicates the value of having a policy where articles are submitted to the repository on acceptance rather than publication – there is clearly interest in accessing this research among the wider public, and if they are able to make use of it rather than waiting during the sometimes lengthy period between acceptance and publication, this can make the research process more efficient.

Author Survey

To find out why authors might not be fulfilling requests through the repository links, Dr Lauren Cadwallader, one of our Open Access Research Advisors, sent a survey on 6 July 2016 to the 113 authors who had received requests but had not clicked on the repository link or been in touch with repository staff to advise of an alternative course of action. This survey had a 13% response rate, with 15 participants, as well as eight email responses from users who provided feedback but did not complete the survey.

The relatively low response rate is indicative of either a lack of engagement with or awareness of the process – it is possible that the request emails and survey email were dismissed as spam, or that researchers were unable to respond due to an already heavy workload. One way of addressing this could be to include some information about ‘Request a copy’ in our existing training sessions, in particular to emphasise how quick the process can be in cases where the author is happy to approve the request without needing any further information from the user. We have also been developing the wording of the email sent to the author, to explain the purpose of the service more clearly, and to make it sound like a legitimate message that is less likely to be dismissed as spam.

Of the 15 people who participated in the survey, the majority were aware that they had received an email, which shows that lack of response is not always due to emails being lost in spam filters. When asked for the reason why they did not fulfil the request via the repository link, 35% of authors replied that they had emailed the requester directly, either to send the file, to request more information, or to explain why it was not possible for them to share the file at this time. This finding is quite positive, as it indicates that over a third of these requests are indeed being followed up. Although it would be helpful to us to be able to keep track of approvals through the system, at least this means that the service is fulfilling its purpose in providing a way for authors to interact with other interested researchers, and to share their work if appropriate. In fact, one of the aspects that participants liked best about the ‘Request a copy’ service was the ability to communicate directly with the requestor.

Two authors did not respond to the request because the article was available elsewhere on the internet, such as their personal / departmental website, or a preprint server (where the restrictions relating to repositories do not apply), although they did not communicate this to the requestor. In these cases, it is definitely positive that the authors are happy to share their work; however, it does show that there is often an assumption among researchers that people interested in reading their articles will be restricted to those already in their specific disciplinary communities.

Requests from people who are unaware of sites where the research might also be made available demonstrates that there is indeed an appetite among those outside of academia, or from different subject areas. This is generally a really positive thing, as it facilitates the University’s research outputs to educate and inspire a new audience beyond the more traditional communities, and could potentially lead to new collaboration opportunities. To ensure that requestors are able to access the material, and that researchers are not bombarded with requests for documents that are already freely available, authors can provide links to any external websites that are hosting a preprint version of the article, and we will add them to the repository record.

Other responses indicated that we were not necessarily emailing the right person, as participants said that they had not approved the request because they were not the corresponding author, or because they thought a co-author had already responded. At the outset of the service, we felt that emailing as many authors as possible would increase the likelihood of receiving a response; however, the survey results show that it would be better to send requests to the corresponding author(s) only, at least in cases where it is clear who they are.

An issue we have encountered on a semi-regular basis since HEFCE’s Open Access policy came into force is that of making an article’s metadata available prior to its publication. Although HEFCE and funder policies state that an article’s repository record should be discoverable, even if the article itself must be placed under embargo based on publisher restrictions, there is concern among some authors that metadata release breaches the publisher’s press embargo. You can read about this issue in some detail here.

Receiving requests for an article via the ‘Request a copy’ service can be unsettling for authors as it demonstrates how easily the repository record can be accessed, and rather than respond to the request, they contact the Open Access team to ask for the metadata record to be withdrawn until the article is published. This demonstrates a need to communicate more clearly, both on our website and within the ‘Request a copy’ pages in the repository, what is required of authors as part of HEFCE and funder Open Access policies. We will also be more explicit in the ‘Request a copy’ emails sent to authors in stating that sharing their articles via this service will not be seen as a breach of the publisher’s embargo. In cases where the author does not wish to disseminate their article before it is published, they have the option to deny any requests they receive.

Facilitating requests

There have been several instances where press interest around an article at the point of publication has generated a large number of requests, each of which must be responded to individually by the author. This has resulted in several authors asking that we automatically approve every request rather than forwarding them on. Unfortunately this is not possible for us to do, due to the legal issues surrounding ‘Request a copy’.

It is perfectly acceptable for an author to send a copy of their article to an individual, but if a repository makes that article available to everyone who requests it before the embargo has been lifted, this would be a breach of copyright because it would be ‘systematic distribution’. While responding to multiple requests is likely to be seen as an annoyance by an already overstretched researcher, we hope that a large volume of requests will also be viewed in a positive light, as it demonstrates the interest people have in their work.

Use cases

An interesting example of a request we received was actually from one of the authors of the article, as they did not have access to a copy themselves. This raises some questions about communication between the researchers in this case, if the ‘Request a copy’ service was seen to be a better way of gaining access to the author’s own research, rather than contacting one of their co-authors.

A more surprising use case is that of a plaintiff who had lost a legal case. The plaintiff was requesting an as-yet unpublished article that had been written about the case, because the article appears to argue in favour of the plaintiff and could potentially inform a future appeal. This is a good example of how the ‘Request a copy’ service could be of direct benefit in the world outside academia.

Although the vast majority of requests have been for research outputs such as articles, theses and datasets, we also occasionally receive requests for images that belong to collections held in different parts of the University, where high-quality versions are stored in the repository under restricted access conditions. With these requests, it can be more difficult to find who the copyright-holder is, which sometimes requires detective work by the repository team. In one case, permission had to be sought from a photographer who only has a postal address, and therefore required more explanation about the repository more generally, as well as the specific request.

Looking to the future

We will use this research and any further feedback we receive to improve the experience of our ‘Request a copy’ service for both authors and requestors, including implementing the ideas suggested above. Usage statistics will continue to be monitored, and we may run a user survey again to determine how far the service has improved, as well as to identify any new issues.

In the meantime, if you have any comments or questions about our ‘Request a copy’ service, either as an author or a requester (or both), please send us an email to support@repository.cam.ac.uk .

Published 7 October 2016
Written by Sarah Middle
Creative Commons License

Show me the money – the path to a sustainable Research Data Facility

Like many institutions in the UK, Cambridge University has responded to research funders’ requirements for data management and  sharing with a concerted effort to support our research community in good data management and sharing practice through our Research Data Facility. We have written a few times on this blog and presented to describe our services. This blog is a description of the process we have undertaken to support these services in the long term.

Funders expect  that researchers make the data underpinning their research available and provide a link to this data in the paper itself. The EPSRC started checking compliance with their data sharing requirement on 1 May 2015. When we first created the Research Data Facility we spoke to many researchers across the institution and two things became very clear. One was that there was considerable confusion about what actually counts as data, and the second was that sharing data on publication is not something that can be easily done as an afterthought if the data was not properly managed in the first place.

We have approached these issues separately. To try and determine what is actually required from funders beyond the written policies we have invited representatives from our funders to come to discussions and forums with our researchers to work out the details. So far we have hosted Ben Ryan from the EPSRC, Michael Ball from the BBSRC and most recently David Carr and Jamie Enoch from the Wellcome Trust and CRUK respectively.

Dealing with the need for awareness of research data management has been more complex. To raise awareness of good practice in data management and sharing we embarked on an intense advocacy programme and in the past 15 months have organised 71 information sessions about data sharing (speaking with over 1,700 researchers). But we also needed to ensure the research community was managing its data from the beginning of the research process. To assist this we have developed workshops on various aspects of data management (hosting 32 workshops in the past year), a comprehensive website, a service to support researchers with their development of their research data management plans and a data management consultancy service.

So far, so good. We have had a huge response to our work, and while we encourage researchers to use the data repository that best suits their material, we do offer our institutional repository Apollo as an option. We are as of today, hosting 499 datasets in the repository. The message is clearly getting through.

Sustainability

The word sustainability (particularly in the scholarly communication world) is code for ‘money’. And money has become quite a sticking point in the area of data management. The way Cambridge started the Research Data Facility was by employing a single person, Dr Marta Teperek for one year, supported by the remnants of the RCUK Transition Fund. It became quickly obvious that we needed more staff to manage the workload and now the Facility employs half an Events and Outreach Coordinator and half a Repository Manager plus a Research Data Adviser who looks after the bulk of the uploading of data sets into the repository.

Clearly there was a need to work out the longer term support for staffing the Facility – a service for which there are no signs of demand slowing. Early last year we started scouting around for options.  In April 2013 the RCUK released some guidance that said it was permissible to recover costs from grants through direct charges or overheads – but noted institutions could not charge twice. This guidance also mentioned that it was permissible for institutions to recover costs of RDM Facilities as other Small Research Facilities, “provided that such facilities are transparently charged to all projects that use them”.

Transparency

On the basis of that advice we established a Research Data Facility as a Small Research Facility according to the Transparent Approach to Costing (TRAC) methodology. Our proposal was that Facility’s costs will be recovered from grants as directly allocated costs. We chose this option rather than overheads because of the advantage of transparency to the funder of our activities. By charging grants this way it meant a bigger advocacy and education role for the Facility. But the advantage is that it would make researchers aware that they need to consider research data management seriously, that this involves both time and money, and that it is an integral part of a grant proposal.

Dr Danny Kingsley has argued before (for example in a paper ‘Paying for publication: issues and challenges for research support services‘) that by centralising payments for article processing charges, the researchers remain ignorant of the true economics of the open access system in the way that they are generally unaware of the amounts spent on subscriptions. If we charged the costs of the Facility into overheads, it becomes yet another hidden cost and another service that ‘magically’ happens behind the scenes from the researcher’s point of view.

In terms of the actual numbers, direct costs of the Research Data Facility included salaries for 3.2 FTEs (a Research Data Facility Manager, Research Data Adviser, 0.5 Outreach and Engagement Coordinator, 0.5 Repository Manager, 0.2 Senior Management time), hardware and hardware maintenance costs, software licences, costs of organising events as well as the costs of staff training and conference attendance. The total direct annual cost of our Facility was less than £200,000. These are the people cost of the Facility and are not to be confused with the repository costs (for which we do charge directly).

Determining how much to charge

Throughout this process we have explored many options for trying to assess a way of graduating the costing in relation to what support might be required. Ideally, we would want to ensure that the Facility costs can be accurately measured based on what the applicant indicated in their data management plan. However, not all funders require data management plans. Additionally, while data management plans provide some indication of the quantity of data (storage) to be generated, they do not allow a direct estimate of the amount of data management assistance required during the lifetime of the grant. Because we could not assess the level of support required for a particular research project from a data management plan, we looked at an alternative charging strategy.

We investigated charging according to the number of people on a team, given that the training component of the Facility is measurable by attendees to workshops. However, after investigation we were unable to easily extract that type of information about grants and this also created a problem for charging for collaborative grants. We then looked at charging a small flat charge on every grant requiring the assistance of the Facility and at charging proportionally to the size (percentage of value) of the grant. Since we did not have any compelling evidence that bigger grants require more Facility assistance, we proposed a model of flat charging on all grants, which require Facility assistance. This model was also the most cost-effective from an administrative point of view.

As an indicator of the amount of work involved in the development of the Business Case, and the level of work and input that we have received relating to it, the document is now up to version 18 – each version representing a recalculation of the costings.

Collaborative process

A proposal such as we were suggesting – that we charge the costs of the Facility as a direct charge against grants – is reasonably radical. It was important that we ensure the charges would be seen as fair and reasonable by the research community and the funders. To that end we have spent the best part of a year in conversation with both communities.

Within the University we had useful feedback from the Open Access Project Board (OAPB) when we first discussed the option in July last year. We are also grateful to the members of our community who subsequently met with us in one on one meetings to discuss the merits of the Facility and the options for supporting it. At the November 2015 OAPB meeting, we presented a mature Business Case. We have also had to clear the Business Case through meetings of the Resource Management Committee (RMC).

Clearly we needed to ensure that our funders were prepared to support our proposal. Once we were in a position to share a Business Case with the funders we started a series of meetings and conversations with them.

The Wellcome Trust was immediate in its response – they would not allow direct charging to grants as they consider this to be an overhead cost, which they do not pay. We met with Cancer Research UK (CRUK) in January 2016 and there was a positive response about our transparent approach to costing and the comprehensiveness of services that the Facility provides to researchers at Cambridge. These issues are now being discussed with senior management at CRUK and discussions with CRUK are still ongoing at the time of writing this report (May 2016). [Update 24 May: CRUK agreed to consider research data management costs as direct costs on grant applications on a case by case basis, if justified appropriately in the context of the proposed research].

We encourage open dialogue with the RCUK funders about data management. In May 2015 we invited Ben Ryan to come to the University to talk about the EPSRC expectations on data management and how Cambridge meets these requirements. In August 2015 Michael Ball from the BBSRC came to talk to our community. We had an indication from the RCUK that our proposal was reasonable in principle. Once we were in a position to show our Business Case to the RCUK we invited Mark Thorley to discuss the issue and he has been in discussion with the individual councils for their input to give us a final answer.

Administrative issues

Timing in a decision like this is challenging because of the large number of systems within the institution that would be affected if a change were to occur. In anticipation of a positive response we started the process of ensuring our management and financial systems were prepared and able to manage the costing into grants – to ensure that if a green light were given we would be prepared.  To that end we have held many discussions with the Research Office on the practicalities of building the costing into our systems to make sure the charge is easy to add in our grant costing tool. We also had numerous discussions on how to embed these procedures in their workflows for validating whether the Facility services are needed and what to do if researchers forget to add them. The development has now been done.

A second consideration is the necessity to ensure all of the administrative staff involved in managing research grants (at Cambridge this is a  group of over 100 people) are aware of the change and how to manage both the change to the grant management system and also manage the questions from their research community. Simultaneously we were also involved in numerous discussions with our invaluable TRAC team at the Finance Division at the University who helped us validate all the Facility costs (to ensure that none of the costs are charged twice) and establishing costs centres and workflows for recovering money from grants.

Meanwhile we have had to keep our Facility staff on temporary contracts until we are in a position to advertise the roles. There is a huge opportunity cost in training people up in this area.

Conclusion

As it happened, the RCUK has come back to us to say that we can charge this cost to grants but as an overhead rather than direct cost. Having this decision means we can advertise the positions and secure our staffing situation. But we won’t be needing the administrative amendments to the system, nor the advocacy programme.

It has been a long process given we began preparing the Business Case in March 2015. The consultation throughout the University and the engagement of our community (both research and funder) has given us an opportunity to discuss the issues of research data management more widely. It is a shame – from our perspective – that we will not be able to be transparent about the costs of managing data effectively.

The funders and the University are all working towards a shared goal – we are wanting a culture change towards more open research, including the sharing of research data. To achieve this we need a more aware and engaged research community on these matters.  There is much advocacy to do.

Published 8 May 2016
Written by Dr Danny Kingsley and Dr Marta Teperek
Creative Commons License