
How open is Cambridge?

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this final OAWeek post Dr Arthur Smith analyses how much Cambridge research is openly available.

For us in the Office of Scholarly Communication it’s important that, as much as possible, the University’s research is made Open Access. While we can guarantee that research deposited in the University repository Apollo will be made available in one way or another, it’s not clear how other sources of Open Access contribute to this goal. This blog post is an attempt to quantify the amount of Cambridge research that is openly available.

In mid-August I used Cottage Labs’ Lantern service in an attempt to quantify just how open the University’s research really is. Lantern uses DOIs, PMIDs or PMCIDs to match publications in a variety of sources such as CORE and Europe PMC, to determine the Open Access status of a publication – it will even try to look at a publisher’s website to determine an article’s Open Access status. This process isn’t infallible, and it relies heavily on DOI matching, but it provides a good insight into the possible sources of Open Access material.

To determine the base list of publications against which the analysis could be run, I queried Web of Science (WoS) and Scopus to obtain a list of publications attributed to Cambridge authors. In 2015, the University published 9069 articles, reviews and conference papers according to Web of Science. Scopus returned a slightly lower figure of 7983 publications. Combining these two publication lists, and filtering to only include records with a DOI, produced one master list of 9714 unique publications (that’s ~26 publications/day!).
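As a rough illustration of the combining step (a sketch only, not the script actually used for this analysis), the two exports can be merged by normalising each DOI and keeping one record per unique DOI. The function names, record layout and sample DOIs below are all hypothetical.

```python
def normalise_doi(raw):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes so duplicates match."""
    if not raw:
        return None
    doi = raw.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    doi = doi.strip()
    return doi or None


def merge_by_doi(*record_lists):
    """Combine several lists of records, keeping the first record seen for each DOI."""
    merged = {}
    for records in record_lists:
        for record in records:
            doi = normalise_doi(record.get("doi"))
            if doi is None:
                continue  # records without a DOI are filtered out
            merged.setdefault(doi, record)
    return merged


# Hypothetical sample records standing in for the WoS and Scopus exports.
wos = [
    {"doi": "10.1000/ABC", "title": "Paper A"},
    {"doi": None, "title": "Paper without a DOI"},
]
scopus = [
    {"doi": "https://doi.org/10.1000/abc", "title": "Paper A"},
    {"doi": "10.1000/xyz", "title": "Paper B"},
]

master = merge_by_doi(wos, scopus)
print(len(master), "unique publications with a DOI")
```

Normalising before comparing matters because the same DOI can appear in different cases or as a full URL, depending on the export.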

In 2015 the Open Access team processed 2746 HEFCE eligible submissions, so naïvely speaking, the University achieved a 28.3% HEFCE compliance rate. That’s not bad, especially because the HEFCE policy had not yet come into force, but what about other Open Access sources? We know that other universities in the UK are also depositing papers in their repositories, and some researchers make their work ‘gold’ Open Access without going through the Open Access team, so the total amount of Open Access content must be higher.
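The naïve figure above can be reproduced with a one-line calculation. This sketch assumes the denominator is the 9714-publication master list, which is the reading that gives the quoted 28.3%; the variable names are illustrative.

```python
# Figures quoted in the post for 2015.
hefce_eligible_submissions = 2746  # processed by the Open Access team
publications_with_doi = 9714       # combined WoS/Scopus master list (assumed denominator)

naive_rate = 100 * hefce_eligible_submissions / publications_with_doi
print(f"Naive compliance rate: {naive_rate:.1f}%")
```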

In addition to the Lantern analysis, I also exported all available DOIs from Apollo and matched these to the DOIs obtained from WoS/Scopus. WoS also classifies some publications as being Open Access, and I included these figures too. If a publication was found in at least one potentially Open Access source I classified it as Open Access. Here are the results:

Figure 1. Of 9714 DOIs analysed by Lantern, 51.8% appear in at least one open access source.
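The classification rule used here (a publication counts as Open Access if at least one source holds it) can be sketched as follows. This is a minimal illustration with made-up DOIs and source names, not the Lantern workflow itself.

```python
def sources_holding(doi, sources):
    """Return the names of the sources that contain this DOI (empty list = not found)."""
    return sorted(name for name, dois in sources.items() if doi in dois)


# Hypothetical DOI sets for each potentially Open Access source.
sources = {
    "apollo": {"10.1000/a", "10.1000/b"},
    "europe_pmc": {"10.1000/b"},
    "other_repositories": {"10.1000/c"},
}
master_list = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"]

found_in = {doi: sources_holding(doi, sources) for doi in master_list}

# A publication is classed as Open Access if it appears in at least one source.
open_count = sum(1 for hits in found_in.values() if hits)
open_share = 100 * open_count / len(master_list)
print(f"{open_share:.1f}% found in at least one source")
```

Keeping the list of matching sources per DOI, rather than a single yes/no flag, also allows the results to be broken down by source later.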

It is pleasing that our naïve estimate of 28.3% HEFCE compliance closely matches the number of records found in Apollo (26.2%). The discrepancy is likely due to a number of factors, including publications received by the Open Access Team that were actually published in 2014 or 2016, but submitted in 2015, and Apollo records that don’t have a publisher DOI to match against. However, the most important point to note is the overall open access figure – in 2015 more than 50% of the University’s scholarly publications with a DOI were available in at least one “open access” source.

Let’s dig a little deeper into the analysis. Using everyone’s favourite metric, the journal impact factor (JIF), the average JIF of articles in Apollo was 5.74 compared to 4.33 for articles that were not OA. Other repositories and Europe PMC achieved even higher average JIFs. On average, Open Access publications by Cambridge authors have a higher JIF (6.04) than articles that are not OA, which suggests that researchers are making value judgements on what to make Open Access based on journal reputation. If a paper appears in a low(er) impact journal, it’s less likely to be made Open Access. Anecdotally this is something we have experienced at Cambridge.

Figure 2. Average 2015 JIF of papers classified according to their open access status.

The WoS and Scopus exports contain citation information at the article level, so we can also look at direct citations received by these publications (up to 16 August 2016) rather than relying on the JIF. I found that Open Access articles, on average, received 1.5 to 2 more citations than articles that are not Open Access. However, is this because authors are making their higher impact articles Open Access (which one might expect to receive more citations anyway) and are not bothering with the rest? Or is this effect due entirely to the greater accessibility offered by Open Access publication? Could the differences arise because of different researcher behaviour across different disciplines?

My feeling is that we have reached a turning point – the increased citation rate of Open Access material is not caused by the articles being Open Access, as these articles would have naturally received more citations anyway. Instead of looking at formal literature citations, the benefits of Open Access need to be measured outside of academia in areas that would not contribute to an article’s citations.

Figure 3. Average citations received by papers according to their open access source.

Breaking it down by the source of Open Access reveals that articles that appear in other repositories receive significantly more citations than those from any other source. This potentially reveals that collaborative papers between researchers at different institutions are likely to have greater impact than papers written solely at one institution (Cambridge); however, a more thorough analysis that looks at author affiliations would be needed to confirm this.

If we focus on the WoS citation distribution, the difference in average citations becomes clearer. Of 8348 WoS articles, not only are there fewer Open Access articles with no citations (14% vs 17%), but Open Access articles also receive more citations in general.

Figure 4. Citation distribution of papers found in WoS depending on their open access status.
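For readers who want to run a similar comparison on their own export, the two summary figures discussed above (average citations and the share of papers with no citations) can be computed per group as sketched below. The citation counts are invented for illustration and are not the Cambridge data.

```python
from statistics import mean


def citation_summary(citation_counts):
    """Return (mean citations, percentage of papers with zero citations)."""
    uncited = sum(1 for count in citation_counts if count == 0)
    return mean(citation_counts), 100 * uncited / len(citation_counts)


# Hypothetical per-article citation counts, grouped by Open Access status.
open_access = [0, 2, 5, 9, 4]
not_open_access = [0, 0, 1, 3, 6]

oa_mean, oa_uncited = citation_summary(open_access)
closed_mean, closed_uncited = citation_summary(not_open_access)

print(f"Open Access:     mean {oa_mean:.1f}, uncited {oa_uncited:.0f}%")
print(f"Not Open Access: mean {closed_mean:.1f}, uncited {closed_uncited:.0f}%")
```

A difference in these numbers shows an association only; it does not separate the effect of Open Access from disciplinary differences or author selection.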

What can we take away from this analysis? Firstly, Lantern is a valuable tool for discovering other sources of Open Access content. It identified over a thousand articles by Cambridge researchers in other institutional repositories that we did not know existed. When it comes time for the next REF, these other repositories may prove a vital lifeline in determining whether a paper is HEFCE compliant.

Secondly, more than 50% of the University’s 2015 research publications are potentially Open Access. Hopefully a similar analysis of 2016’s papers will show that even more of the University’s research is Open Access this year. And finally, although Open Access articles receive more citations than articles that are not Open Access, it is no longer clear whether this is caused by the article being Open Access, disciplinary differences, or if authors are more likely to make their best work Open Access.

Published 28 October 2016
Written by Dr Arthur Smith

Creative Commons License

Are academic librarians getting the training they need?

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Claire Sewell looks at the training of library staff in areas relating to scholarly communication.

The problem

Few people would deny that the world of the academic library is changing. Users are becoming more and more sophisticated in their information gathering techniques and the role of the academic librarian needs to adapt accordingly or risk being left behind. Librarians are changing from the traditional gatekeeper role to one which helps their research community to disseminate the outputs of their work.

This shift offers academic library staff new opportunities to move into research support roles. An increasing number of libraries are establishing scholarly communication departments and advertising for associated roles such as Repository Managers and Data Specialists. It’s also becoming common to see more traditional academic library roles advertised asking for at least a working knowledge of areas such as Open Access and Research Data Management.

This is an issue that we have been considering in the Office of Scholarly Communication for a while. My role as Research Skills Coordinator involves up-skilling Cambridge library staff in these areas so I’m more aware than most that it is a full time job. But what happens to those who don’t have this type of opportunity through their work? How do they find out about these areas which will be so relevant to their future careers?

For many new professionals, studying is their main chance to get a solid grounding in the information world. But with the profession undergoing such rapid change, is the education received via these degrees suitable for working in 21st century academic libraries? This is a question that has been raised many times in the profession in recent years, so it’s time to dig a bit deeper.

Hypothesis

Our hypothesis is simple: there is a systematic lack of education on scholarly communication issues available to those entering the library profession. This is creating a time bomb skills gap in the academic library profession and unless action is taken we may well end up with a workforce not suited to work in the 21st century research library.

In order to test this hypothesis we have designed a survey aimed at those currently working in scholarly communication and associated areas. We hope that by asking questions about the educational background of these workers we can determine the suitability of the library and information science qualification for these types of role in the future, and how any problems might best be addressed.

After a process of testing and reworking, our survey was launched to the scholarly communication community on October 11th 2016. In less than 24 hours there were over 300 responses, clearly indicating that the subject had touched a nerve for people working in the sector. (And thank you to those who have taken the time to respond.)

Preliminary findings

We were pleased to see that even without prompting from the survey, respondents were picking up on many of the issues we wanted to address. For example, the original focus of the survey was the library and information science qualification and its impact on those working in scholarly communication.

When we piloted the survey with members of our own team we realised how diverse their backgrounds were and so widened the survey to target those who didn’t hold an LIS qualification but worked in this area. This has already given us valuable information about the impact that different educational backgrounds have on scholarly communication departments and has gained positive feedback from survey respondents.

Many of the respondents talk of developing the skills they use daily ‘on the job’. Whilst library and information professionals are heavily involved in lifelong learning and it’s natural for skills to develop as new areas emerge, the formal education new professionals receive also needs to keep pace. If even recent graduates have to develop the majority of skills needed for these roles whilst they work, this paints a worrying picture of the education they are undertaking.

The survey responses have also raised the issue of which skills employers are really looking for in library course graduates and how these are provided. Respondents highlighted a range of skills that they needed in their roles – far more than were included in the original survey questions. This opens up discussions about the vastly differing nature of jobs within scholarly communication and how best to develop the skill set needed.

A final issue highlighted in the responses received so far is that a significant number of people working in scholarly communication roles come from outside the library sector. Of course this has benefits as they bring with them very valuable skills but importing knowledge in this way may also be contributing to a widening skills gap for information professionals that needs to be addressed.

Next steps

The first task at the end of the collection period (you have until 5pm BST Monday 31 October) will be to analyse the results and share them with the wider scholarly communication community. There are plans for a blog post, journal article and conference presentations. We will also be sharing the anonymised data via the Cambridge repository.

Following that, our next steps depend largely on the responses we receive from the survey. We have begun the process of reaching out to other groups who may be interested in similar issues around professional education to see if we can work together to address some of the problems. None of this will happen overnight, but we hope that by taking these initial steps we can work to create academic libraries geared towards serving the researchers of the 21st century.

One thing that the survey has done already is raise a lot of interesting questions which could form the basis of further research. It shows that there is scope to keep exploring this topic and help to make sure that library and information science graduates are well equipped to work in the 21st century academic library.

Published 27 October 2016
Written by Claire Sewell

Theses – releasing an untapped resource

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Matthias Ammon looks at theses and their use.

It may sound obvious, but PhD theses are a huge reservoir of original research content, given that each thesis represents at least three or four years’ focussed engagement with a specialised research topic. Traditionally, however, the results of this work have not been easily accessible.

A print copy of the approved thesis would be deposited in the library of the university where the PhD was undertaken, so that access was mainly restricted to other members of that university. Interested readers had to travel to visit the library or rely on frequently costly interlibrary loans. While some of the research contained in theses would be published in articles or monographs, this still means that an enormous amount of research was and is effectively locked away.

Increasing access

With the changes in technology in recent decades allied with the rise of Open Access and institutional repositories, the accessibility of PhD theses in general has improved. In Australia, the Australian Digital Theses program began in 1998, expanding to the Australasian Digital Theses program in 2005. This used VT-ETD software to host digital theses at individual institutions which were collated into one search engine. The ADT website, a central metadata repository, was hosted at the University of New South Wales. This was decommissioned in 2011 as theses were migrated to their various institutional repositories. All Australian theses are now findable in the National Library of Australia’s Trove service. There are 334,000 theses listed in Trove, of which over 119,000 are available online.

A significant number of UK universities now require the deposit of a digital copy of a thesis in the university’s repository as a condition for awarding the PhD degree. Usually this entails making the thesis openly available although embargoes may be placed for reasons of confidentiality or commercial concerns. In addition, PhD students funded by any of the UK research councils under the RCUK Training Grant are required to make their theses available Open Access.

Although it is not yet mandatory at the University of Cambridge for PhD students to provide a digital copy of their thesis, students can voluntarily upload their approved dissertations to the institutional repository, Apollo. Approximately one in 10 PhD students do so. In the next couple of weeks, the Office of Scholarly Communication is embarking on a pilot for the systematic submission of digital theses with selected departments.

Finding theses

There are national and international repositories that aggregate access to PhD theses, such as the British Library’s EThOS (for the UK) or DART-Europe (for European universities), making it easier for interested researchers to find relevant material without having to trawl through individual repositories.

Open Access Theses and Dissertations aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 3,422,634 theses and dissertations.

NDLTD, the Networked Digital Library of Theses and Dissertations, provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not. The service also provides ‘Guidance Briefs’ on topics such as Copyright and Preserving and Curating ETD Research Data and Complex Digital Objects.

ProQuest Theses and Dissertations (PQDT) is a database of dissertations and theses published digitally or in print. Note these are made available for a fee that does not benefit the author. [In September 2017 ProQuest contacted us to say they do pay royalties. Their policy is here.] In addition, access to PQDT may be limited depending on local library licensing arrangements.

Looking to the past

So while it is looking likely that most future PhD theses will be available online (either freely or requestable), what about the vast number of PhD theses written up to this point? For context, Cambridge alone holds over 40,000 printed theses, with approximately 1100 being added every year. Approximately 2,000 of these have been digitised at the request of individuals wishing to have access to the theses.

Last year we ran an ‘Unlocking Theses’ project to increase the number of Open Access theses in the repository, which stood at about 600 at the beginning of 2015. The Library also held over 1200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them.

As of today we hold 2257 theses in the repository, of which half are Open Access. The remaining theses are currently held in a Restricted Theses Collection, but the bibliographic information about these theses is searchable. Approximately one third of the requests we receive through our Request a Copy service are for these theses. In addition, some authors have found their restricted thesis online and requested that we open access to it.

Cambridge is currently working with the British Library to digitise some of the 14,000 Cambridge theses they hold on microfilm. Our finances do not stretch to the whole corpus, so we have decided to digitise ten percent. This has meant devising a process to determine which theses we choose to have digitised. Considerations have included the quality of digitisation from microfilm for typeset versus typewritten theses (and indeed whether the thesis is printed single or double sided, because of shadowing). We have also chosen theses on the basis of which disciplines are most often requested from our Digital Content Unit. This has proved to be challenging, not least because of the difficulty of determining the disciplines of theses from our library catalogue.

We are hoping to upload these theses to the repository towards the end of the year. Together with several hundred theses that have been digitised this year by the Digital Content Unit, this will double the number of theses we hold in the repository.

Considerations

There are several issues that need to be considered before theses can be made available openly. The first concerns third party copyright, that is to say the inclusion of quotations, images, photographs or other material that does not represent original work by the thesis author but has been taken from previously published work. There is generally no problem with including such material in the copy of the thesis submitted for examination and the print version deposited in the University library, but making the thesis freely available online constitutes a change of use and requires separate permissions. This is a problem that applies to both current and older theses and requires checks on the part of the author and possibly the library.

Another issue related to copyright is the author’s permission to make the thesis available, which is necessary because the author retains the copyright for their work. For current theses, this permission can be incorporated into the submission process, either as part of the requirement for the PhD or by the author signing an agreement when the thesis is voluntarily uploaded.

However, it is not so easy to obtain permission for retrospective digitisation, as we discovered during our Unlocking Theses project. The contact details of alumni are not always known, and in cases where the original author is deceased it may be challenging to establish the copyright holder, making it difficult to obtain an explicit ‘opt-in’ permission. Finally, there are financial considerations, as the digitisation of a large number of theses requires a significant outlay for staff, equipment and administrative costs.

Big projects

In recent years, a number of universities have undertaken large-scale digitisation projects of their holdings of PhD theses and have dealt with the permission issue in different ways.

The experience of these UK universities also appears to indicate that alumni are for the most part happy to see their theses made openly available. If more institutions follow suit and dedicate funding to opening up the research undertaken by generations of students this large reservoir of research will no longer remain untapped.

There are other challenges related to digital theses that still remain to be solved, such as the problem of linking theses to their associated data and the question of persistent identifiers to seamlessly integrate the output of both individual researchers and institutions. In the future, consideration should be given to non-text or multimedia PhDs, as was debated at a recent panel discussion at the British Library.

For now though, opening up access to decades’ or even centuries’ worth of scholarship sitting on university library shelves in the form of physical copies of PhD theses sounds like a good start.

Published 26 October 2016
Written by Dr Matthias Ammon and Dr Danny Kingsley