As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Matthias Ammon looks at theses and their use.
It may sound obvious, but PhD theses are a huge reservoir of original research content, given that each thesis represents at least three or four years’ focussed engagement with a specialised research topic. Traditionally, however, the results of this work have not been easily accessible.
A print copy of the approved thesis would be deposited in the library of the university where the PhD was undertaken, so that access was mainly restricted to other members of that university. Interested readers had to travel to visit the library or rely on frequently costly interlibrary loans. While some of the research contained in theses would be published in articles or monographs, this still means that an enormous amount of research was and is effectively locked away.
With the changes in technology in recent decades, allied with the rise of Open Access and institutional repositories, the accessibility of PhD theses in general has improved. In Australia, the Australian Digital Theses program began in 1998, expanding to the Australasian Digital Theses program in 2005. This used VT-ETD software to host digital theses at individual institutions, which were collated into one search engine. The ADT website, a central metadata repository, was hosted at the University of New South Wales. This was decommissioned in 2011 as theses were migrated to their various institutional repositories. All Australian theses are now findable in the National Library of Australia’s Trove service. There are 334,000 theses listed in Trove, of which over 119,000 are available online.
A significant number of UK universities now require the deposit of a digital copy of a thesis in the university’s repository as a condition for awarding the PhD degree. Usually this entails making the thesis openly available although embargoes may be placed for reasons of confidentiality or commercial concerns. In addition, PhD students funded by any of the UK research councils under the RCUK Training Grant are required to make their theses available Open Access.
Although it is not yet mandatory at the University of Cambridge for PhD students to provide a digital copy of their thesis, students can voluntarily upload their approved dissertations to the institutional repository, Apollo. Approximately one in 10 PhD students do so. In the next couple of weeks, the Office of Scholarly Communication is embarking on a pilot for the systematic submission of digital theses with selected departments.
There are national and international repositories that aggregate access to PhD theses, such as the British Library’s EThOS (for the UK) or DART-Europe (for European universities), making it easier for interested researchers to find relevant material without having to trawl through individual repositories.
Open Access Theses and Dissertations aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 3,422,634 theses and dissertations.
NDLTD, the Networked Digital Library of Theses and Dissertations, provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not. The service also provides ‘Guidance Briefs’ on topics such as Copyright and Preserving and Curating ETD Research Data and Complex Digital Objects.
ProQuest Theses and Dissertations (PQDT) is a database of dissertations and theses published digitally or in print.
Note these are made available for a fee that does not benefit the author. [In September 2017 ProQuest contacted us to say they do pay royalties. Their policy is here.] In addition, access to PQDT may be limited depending on local library licensing arrangements.
Looking to the past
So while it is looking likely that most future PhD theses will be available online (either freely or requestable), what about the vast number of PhD theses written up to this point? For context, Cambridge alone holds over 40,000 printed theses, with approximately 1100 being added every year. Approximately 2,000 of these have been digitised at the request of individuals wishing to have access to the theses.
Last year we ran an ‘Unlocking Theses’ project to increase the number of Open Access theses in the repository, which stood at about 600 at the beginning of 2015. The Library also held over 1200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them.
As of today we hold 2257 theses in the repository, of which half are Open Access. The remaining theses are currently held in a Restricted Theses Collection, but the bibliographic information about these theses is searchable. Approximately one third of the requests we receive through our Request a Copy service are for these theses. In addition, some authors have found their restricted thesis online and requested that we open access to it.
Cambridge is currently working with the British Library to digitise some of the 14,000 Cambridge theses they hold on microfilm. Our finances do not stretch to the whole corpus, so we have decided to digitise ten percent. This has meant a process to determine which theses we choose to have digitised. Considerations have included the quality of digitisation from microfilm for typeset versus typewritten theses (and indeed whether the thesis is printed single or double sided, because of shadowing). We have also chosen theses on the basis of which disciplines are most frequently requested from our Digital Content Unit. This has proved to be challenging, not least because of the difficulty of determining the disciplines of theses from our library catalogue.
We are hoping to upload these theses to the repository towards the end of the year. Together with the several hundred theses that have been digitised this year by the Digital Content Unit, this will double the number of theses we hold in the repository.
There are several issues that need to be considered before theses can be made available openly. The first concerns third party copyright, that is to say the inclusion of quotations, images, photographs or other material that is not original work by the thesis author but has been taken from previously published work. There is generally no problem with including such material in the copy of the thesis submitted for examination and the print version deposited in the University library, but making the thesis freely available online constitutes a change of use and requires separate permissions. This is a problem that applies to both current and older theses and requires checks by the author and possibly the library.
Another issue related to copyright is the author’s permission to make the thesis available, which is necessary because the author retains the copyright in their work. For current theses, this permission can be incorporated into the submission process, either as part of the requirement for the PhD or by the author signing an agreement when the thesis is voluntarily uploaded.
However, it is not so easy to obtain permission for retrospective digitisation, as we discovered during our Unlocking Theses project. The contact details of alumni are not always known, and in cases where the original author is deceased it may be challenging to establish the copyright holder, making it difficult to obtain an explicit ‘opt-in’ permission. Finally, there are financial considerations, as the digitisation of large numbers of theses requires a significant outlay for staff, equipment and administrative costs.
In recent years, a number of universities have undertaken large-scale digitisation projects of their holdings of PhD theses and have dealt with the permission issue in different ways.
- The University of Surrey interpreted the permission to share copies of theses for research purposes as applying to digital as well as print format and, with support from ProQuest, digitised their entire thesis collection. They are prepared to take down theses upon request of the author, but to date no such requests have been received.
- The University of Edinburgh is currently undertaking a project to digitise all their theses. Because of the size of the project they are not contacting alumni at all, but they will consider take-downs on request (they have received none for over 5000 theses made available so far).
- The University of Leicester’s digitisation project states that they ‘have contacted as many former students as possible about this but do not have contact details for everyone’; otherwise they follow a similar policy of take-down on request.
- The London School of Economics (LSE) also digitised their back catalogue of theses and contacted alumni with an opt-out option, i.e. if no response was received the thesis would be uploaded. LSE has also made statistics about downloads of their digitised theses available, showing that there is a real demand for access to this kind of research output. By comparison, Cambridge on average receives approximately two requests for non-digitised theses per day.
The experience of these UK universities also appears to indicate that alumni are for the most part happy to see their theses made openly available. If more institutions follow suit and dedicate funding to opening up the research undertaken by generations of students, this large reservoir of research will no longer remain untapped.
There are other challenges related to digital theses that still remain to be solved, such as the problem of linking theses to their associated data and the question of persistent identifiers to seamlessly integrate the output of both individual researchers and institutions. In the future, consideration should be given to non-text or multimedia PhDs, as was debated at a recent panel discussion at the British Library.
For now though, opening up access to decades’ or even centuries’ worth of scholarship sitting on university library shelves in the form of physical copies of PhD theses sounds like a good start.