Monthly Archives: October 2016

Theses – releasing an untapped resource

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Matthias Ammon looks at theses and their use.

It may sound obvious, but PhD theses are a huge reservoir of original research content, given that each thesis represents at least three or four years’ focussed engagement with a specialised research topic. Traditionally, however, the results of this work have not been easily accessible.

A print copy of the approved thesis would be deposited in the library of the university where the PhD was undertaken, so that access was mainly restricted to other members of that university. Interested readers had to travel to visit the library or rely on frequently costly interlibrary loans. While some of the research contained in theses would be published in articles or monographs, this still means that an enormous amount of research was and is effectively locked away.

Increasing access

With the changes in technology in recent decades, allied with the rise of Open Access and institutional repositories, the accessibility of PhD theses in general has improved. In Australia, the Australian Digital Theses program began in 1998, expanding to the Australasian Digital Theses program in 2005. This used VT-ETD software to host digital theses at individual institutions, which were collated into one search engine. The ADT website, a central metadata repository, was hosted at the University of New South Wales. This was decommissioned in 2011 as theses were migrated to their various institutional repositories. All Australian theses are now findable in the National Library of Australia’s Trove service. There are 334,000 theses listed in Trove, of which over 119,000 are available online.

A significant number of UK universities now require the deposit of a digital copy of a thesis in the university’s repository as a condition for awarding the PhD degree. Usually this entails making the thesis openly available although embargoes may be placed for reasons of confidentiality or commercial concerns. In addition, PhD students funded by any of the UK research councils under the RCUK Training Grant are required to make their theses available Open Access.

Although it is not yet mandatory at the University of Cambridge for PhD students to provide a digital copy of their thesis, students can voluntarily upload their approved dissertations to the institutional repository, Apollo. Approximately one in 10 PhD students do so. In the next couple of weeks, the Office of Scholarly Communication is embarking on a pilot for the systematic submission of digital theses with selected departments.

Finding theses

There are national and international repositories that aggregate access to PhD theses, such as the British Library’s EThOS (for the UK) or DART-Europe (for European universities), making it easier for interested researchers to find relevant material without having to trawl through individual repositories.

Open Access Theses and Dissertations aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 3,422,634 theses and dissertations.

NDLTD, the Networked Digital Library of Theses and Dissertations, provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not. The service also provides ‘Guidance Briefs’ on topics such as Copyright and Preserving and Curating ETD Research Data and Complex Digital Objects.

ProQuest Theses and Dissertations (PQDT) is a database of dissertations and theses published digitally or in print. Note these are made available for a fee that does not benefit the author. [In September 2017 ProQuest contacted us to say they do pay royalties. Their policy is here.] In addition, access to PQDT may be limited depending on local library licensing arrangements.

Looking to the past

So while it is looking likely that most future PhD theses will be available online (either freely or requestable), what about the vast number of PhD theses written up to this point? For context, Cambridge alone holds over 40,000 printed theses, with approximately 1100 being added every year. Approximately 2,000 of these have been digitised at the request of individuals wishing to have access to the theses.

Last year we ran an ‘Unlocking Theses’ project to increase the number of Open Access theses in the repository, which stood at about 600 at the beginning of 2015. The Library also held over 1200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them.

As of today we hold 2257 theses in the repository, of which half are Open Access. The remaining theses are currently held in a Restricted Theses Collection but the bibliographical information about these theses is searchable. Approximately one third of the requests we receive through our Request a Copy service are for these theses. In addition, some authors have found their restricted thesis online and requested we open access to it.

Cambridge is currently working with the British Library to digitise some of the 14,000 Cambridge theses they hold on microfilm. Our finances do not stretch to the whole corpus, so we have decided to digitise ten percent. This has meant a process to determine which theses we choose to have digitised. Considerations have included the quality of digitisation from microfilm for typeset versus typewritten theses (and indeed whether the thesis is printed single or double sided because of shadowing). We have also chosen theses on the basis of which disciplines are most frequently requested from our Digital Content Unit. This has proved to be challenging, not least because of the difficulty of determining disciplines of theses from our library catalogue.

We are hoping to upload these theses to the repository towards the end of the year. Together with several hundred theses that the Digital Content Unit has digitised this year, this will double the number of theses we hold in the repository.

Considerations

There are several issues that need to be considered before theses can be made available openly. The first concerns third party copyright, that is to say the inclusion of quotations, images, photographs or other material that is not the original work of the thesis author but has been taken from previously published work. There is generally no problem with including such material in the copy of the thesis submitted for examination and the print version deposited in the University library, but making the thesis freely available online constitutes a change of use and requires separate permissions. This is a problem that applies to both current and older theses and requires checks by the author and possibly the library.

Another issue related to copyright is the author’s permission to make the thesis available, which is necessary because the author retains the copyright for their work. For current theses, this permission can be incorporated into the submission process, either as part of the requirement for the PhD or by the author signing an agreement when the thesis is voluntarily uploaded.

However, it is not so easy to obtain permission for retrospective digitisation, as we discovered during our Unlocking Theses project. The contact details of alumni are not always known and in cases where the original author is deceased it may be challenging to establish the copyright holder, making it difficult to obtain an explicit ‘opt-in’ permission. Finally, there are financial considerations, as the digitisation of a large number of theses requires a significant outlay for staff, equipment and administrative costs.

Big projects

In recent years, a number of universities have undertaken large-scale digitisation projects of their holdings of PhD theses and have dealt with the permission issue in different ways.

The experience of these UK universities also appears to indicate that alumni are for the most part happy to see their theses made openly available. If more institutions follow suit and dedicate funding to opening up the research undertaken by generations of students this large reservoir of research will no longer remain untapped.

There are other challenges related to digital theses that still remain to be solved, such as the problem of linking theses to their associated data and the question of persistent identifiers to seamlessly integrate the output of both individual researchers and institutions. In the future, consideration should be given to non-text or multimedia PhDs, as was debated at a recent panel discussion at the British Library.

For now though, opening up access to decades’ or even centuries’ worth of scholarship sitting on university library shelves in the form of physical copies of PhD theses sounds like a good start.

Published 26 October 2016
Written by Dr Matthias Ammon and Dr Danny Kingsley
Creative Commons License

Walking the talk – reflections on working ‘openly’

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Lauren Cadwallader discusses her experience of researching openly.

Earlier this year I was awarded the first Altmetric.com Annual Research grant to carry out a proof-of-concept study looking at using altmetrics as a way of identifying journal articles that eventually get included into a policy document. As part of the grant condition I am required to share this work openly. “No problem!” I thought, “My job is all about being open. I know exactly what to do.”

However, it’s been several years since I last carried out an academic research project and my previous work was carried out with no idea of the concept of open research (although I’m now sharing lots of it here!). Throughout my project I kept a diary documenting my reflections on being open (and researching in general) – mainly the mistakes I made along the way and the lessons I learnt. This blog summarises those lessons.

To begin at the beginning

I carried out a PhD at Cambridge not really aware of scholarly best practice. The Office of Scholarly Communication didn’t exist. There wasn’t anyone to tell me that I should share my data. My funder didn’t have any open research related policies. So I didn’t share because I didn’t know I could, or should, or why I would want to.

I recently attended The Data Dialogue conference and was inspired to hear many of the talks about open data but also realised that although I know some of the pitfalls researchers fall into I don’t quite feel equipped to carry out a project and have perfectly open and transparent methods and data at the end. Of course, if I’d been smart enough to attend an RDM workshop before starting my project I wouldn’t feel like this!

My PhD supervisor and the fieldwork I carried out had instilled in me some practices that are useful to carrying out open research:

Lesson #1. Never touch your raw data files

This is something I learnt from my PhD and found easy to apply here. Altmetric.com sent me the data I requested for my project and I immediately saved it as the raw file and saved another version as my working file. That made it easy when I came to share my files in the repository as I could include the raw and edited data. Big tick for being open.
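As a rough illustration of this habit, here is a minimal Python sketch. It is not the workflow used in the project, and the function name and the `working_` file prefix are hypothetical. It copies the raw file to a separate working file and then removes write permission from the original so it cannot be edited by accident.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(raw_path: Path, working_dir: Path) -> Path:
    """Copy the raw file into working_dir and mark the original read-only."""
    working_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming convention: prefix the working file with "working_".
    working_path = working_dir / f"working_{raw_path.name}"
    if working_path.exists():
        # Refuse to overwrite edits that have already been made.
        raise FileExistsError(f"{working_path} already exists")
    shutil.copy2(raw_path, working_path)
    # Make sure the working copy is writable, then lock the raw file.
    working_path.chmod(working_path.stat().st_mode | stat.S_IWRITE)
    raw_path.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return working_path
```

All editing then happens on the file returned by `make_working_copy`, while the raw file stays exactly as it was received.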

Getting dirty with the data

Lesson #2. Record everything you do

Another thing I was told to do during my PhD lab work was to record everything you do. And that is all well and good in the lab or the field but what about when you are playing with your data? I found I started cleaning up the spreadsheet Altmetric.com sent and I went from having 36 columns to just 12 but I hadn’t documented my reasons for excluding large swathes of data. So I took a step back and filled out my project notebook explaining my rationale. Documenting every decision at the time felt a little bit like overkill but if I need to articulate my decisions for excluding data from my analysis in the future (e.g. during peer review) then it would be helpful to know what I based my reasoning on.

Lesson #3. Date things. Actually, date everything

I’d been typing up my notes about why some data were excluded and others not, to inform my final data selection, when I noticed that I’d been making decisions and notes as I went along but not recording when. If I’m trying to unpick my logic at a later date it is helpful if I know when I made a decision. Which decision came first? Did I have all my ‘bright ideas’ on the same day, and is the reason they don’t look so bright now that I was sleep deprived (or hungover in the case of my student days) and not thinking straight? Recording dates is actually another trick I learnt as a student – data errors can be picked up as lab or fieldwork errors if you can work back and see what you did when – but one I had forgotten to apply thus far. In fact, it was only at this point that I began dating my diary entries…
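Lessons #2 and #3 can be combined in a very small script. The sketch below is only one possible approach, with a hypothetical function name and log file: it appends each decision to a plain-text log and puts the date at the start of every entry, so the order of decisions can be reconstructed later.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_decision(log_path: Path, decision: str, when: Optional[date] = None) -> str:
    """Append a dated, one-line note about a data decision to a text log."""
    # Default to today's date so that no entry is ever left undated.
    entry_date = when or date.today()
    line = f"{entry_date.isoformat()}  {decision.strip()}\n"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line)
    return line
```

A call such as `log_decision(Path("decisions.txt"), "Dropped columns not needed for the analysis")` adds one dated line, and because the file is only ever appended to, earlier reasoning is never overwritten.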

Lesson #4. A tidy desk(top) is a tidy mind

I was working on this project just one day a week over the summer so every week I was having to refresh my mind as to where I stopped the week before and what my plans were that week. I was, of course, now making copious notes about my plans and dating decisions so this was relatively easy. However, upon returning from a week’s holiday, I opened my data files folder and was greeted by 10 different spreadsheets and a few other files. It took me a few moments to work out which files I needed to work on, which made me realise I needed to do some housekeeping.

Aside from making life easier now, it will make the final write up and sharing easier if I can find things and find the correct version. So I went from messy computer to tidy computer and could get back to concentrating on my analysis rather than worrying if I was looking at the right spreadsheet.

 

Lesson #5. Version control

One morning I had been working on my data adding in information from other sources and everything was going swimmingly when I realised that I hadn’t included all of my columns in my filters and now my data was all messed up. To avoid weeping in my office I went for a cup of tea and a biscuit.

Upon returning to my desk I crossed my fingers and managed to recover an earlier version of my spreadsheet using a handy tip I’d found online. Phew! I then repeated my morning’s work. Sigh. But at least my data was once again correct. Instead of relying on handy tips discovered by frantic Googling, just use version control. Archive your files periodically and start working on a new version. Tea and biscuits cannot solve everything.
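For anyone who wants to automate the "archive periodically" step, the following Python sketch shows one simple way to do it, assuming a single working file and a hypothetical archive folder. It saves a timestamped snapshot of the file before a new editing session begins. A dedicated version control system such as Git is another option; this is just the lightweight version.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_version(working_path: Path, archive_dir: Path,
                    now: Optional[datetime] = None) -> Path:
    """Save a timestamped snapshot of the working file and return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive_dir.mkdir(parents=True, exist_ok=True)
    snapshot = archive_dir / f"{working_path.stem}_{stamp}{working_path.suffix}"
    if snapshot.exists():
        # Never overwrite an earlier snapshot.
        raise FileExistsError(f"{snapshot} already exists")
    shutil.copy2(working_path, snapshot)
    return snapshot
```

If a morning's filtering goes wrong, the most recent snapshot in the archive folder can simply be copied back over the working file.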

Getting it into the Open

After a couple more weeks of problem free analysis it was time to present my work as a poster at the 3:AM Altmetrics conference. I’ve made posters before so that was easy. It then dawned on me at about 3pm the day I needed to finish the poster that perhaps I should share a link to my data. Cue a brief episode of swearing before realising I sit 15ft away from our Research Data Advisor and she would help me out! After filling out the data upload form for our institutional repository to get a placeholder record and therefore DOI for my data, I set to work making my spreadsheet presentable.

Lesson #6. Making your data presentable can be hard work if you are not prepared

I only have a small data set but it took me a lot longer than I thought it would to make it sharable. Part of me was tempted just to share the very basic data I was using (the raw file from Altmetric.com plus some extra information I had added) but that is not being open to reproducibility. People need to be able to see my workings so I persevered.

I’d labelled the individual sheets and the columns within those sheets in a way that was intelligible to me but not necessarily to other people so they all needed renaming. Then I had to tidy up all the little notes I’d made in cells and put those into a Read Me file to explain some things. And then I had to actually write the Read Me file and work out the best format for it (a neutral text file or pdf is best).

I thought I was finished but as our Research Data Advisor pointed out, my spreadsheets were returning a lot of errors because of the formula I was using (it was taking issue with me asking it to divide something by 0) and that I should share one file that included the formulae and one with just the numbers.

If I’d had time, I would have gone for a cup of tea and a biscuit to avoid weeping in the office but I didn’t have time for tea or weeping. Actually producing a spreadsheet without formulae turned out to be simple once I’d Googled how to do it and then my data files were complete. All I then needed to do was send them to the Data team and upload a pdf of my poster to the repository. Job done! Time to head to the airport for the conference!
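The two spreadsheet problems described above (division by zero and sharing numbers without formulae) can also be handled outside the spreadsheet. The Python sketch below is an illustration only: the column names are hypothetical and are not the ones in the shared dataset. It leaves a cell empty when the denominator is zero and writes a values-only CSV file.

```python
import csv
from pathlib import Path
from typing import Iterable, Optional, Tuple


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def write_values_only(rows: Iterable[Tuple[str, float, float]], out_path: Path) -> None:
    """Write computed values (no formulae) to CSV; None becomes an empty cell."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Hypothetical column names, for illustration only.
        writer.writerow(["article", "mentions", "days", "mentions_per_day"])
        for article, mentions, days in rows:
            writer.writerow([article, mentions, days, safe_ratio(mentions, days)])
```

A plain CSV file like this opens in any spreadsheet program, which also makes it a reasonably neutral format for sharing.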

Lesson #7. Making your work open is very satisfying.

Just over three weeks have passed since the conference and I’m amazed that already my poster has been viewed on the repository 84 times and my data has been viewed 153 times! Wowzers! That truly is very satisfying and makes me feel that all the effort and emergency cups of tea were worth it. As this was a proof-of-concept study I would be very happy for someone to use my work, although I am planning to keep working on it. Seeing the usage stats of my work and knowing that I have made it open to the best of my ability is really encouraging for the future of this type of research. And of course, when I write these results up with publication in mind it will be as an open access publication.

But first, it’s time for a nice relaxed cup of tea.

Published 25 October 2016
Written by Dr Lauren Cadwallader
Creative Commons License

An open letter to Blood

The Office of Scholarly Communication routinely advises Cambridge authors about their publishing options, and in the vast majority of cases we can help authors comply with funder mandates. However, there are a few notable journals that offer no compliant open access options for Research Councils UK (RCUK) and Charity Open Access Fund (COAF) authors. One of those journals is Blood. We’ve previously called them out on their misleading advice:

Today we are urging Blood to offer their authors either self-archiving rights without cost and a maximum 6 month embargo or immediate open access under a Creative Commons Attribution (CC BY) licence. If Blood does not offer these options we will advise our researchers that they should publish elsewhere so as to remain compliant with their funders’ open access policies.

You can click through and read the open letter in full below:

If you would like to add your name to the list of signatories, please email info@osc.cam.ac.uk