
Where are we now? Cambridge theses deposits one year in

As the nights draw in and the academic year 2018/19 begins, we are preparing to enter our second year of compulsory e-theses deposits. Our university repository, Apollo, is close to holding 6000 digital PhD theses and it is the intention of the University that this valuable research asset continues to grow into the future. The Apollo repository will play a large part in making this happen. Until recently only hardbound copies of theses were collected and catalogued by the University Library. Users could read theses on-site in Cambridge or order a digitisation of the thesis, but the introduction of e-thesis deposit to Apollo has meant that University of Cambridge theses are more accessible than ever before. It’s been an incredibly busy year and we have made some great steps forward in our management of theses in Cambridge.

e-theses at Cambridge – the background

The e-theses deposit story at Cambridge started in October 2016, when the Office of Scholarly Communication upgraded Apollo to allow the deposit of theses and began a digital thesis pilot for the academic year 2016/17. 11 departments in the University participated in the pilot, asking their PhD students to deposit an e-thesis alongside a hardcopy thesis. Theses deposited in Apollo during the pilot could either be made open access on request of the author or were treated as historical theses had been up until that point, whereby hardbound copies were held in the University Library and requestors could sign a declaration stating they wish to consult a thesis for private study or non-commercial research. Following the success of the pilot, the Board of Graduate Studies, at its meeting on 4 July 2017, made the decision that from 1 October 2017 all PhD students would be required to deposit both a hard copy and an electronic copy of their thesis to the University Library.

What we learnt during the academic year 2017/18

The experience of depositing theses during the pilot had highlighted some issues that needed addressing. We had to make decisions on how to deal with third party copyright, sensitive material, library copy and supply rules, and the alignment of access levels for hardbound and electronic theses. In response to this, we decided that we should think through each of the different ways in which a thesis could be deposited in the repository, and consider the range of contentious material that could be contained within a thesis.

How do theses enter the repository?

Whilst students who are depositing in order to graduate do this directly, we also have the capacity to scan theses on request here in the library, and these scanned theses are subsequently deposited in Apollo. In addition to this, we led a drive to digitise University of Cambridge theses held by the British Library on microfilm and gave alumni the option to digitise their thesis and make it open access at no cost to them.

British Library theses

This year the OSC has made a bulk deposit of theses scanned by the British Library, which significantly augments the number of theses stored in the repository. In the culmination of a two-year project, nearly 1300 additional Cambridge PhD theses are now available on request in the Apollo repository.

Prior to being made available in the repository, these Cambridge theses were held on microfilm at the British Library. They date from the 1960s through to 2008, when digitisation took over from microfilm as a means of document storage. The British Library holds 14,000 Cambridge PhD theses on microfilm; in 2016 they embarked on a project with the OSC to digitise ten percent of the collection at low cost – read more about this in an earlier post, Choosing from a cornucopia: a digitisation project.

You can explore the collection in Apollo: Historical Digital Theses: British Library collection.  The theses are under controlled access, which means they are available on request for non-commercial research purposes, subject to a £15 admin fee.

Establishing access levels

We established that the level of access we could allow to the thesis could be determined by the route a thesis entered the repository, its content, or in some cases the author’s wish to publish. To address all of the potential issues, we decided to define a set of access levels which would determine what we, as managers of the repository, were able to do with a thesis and the way in which it could be accessed by a requestor.

The access levels were put into action in spring 2018 and this was followed by a survey of Degree Committees, conducted by the e-theses working group consisting of members of the University Library and Student Registry. The survey asked for feedback on the suitability of the access levels for research outputs for all departments in the University; the outcome confirmed that the access levels were working and covered the options well, although a few tweaks were needed. In light of the feedback, a set of recommendations was put to the Board of Graduate Studies by the e-theses working group, and these recommendations were considered and accepted at their meeting on 3 July 2018, ready to be put in place for the 2018/19 academic year.

eSales for theses under controlled access

At the same time as we were establishing our access levels, we were also working on devising an eSales process to facilitate the supply of theses under controlled access. Controlled access replicates the way that historical, hardbound theses were managed in the library, with the addition of an electronic version of the thesis being held in the repository, and follows the library copy and supply rules for unpublished works under copyright law. A thesis scanned by the library would be deposited under controlled access so it remains unpublished, but this access level is also available to students depositing their thesis directly. The eSales process we devised went live in July 2018 and this meant a large number of theses held in the repository were made more accessible, including those digitised by the British Library. As of 18 October, we have supplied 14 theses via the eSales route and the requests keep coming in at a steady pace.

Looking forward to the 2018/19 academic year

As we begin the 2018/19 academic year, our theses management is looking in good shape but we will continue to improve and refine our internal and external services. In consultation with the University’s Student Registry we are making the final changes to our deposit forms, access levels and communications and we endeavour to make this academic year the smoothest yet for e-theses management. University of Cambridge theses are more accessible than they have ever been. The collection will grow as more students deposit each year, and the valuable research of PhD students will continue to be disseminated.

Published 25 October 2018
Written by Zoë Walker-Fagg
Creative Commons License

How open is Cambridge? 2017 edition

Welcome to Open Access Week 2017. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog post we revisit the question about the openness of Cambridge. 

For Open Access Week last year I looked at how open Cambridge was using the extremely useful Lantern tool, developed by Cottage Labs, which is the basis of the Wellcome Trust’s compliance tool. If you haven’t used it before, Lantern takes a list of DOIs, PMIDs, or PMCIDs and runs these through a variety of sources to try and determine the Open Access status of the publication. I found that, for publications in 2015, 51.8% of all of Cambridge’s research publications were available in at least one ‘Open Access’ source. How did Cambridge’s 2016 publications fare? Read on to find out.

Using the same method as last year, I first obtained a list of DOIs from Web of Science (n=9416) and Scopus (n=9124) for articles, proceedings papers and reviews published in 2016. Combining and deduplicating these lists returned 10,674 unique DOIs (~29 publications/day). I also refreshed the 2015 publication data using the latest Web of Science and Scopus information, which returned 10,090 unique DOIs. Year-on-year, this represents a 5.8% increase in the total number of publications attributable to Cambridge – more than inflation!
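The combine-and-deduplicate step can be sketched in a few lines of Python. This is a minimal illustration rather than the script actually used: the function names `normalise_doi`, `combine_dois` and `percent_increase` are hypothetical, the sample DOIs are made up, and it assumes DOIs are compared case-insensitively after stripping any resolver prefix. The growth figure is simply recomputed from the two totals quoted above.

```python
def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes (an assumed convention)."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def combine_dois(*sources):
    """Merge DOI lists from several sources into one set of unique DOIs."""
    unique = set()
    for source in sources:
        for doi in source:
            cleaned = normalise_doi(doi)
            if cleaned:  # skip blank entries
                unique.add(cleaned)
    return unique


def percent_increase(old: int, new: int) -> float:
    """Year-on-year change, expressed as a percentage of the earlier value."""
    return (new - old) / old * 100


# Toy lists standing in for the Web of Science and Scopus exports (made-up DOIs).
wos = ["10.1000/ABC123", "10.1000/xyz789"]
scopus = ["https://doi.org/10.1000/abc123", "10.1000/def456"]

print(len(combine_dois(wos, scopus)))              # 3 unique DOIs
print(round(percent_increase(10_090, 10_674), 1))  # 5.8
```

Using a set keeps the merge simple: a DOI that appears in both exports collapses to a single entry once its case and prefix have been normalised.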

The deduplicated DOI lists for 2015 and 2016 (20,764 DOIs in total) were fed into Lantern and analysed in combination with information from Web of Science and the University’s institutional repository Apollo.

Figure 1. Distribution of papers published in 2015 and 2016 which have a DOI, according to the Open Access sources they can be found in. 57.5% of 2016’s articles appear in at least one Open Access source, which represents an increase of about 4 percentage points over 2015. One third of all papers published in 2016 are available in Apollo.

Very pleasingly the percentage of publications available in at least one Open Access source increased to 57.5% in 2016 compared to only 53.4% for 2015 publications. Given that the total number of publications also increased during this period this result is doubly exciting. In raw numbers, this means that while 5384 publications were Open Access in 2015, an impressive 6135 publications were made Open Access in 2016.
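For anyone who wants to check the arithmetic, the percentages follow directly from the raw counts and the deduplicated DOI totals quoted earlier in this post. The short sketch below only re-derives those figures; the helper name `open_access_share` is hypothetical.

```python
def open_access_share(open_count: int, total: int) -> float:
    """Percentage of publications found in at least one Open Access source."""
    return open_count / total * 100


# Counts quoted in this post: Open Access publications and unique DOIs per year.
share_2015 = open_access_share(5384, 10_090)
share_2016 = open_access_share(6135, 10_674)

print(f"2015: {share_2015:.1f}%")                              # 2015: 53.4%
print(f"2016: {share_2016:.1f}%")                              # 2016: 57.5%
print(f"change: {share_2016 - share_2015:.1f} percentage points")  # change: 4.1 percentage points
```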

Most of this increase can be attributed to the much larger share of publications that appear in Apollo, which is now the largest source of Open Access material for the University of Cambridge. An additional 822 publications were deposited in Apollo in 2016 compared to 2015, which is a 30% increase in one year alone.

You can now find more of the University’s research outputs in Apollo than in any other Open Access source. And because we operate an extremely popular Request a Copy service, potentially all of the publications held in Apollo, even those that are restricted and under embargo, are available to anyone in the world. You just need to ask.

Published 23 October 2017
Written by Dr Arthur Smith

Milestone – 1000 datasets in Cambridge’s repository

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset to our repository Apollo since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication, in terms of developing systems, workflows, policies and through an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blogs.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. In some cases a dataset has been downloaded as many as 170 times, and the data has featured in news, blogs and Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.


Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was on paper and, she noted “we have preserved that with great care and share it openly on line through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted.  “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge, who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was because “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. …  It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC”, which is taken from usage of our Request a Copy service.

Dr Cadwallader noted there has been an increase in the requests for theses over time. “This is a really exciting observation because the Board of Graduate studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the theses that were requested were written 60 years ago, so the repository is sharing older research as well. The topics of these theses included algebra and Yorkshire evangelists, and one of the oldest requested theses was written in 1927 about the Falkland Islands. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself is one created by the OSC team looking at the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long, and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen, to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.



Published 21 September 2017
Written by Dr Danny Kingsley

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26000 downloads of that data in the past 12 months alone – that is about use and reuse of our research data and is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of these have been with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600 year old history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse, information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part. And the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that of course is digital. You can’t see that in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper and we have preserved that with great care and share it openly on line through our digital library. But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library and the open library and its great research community for which this very building is designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th century and 20th century that was mostly about our open stack of books and we have quite a few of them, we are a little weighed down by them.

Our research data weighs less but it is just as significant and in the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so and there are many reasons why data isn’t open sometimes, and that is fine. What we are looking for is managing so we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, our colleagues in the research office and in the libraries across Cambridge, and that is about the changing role in research support and library research support in the digital age, and I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University but also with the Office of Scholarly Communication in particular and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.