It’s rather gratifying when someone else reports our own work, in this case the Chemistry Central blog. They have picked up our 18-month project with Imperial, and this substantial summary saves us the work of creating our own:
The findings of the SPECTRa project’s final report
The final report of JISC’s SPECTRa project (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) was recently published. The project was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. (It is worth noting that a member of our editorial advisory board, Dr Simon Coles of the University of Southampton, was an eBank Project representative for the SPECTRa study.)
The aim… The broad aim of the eighteen-month study, which ended in March, was to “address the provision of Open Access to primary chemical research data in molecular and related subjects through institutional repositories”. The project focussed on three areas of chemistry: synthetic organic chemistry, crystallography and computational chemistry.
Various chemistry data repositories have been launched in recent years, such as the University of Southampton’s eCrystals and Cambridge’s World Wide Molecular Matrix. However, in general “Chemistry as a discipline has been slower than the physical and biomedical sciences to adopt and exploit OA concepts in the handling of experimental data and research publications. Most of the data (analytical, spectral and even crystallographic) associated with peer-reviewed publications from chemistry departments are never communicated to the scientific community. In those limited instances where a publisher does provide a means of accessing primary data to supplement a published paper, the data may then be subject to the publisher’s IPR practices. In most cases the primary data are simply not published…“.
In light of this, the five key objectives of the SPECTRa study were established as being: to undertake surveys of communities in computational and organic chemistry; to refine crystallography tools developed by eBank; to develop automated validation and indexing tools specific to computational and synthetic chemistry, and provide interactions with the OAI-compliant DSpace repository platform; to develop chemical metadata functionality based on Dublin Core; and to disseminate and promote project outcomes to encourage widespread adoption.
The findings… Surveys of chemists at Imperial College and Cambridge University investigated their current use of computers and the Internet and identified specific data needs. The salient points to emerge from the feedback were: a lot of data is not stored electronically (e.g. lab books, paper copies of spectra); a complex list of data file formats (particularly proprietary binary formats) is being used; there is significant ignorance regarding digital repositories; and there is a requirement for restricted access to deposited experimental data.
In addition, two interesting statistics to come out of the surveys were that “[o]ver half (52%) of the respondents stated they were aware of digital repositories however, but only 9% of respondents are currently using one”, whilst, “[a]bout 50% of the data created by research chemists is still stored in non-digital formats.” Also to emerge from the survey results was the concept of a “golden moment” – a point at which the researcher best understands the process, possesses a comprehensive package of information to describe it, and is motivated to submit it to a data management process.
Based on interviews with key researchers, distributable software tools were developed using Open Source code to facilitate deposition into a repository. The project has provided tools which address the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to support automatic data validation, metadata creation and long-term preservation. Additional tools would, however, be required to add value to any large-scale data aggregates. The deposition process adopted the concept of an “embargo repository”, allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved it for release. The resultant repository architecture envisages a federated framework in which data will first be deposited into an intermediate departmental repository, before possibly later being pushed into a central OA repository.
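The embargo-then-release flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPECTRa software: the class names, the `embargoed` flag, and the use of a plain list to stand in for the central OA repository are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """One deposited dataset plus the metadata that drives the embargo (hypothetical model)."""
    identifier: str
    owner: str
    embargoed: bool = True  # closed access until the data owner approves release


@dataclass
class DepartmentalRepository:
    """Intermediate, closed-access store in front of a central OA repository (hypothetical)."""
    deposits: dict = field(default_factory=dict)

    def deposit(self, item: Deposit) -> None:
        # New material always lands in the departmental repository first.
        self.deposits[item.identifier] = item

    def approve_release(self, identifier: str, approver: str) -> None:
        # Only the data owner may lift the embargo.
        item = self.deposits[identifier]
        if approver != item.owner:
            raise PermissionError("only the data owner can lift the embargo")
        item.embargoed = False

    def push_released(self, central: list) -> list:
        """Push every released, not yet pushed item to the central repository."""
        pushed = []
        for item in self.deposits.values():
            if not item.embargoed and item.identifier not in central:
                central.append(item.identifier)
                pushed.append(item.identifier)
        return pushed
```

Under these assumptions, a dataset stays in the departmental store when `push_released` runs until its owner has called `approve_release`; anything still embargoed is simply skipped.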
The project’s main findings included the following: scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials; the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process; institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements; IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.
In conclusion, the report states that “[t]here is no universal ‘shrink-wrapped’ approach which works for every discipline. We have designed a toolkit to address problems which should be applicable in a generic fashion to other institutions with similar research interests… The organisational and technical architecture of institutional repositories may be improved by creating intermediate “departmental” repositories between the researcher and the central institutional repository. Such departmental repositories, designed to meet the needs of specific local communities of researchers, and in particular offering an embargo facility, may be more successful in establishing the degree of confidence, competence and trust that will persuade researchers to deposit data readily.”
The full report can be viewed here.
I’ll just add that Jim Downing continues to manage the SPECTRa software and we’d be happy to hear from chemistry departments who are interested in capturing and archiving their crystallographic, computational or spectroscopic data (rather than letting it decay). Of course it’s up to you to implement the system, but the tools work and Jim is looking at very lightweight repositories (DSpace is not well designed for scientific data – nor are any of the others. Watch Jim’s jackrabbit …)