I mentioned yesterday that we had been funded by JISC to develop a departmental repository, starting from C3DER (crystallography) and expanding to spectroscopy and chemical syntheses. We shall be working with a commercial supplier of Electronic Lab Notebooks in a tightly coupled project where both sides will benefit from the synergy: we shall have the use of a robust platform and can add on many of the innovations we’ve been developing here, which should then gain wider currency. We think this is a new and exciting way of exploring the next generation of chemical informatics, which will be semantic, enhanced and guided by ontologies.

CLARION project Cambridge Chemistry Department


The data challenge: Chemistry laboratories produce many types of information and data: raw data, processed data, observations, chemical structures, reaction schemes, experimental write-ups, conclusions, graphs, images, crystallographic and spectroscopic data, papers, references, and so on.  It is challenging to store this variety of information so that it is accessible and usable by a variety of users.  The challenges include:


Storing data in formats that allow its use by specialist data processing tools

Using data formats that are suitable for publication and long-term preservation

Allowing certain data to be used by people outside the department

Motivating researchers to open their data

Enhancing the meaning and context of the data to improve its usability

Making the data searchable and easily navigable

Ensuring that the system has minimal support overheads, yet continually evolves as required to meet changes in the IT environment.


Using an ELN:  The Cambridge Chemistry Department has a basic repository which stores crystallographic data.  Project CLARION (Cambridge Laboratory Repository In/Organic Notebooks) will create an enhanced repository that captures core types of chemistry data and ensures their access and preservation.  The Chemistry Department is implementing a commercial Electronic Laboratory Notebook (ELN) system; CLARION will work closely with the ELN team to create a system for ingesting chemistry data directly into the repository with minimum effort by the researcher.


Enhancing and expanding data usage:  CLARION will provide functionality to enable scientists to make selected data available as Open Data for use by people external to the department.  The project will use techniques for adding semantic definition to chemical data, including RDF (Resource Description Framework) and CML (Chemical Markup Language).  Many of these techniques will be extensible to other disciplines.  CLARION will address general issues such as ownership of data, and it will publicise its results to the chemistry and repositories communities.  Effort will be put into developing a sustainable business model for operating the repository that can be adopted by the department after project completion.
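
To give a flavour of what "adding semantic definition" might look like in practice, here is a minimal sketch in Python. It is not CLARION code: the function names, the example.org URI and the sample metadata are hypothetical, chosen only for illustration. It builds a tiny CML molecule using the CML schema namespace and describes it with two Dublin Core properties written as RDF N-Triples.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"   # CML schema namespace
DCTERMS = "http://purl.org/dc/terms/"      # Dublin Core terms namespace

# Serialise CML elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", CML_NS)


def build_cml_molecule(mol_id, atoms, bonds):
    """Build a CML <molecule> element (hypothetical helper).

    atoms: list of (atom_id, element_symbol)
    bonds: list of (atom_id_1, atom_id_2, bond_order)
    """
    mol = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})
    atom_array = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for atom_id, element in atoms:
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom",
                      {"id": atom_id, "elementType": element})
    bond_array = ET.SubElement(mol, f"{{{CML_NS}}}bondArray")
    for first, second, order in bonds:
        ET.SubElement(bond_array, f"{{{CML_NS}}}bond",
                      {"atomRefs2": f"{first} {second}", "order": order})
    return mol


def to_ntriples(subject_uri, properties):
    """Write {predicate_uri: literal} pairs as N-Triples lines (hypothetical helper)."""
    lines = []
    for predicate, value in properties.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Heavy atoms of methanol: one carbon singly bonded to one oxygen.
molecule = build_cml_molecule("m1", [("a1", "C"), ("a2", "O")], [("a1", "a2", "1")])
cml_text = ET.tostring(molecule, encoding="unicode")

# The subject URI is a placeholder, not a real repository address.
triples = to_ntriples(
    "http://example.org/clarion/data/m1",
    {
        DCTERMS + "title": "Methanol (heavy atoms only)",
        DCTERMS + "creator": "Example Researcher",
    },
)

print(cml_text)
print(triples)
```

The CML carries the chemistry (atoms and bonds), while the RDF triples carry the context (what the record is and who made it). That separation is one plausible reading of how the two techniques could complement each other in a repository.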


Timelines: The project runs for two years from April 2009. The initial pilot deployment of the ELN is scheduled for late 2009, and we hope to be publishing open data from it in early 2010.


    KineMatik is an ELN software solution provider to academia, government and industry. We would be pleased to collaborate or partner with you on your CLARION project. If this is of interest to you and your team, please contact me at the email address provided. Best regards, Tom

