My colleague John Davies, who provides a crystallographic service for the department, has estimated that the data for 80% of crystal structures (in any chemistry department) never leave the laboratory. They are locally archived, perhaps on CDROM, perhaps on a local or departmental machine. With the passage of time – changes in staff, organisation, machines – information decays and it is likely that crystallographic data will be systematically lost.
to research the development of digital repositories. Three groups have been collaborating in chemistry, with a strong emphasis on crystallography and spectroscopy. This involves all aspects – building software, designing metadata specs, and understanding the way chemists work and think. We have found that the social aspects are at least as important as the technical – I won’t elaborate here yet as these will be reported at:
Why is it important to archive the data? Isn’t normal academic publication (including theses) sufficient? Isn’t it very costly and a waste of money that could be spent on proper research?
Well, the crystallographic community has archived its data for many years, and research on this data alone has given rise to hundreds or even thousands of papers datamining this resource. Without this, chemistry would be very much poorer as we would have little in the way of molecular or crystal structure systematics.
So what is the cost of the unpublished data? To have the structures determined at commercial rates would cost about USD 1500-5000 each for the size of structures currently published. Let’s assume a laboratory does 500 structures a year, and that full economic costs are half the commercial rate (this is just a guess) – we are looking at around half a million dollars per year to do crystal structures in a chemistry department. (I suspect the numbers are on the low side – I’d be interested in comments).
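To make the back-of-envelope arithmetic explicit, here is a rough sketch in Python. The inputs are the figures quoted above (500 structures a year, USD 1500-5000 commercial rate, full economic cost guessed at half of that) plus the 80% never-leaves-the-lab estimate attributed to John Davies. The function and variable names are purely illustrative, and the result is only as good as those guesses.

```python
# Back-of-envelope estimate of what a department spends on crystal structures,
# and how much of that value stays unpublished. All inputs are rough guesses
# taken from the text, not measured data.

def annual_cost_range(structures_per_year, rate_low, rate_high, cost_ratio):
    """Return (low, high) estimated full economic cost per year in USD."""
    low = structures_per_year * rate_low * cost_ratio
    high = structures_per_year * rate_high * cost_ratio
    return low, high

STRUCTURES_PER_YEAR = 500    # assumed throughput of one laboratory
RATE_LOW, RATE_HIGH = 1500, 5000  # commercial rate per structure, USD
COST_RATIO = 0.5             # full economic cost as a fraction of commercial (a guess)
UNPUBLISHED_FRACTION = 0.8   # estimate that 80% of structures never leave the lab

low, high = annual_cost_range(STRUCTURES_PER_YEAR, RATE_LOW, RATE_HIGH, COST_RATIO)
lost_low, lost_high = low * UNPUBLISHED_FRACTION, high * UNPUBLISHED_FRACTION

print(f"Annual cost:       USD {low:,.0f} to {high:,.0f}")
print(f"Never published:   USD {lost_low:,.0f} to {lost_high:,.0f}")
```

With these inputs the annual cost comes out between USD 375,000 and 1,250,000, so "half a million" sits towards the lower end of the range.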
Allowing that some of the material has been published as comments in chemical papers, I suspect that the information from quite a high proportion of the structures is never published in any form. How easy is it to find information in current theses, especially if you don’t know it’s there?
I think I would be safe in saying that worldwide hundreds of millions of dollars’ worth of crystallographic data is lost each year. For spectra and synthetic chemistry it will be at least 10 times greater. Many synthetic chemists say they are interested in failed reactions – and these are almost never published!
If funders are aware of this they should be concerned about the loss. Funders are increasingly being proactive in requiring funded research to be Openly accessible. The Wellcome Trust is among the strongest proponents:
and a quote
The Trust provides additional funding to cover the
costs relating to article-processing charges levied by
publishers who support this model.
• Approximately 1% of the research grant budget
would cover costs of open access publishing