Typed and Scraped into Arcturus
Here’s an excellent example of the issues in Open Data: a simple, important question from David Jones (who is involved in climate research infrastructure). It’s in response to my last post on Open data in Climate Research and it’s an excellent tutorial on the issues. And the result is very depressing.
David Jones says:
Perhaps you could kick off with what data you would like to see open.
Data that I would like, that you might have a professional opinion on, is a reference library for the IR spectra of the Kyoto Protocol gasses (CO2 and other greenhouse gasses). I had a look, but I couldn’t find an open archive of IR spectra. Do you know if one exists?
To remind readers – infrared absorption is the reason that greenhouse gases heat up the planet – they absorb infrared radiation and turn it into heat. The heat is trapped in the atmosphere. CO2 is an important greenhouse gas so its infrared absorption is a key piece of data. Ideally we need physical and chemical properties for all atmospheric components.
I can probably find the spectrum in an undergraduate textbook. If I copied it I would be sued by the publisher and burn in copyright hell. Yes, it’s factual data, and yes it’s important for the future of the planet and yes the publisher simply copied it from the author but copyright is the supreme god and we must worship it. So simply copying known public information from copyright-holders is a legal no-no.
I go to the web, Google for “collections of infrared spectra” and get:
(I have copied this without permission. Wiley is an aggressive publisher who pursued a graduate student, Shelley Batts, for critically blogging a single graph from one of “their” papers. They said it was a mistake and everything was now OK. It’s OK in that copyright still rests completely with Wiley.)
Anyway, we digress. This shows that a SINGLE BOOK of spectra for a SINGLE USER can cost 3000 Euros (that’s about 4000 USD). That shows the scale of the problem we face in chemistry. Now I agree that these spectra were won with the sweat-of-the-brow and so on, but in these days of automatic machines it does not cost 2 USD to publish a copy of a spectrum. This is an example of monopoly and scarcity control and inflated prices. (It may well be that Hummel does something laudable with the money – I have no idea).
The message is that the data are not only not Open, they are also enormously expensive.
Let’s try another: http://www.spectraonline.com/
First read the conditions (I have highlighted parts):
Use of Site. Thermo Fisher authorizes you to view, print and download the materials at this Web site (“Site”) only for your personal, non-commercial use, provided that you retain all copyright and other proprietary notices contained in the original materials on any copies of the materials downloaded or printed from the Site. You may not modify the materials at this Site in any way or reproduce or publicly display, perform, or distribute or otherwise use them for any public or commercial purpose. For purposes of these Terms, any use of these materials on any other Web site or networked computer environment for any purpose is prohibited. The materials at this Site are copyrighted and any unauthorized use of any materials at this Site may violate copyright, trademark, and other laws. You agree that you will not disclose, republish, reproduce, or distribute any of the information displayed on or comprising this Site (the “Content”) or make any use of the Content that would allow a third party to have access to the Content. If you breach any of these Terms, your authorization to use this Site automatically terminates and you must immediately destroy any downloaded or printed materials.
Not exactly cuddly. Where’s the data? They say:
The Spectra Online database is a collection of public domain and other data generously contributed from various sources. Please note that Thermo Electron Corporation does not control the reliability or quality of contributions to the Spectra Online database and therefore makes no guarantees or warranties on the usefulness or correctness of the information or data contained therein. Below are links to descriptions of current Spectra Online data collections:
Acorn NMR NUTS DB Searchable Archive
American Academy of Forensic Sciences (AAFS) MSDC Database
Agilent MS of VOC’s Library
Boeing Aerospace FT-IR of Lubricants
Caltech Mineral Spectroscopy Server
CCRC Database – GC-EIMS of Partially Methylated Alditol Acetates
EPA Vapor Phase FTIR Library
EPA-AECD Gas Phase FTIR Database of HAPs
FBI FT-IR Fibers Library (Spectrochimica Acta)
David Hopkins NIR Collection
InPhotonics Raman Forensics Library
IUCr CPD Quantitative Phase XRD Round Robin Test Set
Jobin Yvon Raman Spectra of Polymers
LabSphere FT-IR and NIR Spectral Reflectance of Materials
McCreery Raman Library
NIST Chemistry WebBook
Notre Dame Organics Workbook Spectra
Edward Orton FTIR of Solid Phase Synthesis Resins
OMLC – PhotochemCAD Spectra
Pacific Lutheran University – NMR Spectra for Solomons and Fryhle Organic Chemistry, 7th Ed.
Pacific Lutheran University – FTNMR FID Archive
PhotoMetrics Inc. FT-IR Library
RMIT Applied Chemistry MS Library
SPECARB Raman Spectra of Carbohydrates
David Sullivan FT-IR Collection (University of Texas)
TIAFT User Contributed Collection of EI Mass Spectra
UCL Raman Spectroscopic Library of Natural and Synthetic Pigments
Univ. of Northern Colorado – Protein Infrared Database
University of S.C-Aiken UV-Vis of Dyes
USDA Instrumentation Research Lab NIR Library
U.S.G.S. Spectral Library of Minerals
University of the West Indies, Mona JCAMP Archive
Widener University – Dr. Van Bramer’s Spectral Archive
So what we have here is theft from the public domain. A variety of public sources have donated data to Thermo which has stamped them all with such a restrictive contract that I cannot even show you one spectrum. It is extraordinarily easy to steal from the public domain. Just wrap it in frightening legal stuff.
Now you could argue that actually I can take data from this site as it was originally public domain. But not all of it is. And if I am a robot I have no way of deciding which. I read the terrifying legal conditions and my system stackdumps.
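To make the robot’s problem concrete, here is a minimal sketch of the decision it would have to make. Everything in it is an assumption for illustration: the `SpectrumRecord` type, its `licence` field, the `OPEN_LICENCES` labels and the `may_reuse` rule are all hypothetical, and as far as I can see the site exposes no per-spectrum label of this kind. That is exactly the point: with no label on the record, the robot can only fall back on the site-wide terms.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels that a robot would treat as safe to reuse.
OPEN_LICENCES = {"public-domain", "CC0", "PDDL"}


@dataclass
class SpectrumRecord:
    """A hypothetical spectrum record with an optional licence label."""
    title: str
    source: str
    licence: Optional[str] = None  # None means the site does not say


def may_reuse(record: SpectrumRecord, site_terms_restrictive: bool) -> bool:
    """Decide whether a robot may reuse a record (illustrative rule only).

    A labelled record is judged on its own label. An unlabelled record
    falls back on the site-wide terms, so restrictive terms block it.
    """
    if record.licence is None:
        return not site_terms_restrictive
    return record.licence in OPEN_LICENCES


labelled = SpectrumRecord("CO2, gas phase", "EPA", licence="public-domain")
unlabelled = SpectrumRecord("CO2, gas phase", "unknown contributor")

print(may_reuse(labelled, site_terms_restrictive=True))    # True
print(may_reuse(unlabelled, site_terms_restrictive=True))  # False
```

If every contributed spectrum carried its own explicit label, the robot could sort the public domain material from the rest. Without one, the blanket terms win and everything is treated as closed.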
This pollution and theft is endemic. We have to Open the Data.
Let’s try a US government site – NIST. It has an excellent set of chemical data – probably the best in the world. Here’s its excellent Webbook with a spectrum (http://webbook.nist.gov/cgi/cbook.cgi?ID=C124389&Units=SI&Type=IR-SPEC&Index=1#IR-SPEC )
And NIST is a US Government organization so all its works are ipso facto in the Public Domain, right? And so we can publish an Open Collection of Spectra by copying from NIST?
Standard Reference Databases are copyrighted by the U.S. Secretary of Commerce on behalf of the United States of America. All rights reserved. No part of our database may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior permission.
Source: Public Law 90-396, July 11, 1968, The Standard Reference Data Act
Purpose: To provide for the collection, compilation, critical evaluation, publication and sale of standard reference data
Section 6 – …the Secretary may secure copyright and renewal thereof on behalf of the United States as author or proprietor in all or any part of any standard reference data which he prepares or makes available under this Act, and may authorize the reproduction and publication thereof by others.
[The US Gov. has made an exception for NIST so it can collect money.] So I shall probably go to Guantanamo for publishing the spectrum. I’ll take the risk, but I clearly cannot copy the whole lot.
So, very simply, although some 20 million chemical compounds are known, there are no collections of Open infrared spectra.
As a responsible member of the Open Knowledge Foundation I am not prepared to appropriate material that has been “copyrighted” by others. So my conclusion is:
IT IS NOT POSSIBLE TO FIND AN INFRARED SPECTRUM OF CARBON DIOXIDE – A CRITICAL GREENHOUSE GAS – WITHOUT POTENTIALLY VIOLATING COPYRIGHT.
I hope this statement is wrong.