This is a post (probably the last) in a series outlining Open Semantic Data in Science at BioIT Boston; see BioIT in Boston: What is Open?
I’ve explained “Open” and “Semantic”. Now I’ll attempt “data”. I thought this would be simple – it’s not.
Why use the word “data” – at all? I think the reasons are pragmatic. There’s often an over-presented hierarchy:
Data → Information → Knowledge → Wisdom
(the last influenced by T. S. Eliot)
Different people would put the cutoffs at different points in this hierarchy, but I think the following attributes of data are fairly common:
- it is distinct from most prose (although some prose would be better recast as data)
- it is generally a component of a larger information or knowledge structure
- facts and data are closely related
- many data are potentially reproducible or unique observations, and are not opinions (though different people may produce different data)
- data, as facts, are not copyrightable.
- Collections of data and annotated data (data + metadata) may have considerably enhanced value over the individual items.
- Data can be processed by machine.
Here are some statements which provide data:
- 36 26 38
- Melting Point: 300 K
- The reaction product was red
- my blog page is http://wwmm.ch.cam.ac.uk/blogs/murrayrust
and here are some which are not data:
- her work is well respected
- we thank Dr. XYZZY for the crystals
- we find this reaction very difficult to perform
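To make the distinction concrete, here is a minimal sketch of what "can be processed by machine" might mean for the statement "Melting Point: 300 K". It assumes a simple "Property: value unit" pattern. The `Measurement` record, the regular expression and the helper names are hypothetical illustrations, not CML, ChemAxiom or any existing tool. Statements like "her work is well respected" simply don't yield a datum.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Hypothetical minimal record for one datum: property, numeric value, unit."""
    prop: str
    value: float
    unit: str


# Assumed input shape: "<property name>: <number> <unit>", e.g. "Melting Point: 300 K".
_PATTERN = re.compile(
    r"^\s*(?P<prop>[A-Za-z ]+?)\s*:\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>\S+)\s*$"
)


def parse_measurement(text: str) -> Optional[Measurement]:
    """Return a Measurement if the text matches the assumed pattern, else None."""
    match = _PATTERN.match(text)
    if match is None:
        return None  # prose, opinions and acknowledgements fall through here
    return Measurement(
        prop=match.group("prop").strip().lower().replace(" ", "_"),
        value=float(match.group("value")),
        unit=match.group("unit"),
    )


def kelvin_to_celsius(kelvin: float) -> float:
    """Once the value is structured, a machine can act on it, e.g. convert units."""
    return kelvin - 273.15


if __name__ == "__main__":
    print(parse_measurement("Melting Point: 300 K"))
    # prints Measurement(prop='melting_point', value=300.0, unit='K')
    print(parse_measurement("her work is well respected"))
    # prints None
```

A real semantic representation would attach the property and unit to shared dictionaries rather than to free strings. The point here is only that a datum has a shape a program can recognise, while the opinion does not.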
What’s the point of making the distinction? From my point of view:
- Data can and increasingly should be converted to semantic form.
- Data are not copyrightable and should be free to the community.
- Linking Open Data is now possible and has stunning potential.
So my self-appointed mission is to carry this out in the domain that I – at least partially – understand: chemistry.
- We already have the semantic framework (CML, ChemAxiom).
- We are managing to liberate data (PubChem, ChemSpider, CrystalEye, NMRShiftDB, CLARION, etc.).
- When we have liberated enough, we can start to provide Linked Open Data.
We aren’t there yet, as there are very few fully Open Data resources in chemistry (CrystalEye may be the only one that asserts its Openness through the Open Knowledge Foundation, OKF). And unless there is something to link to, we can’t do very much.
But we are moving fast. Four of us (Antony Williams, Alex Tropsha, Steve Heller and PMR) met for an hour yesterday to discuss what we’d like to do and how we might do it. We have complementary things to bring to the table, so watch for developments.
This blog post was prepared with ICE 4.5.6 from USQ in Open Office.
Late on this: data, as facts, are not copyrightable? What do you think the situation is with the outputs of simulations?
@bryan I don’t know the LEGAL situation, and it may depend on jurisdiction, but I would absolutely insist that community norms require these to be non-copyrightable.