BioIT 2009 – What is data? -1

This is a post, probably the last, in a series outlining Open Semantic Data in Science at BioIT Boston (see BioIT in Boston: What is Open?).

I’ve explained Open and Semantic. Now I’ll attempt data. I thought this would be simple; it’s not.

Why use the word data at all? I think the reasons are pragmatic. There’s often an over-presented hierarchy:

Data → Information → Knowledge → Wisdom

(the last influenced by T. S. Eliot)

Different people would put the cutoffs at different points on this hierarchy, but I think the following are fairly common attributes of data:

  • it is distinct from most prose (although some prose would be better recast as data)

  • it is generally a component of a larger information or knowledge structure

  • facts and data are closely related

  • many data are potentially reproducible or unique observations, and are not opinions (though different people may produce different data)

  • data, as facts, are not copyrightable.

  • Collections of data and annotated data (data + metadata) may have considerably enhanced value over the individual items.

  • Data can be processed by machine

Here are some statements which provide data:

and here are some which are not data:

  • her work is well respected

  • we thank Dr. XYZZY for the crystals

  • we find this reaction very difficult to perform

What’s the point of making the distinction? From my point of view:

  • Data can and increasingly should be converted to semantic form.

  • Data are not copyrightable and should be free to the community

  • Linking Open Data is now possible and has stunning potential.
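To make "semantic form" a little more concrete, here is a minimal Python sketch of how a prose statement such as "the compound melts at 122 degrees Celsius" might be recast as subject-predicate-object triples that a machine can process and link to. This is only an illustration: it does not use CML or ChemAxiom, and the `example.org` URIs, the property names and the value are all hypothetical.

```python
# Hypothetical sketch: one measurement recast as subject-predicate-object
# triples. Every URI and the value below are made up for illustration.
BASE = "http://example.org/chem/"


def triple(subject, predicate, obj):
    """Format one triple as an N-Triples-style line."""
    # Objects that look like URIs become resources; anything else is a
    # plain string literal.
    if obj.startswith("http"):
        obj_term = f"<{obj}>"
    else:
        obj_term = f'"{obj}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


measurement = BASE + "measurement/1"
triples = [
    triple(measurement, BASE + "ofCompound", BASE + "compound/1"),
    triple(measurement, BASE + "property", BASE + "meltingPoint"),
    triple(measurement, BASE + "value", "122"),
    triple(measurement, BASE + "unit", BASE + "degreeCelsius"),
]

for line in triples:
    print(line)
```

Because the compound, the property and the unit are identifiers rather than words in a sentence, another dataset that uses the same identifiers could in principle be linked to this one without anyone re-reading the prose.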

So my self-appointed mission is to carry this out in the domain that I at least partially understand: chemistry.

  • We already have the semantic framework (CML, ChemAxiom)

  • we are managing to liberate data (PubChem, ChemSpider, CrystalEye, NMRShiftDB, CLARION, etc.)

  • when we have liberated enough we can start to provide Linked Open Data.

We aren’t there yet, as there are very few fully Open Data resources in chemistry (CrystalEye may be the only one that asserts Openness through OKF). And unless there is something to link to, we can’t do very much.

But we are moving fast. Four of us (Antony Williams, Alex Tropsha, Steve Heller and PMR) met for an hour yesterday to discuss what we’d like to do and how we might do it. We have complementary things to bring to the table, so watch for developments.



2 Responses to BioIT 2009 – What is data? -1

  1. Bryan says:

    Late on this: data, as facts, not copyrightable? What do you think the situation is with the outputs of simulations?

    • pm286 says:

      @bryan I don’t know the LEGAL situation, and it may depend on jurisdiction, but I would absolutely insist that community norms require this to be non-copyrightable.
