I missed an important OKF event on Monday – the identification of Linked Open Drug Data. Linked Open Data is one of the great emerging ideas of the modern Web – the idea that data is semantic, linked and open. There’s already a huge collection: http://richard.cyganiak.de/2007/10/lod/imagemap.html and http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22.png
That’s our current Linked Open Data. It’s got music, government, bioscience…
But where’s the chemistry?
There isn’t much. That’s because most chemists don’t understand the value of Linked Open Data. The normal model is to licence the data and sell it. Unfortunately that stops it being Linked and being Open. A larger number of data sets are “freely visible” in some way. That’s a start, but it’s not Open. We think that many data providers, when shown the value of Open Data, will change their approach and make the data fully Open.
Before starting on Open, let me clarify “Linked”. This means that each identifiable chunk of information in the data set has a unique identifier (technically a URI). It’s good practice anyway for data sets to have unique IDs, and if yours does, then you can usually create Links by prepending your domain name to each ID. If it doesn’t, then you need to start creating unique IDs.
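To make "prepending your domain name" concrete, here is a minimal Python sketch. The domain `example.org`, the path `/compound/` and the function name `mint_uri` are all hypothetical, chosen only for illustration; the one real requirement is that each local ID is already unique within the data set.

```python
from urllib.parse import quote

# Hypothetical base: your own domain plus a path for this data set.
BASE = "http://example.org/compound/"


def mint_uri(local_id: str, base: str = BASE) -> str:
    """Build a URI by prepending the base to a percent-encoded local ID."""
    # Percent-encode everything unsafe (including "/") so the ID stays
    # a single path segment.
    return base + quote(local_id.strip(), safe="")


# Example local IDs, invented for illustration.
local_ids = ["A2341", "B 17/x"]
uris = [mint_uri(i) for i in local_ids]

for uri in uris:
    print(uri)
```

Because the mapping from ID to URI is one-to-one, unique IDs give unique URIs, and other data sets can then link to your entries by URI.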
Back to Open. This means that anyone can use *and reuse* the data for any purpose without further permission. Simple to state. Simple to make clear – simply add a licence that guarantees Openness. Best choices are CC-BY, CC0 and PDDL. The following are NOT Open:
- Non-commercial licences (CC-NC). They may be useful for musicians, but they are a menace in science and academia. They look enticing but they are a rathole. Never use them. Persuade your colleagues to get rid of them.
- Logins (even if no money is required). These cannot be negotiated in the Semantic Web of LOD.
- Incomplete access to data. Many sites provide search facilities (what is entry A2341?) for “free”. The problem is that you cannot navigate the whole data set. So access-through-search is not Open. Moreover, the owner of the site could remove the facility at any time.
- Restricted subsets. “You can use up to 12345 entries without a contract”. Not Open.
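The criteria above can be sketched as a simple check. This is only an illustration of the list, not the IsItOpen tool or any official OKD test; the function name, its parameters and the licence labels are assumptions made for this example.

```python
# Licences named in this post as good choices for Open data.
OPEN_LICENCES = {"CC-BY", "CC0", "PDDL"}


def is_open(licence, requires_login=False, search_only=False, entry_limit=None):
    """Return True only if none of the 'NOT Open' conditions apply.

    All parameters are hypothetical descriptors of a data set.
    """
    if licence is None or licence.strip().upper() not in OPEN_LICENCES:
        return False  # no licence, or one not listed (e.g. non-commercial)
    if requires_login:
        return False  # logins block automated access
    if search_only:
        return False  # the whole data set cannot be navigated
    if entry_limit is not None:
        return False  # restricted subsets are not Open
    return True
```

For example, `is_open("CC0")` passes, while `is_open("CC-BY-NC")` and `is_open("CC-BY", requires_login=True)` both fail.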
So a group of OKF Open scientists – anyone can join – on the open-science list (http://lists.okfn.org/mailman/listinfo/open-science ) has started to ask providers whether their data is Open. It’s run by Jenny Molloy and Egon Willighagen, who have put great commitment into Openness. Jenny has helped to build the IsItOpen request tool, where we ask providers whether their data is Open, that is, OKD-compliant. This is inspired by MySociety’s http://whatdotheyknow.com where FOI requests can be made and logged. So here we make requests to data providers, publishers, etc. about the Openness of their data. (Note that this is not a legal request – providers can refuse to answer – but they then risk violating community norms by being unresponsive to the needs of the community.) We are sure that all responsible publishers will welcome the opportunity to clarify their approach to data – and do this as public record.
Anyway, 12 OKFers met in the Etherpad on Monday and here’s Jenny’s account:
We had a very productive hack session on Monday night regarding linked open drug data. You can see the full notes here:
http://okfnpad.org/sciencewg-loddhack-201103
In summary, we reviewed the openness of several LODD data sets in CKAN and identified those whose maintainers should be sent an Is It Open Data? request. We drafted letters to send to the World Health Organisation, Global Health Observatory and the maintainers of two datasets at the US National Library of Medicine:
http://okfnpad.org/sciencewg-who-letter
http://okfnpad.org/sciencewg-rxnorm-letter
http://okfnpad.org/sciencewg-nlm-letter
Before we send them via http://www.isitopendata.org/, it would be great to get more signatories from the group, so please add your name to the end of the generic letter on http://okfnpad.org/sciencewg-loddhack-201103 if you are happy to be included. Unfortunately, we didn’t remind all of the hack session participants to do this before they left, so if you helped on Monday then please do sign!
We will be sending the letters on Monday 14th March during a follow-up session; more details are to follow.
If there is a group on CKAN, or a general topic area that you feel would be a good target for future sessions of this nature, then please let me know!
This is great. And you can be part of it.