My talk is “Open Semantic Data in Science”. I’ll probably write 3-4 blog posts on the various aspects of this, and at present I’m thinking of:
- What is Open? (this post)
- What is semantic? And what do we require for it?
- What is data?
- What are we able to offer (with some modest emphasis on our own endeavours).
I am starting with the assumption that for science now and in the future Open Data will be essential. The culture, especially among young people, is that “the answer is out there” and is retrievable within seconds or less. There’s also a realisation that increasingly we don’t know in detail what we are looking for when we start a study. We read bits of papers, skim around till we get a feel for the subject, ask our colleagues, post questions on blogs, etc.
- ANY barrier to access and re-use, however small and seemingly trivial, COMPLETELY destroys public semantic data.
(Note that I accept that there are closed worlds – companies, healthcare, etc. – which require access controls, but their technology can feed off what we are trying to create in public view.)
- 1. Access
The work shall be available as a whole …, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.
Comment: This can be summarized as ‘social’ openness – not only are you allowed to get the work but you can get it. ‘As a whole’ prevents the limitation of access by indirect means, for example by only allowing access to a few items of a database at a time.
- 2. Redistribution
The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.
- 3. Reuse
The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work. The license may impose some form of attribution and integrity requirements: see principle 5 (Attribution) and principle 6 (Integrity) below.
Comment: Note that this clause does not prevent the use of ‘viral’ or share-alike licenses that require redistribution of modifications under the same terms as the original.
- 4. Absence of Technological Restriction
The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format, …
- 5. Attribution
The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for military research.
Comment: The major intention of this clause is to prohibit license traps that prevent open source from being used commercially. We want commercial users to join our community, not feel excluded from it.
- 9. Distribution of License
The rights attached to the work must apply to all to whom the work is redistributed without the need for execution of an additional license by those parties.
- 10. License Must Not Be Specific to a Package
- 11. License Must Not Restrict the Distribution of Other Works
And now the absolute requirement for Openness:
NONE OF THE ABOVE CONDITIONS ARE OPTIONAL
- This is the crux. There are many data resources which are described as “Open” but they fail in one or more aspects. The commonest failures are:
- To expose only part of the data. A database system with a query interface is normally not Open Data even if individual items can be downloaded without barrier. It is generally impossible to extract the whole work, as its boundaries are concealed by the search interface.
- To limit the amount downloaded. This is very frequent (“you may use a maximum of 100 entries”).
- To forbid re-use (“This data is copyright X and may not be re-used without permission”).
- To require access through specific technology. A search form limits the access.
- To require any form of sign-in, even if free. Robots are illiterate in this aspect.
- To restrict purpose of re-use. Thus CC-NC (“no commercial reuse”) is NOT OKF-compliant.
- To fail to provide a clear statement that the data are open and comply with the Open Knowledge definition. It’s almost universal that data are NOT labelled as Open. This is easy to fix – just add the OKF’s tags.
- So the message is simple, though it will take time to spread:
- Use the OKF definition for all your data and tag it as such.
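The list of common failures above can be read as a checklist that a robot could apply to a data resource. Here is a minimal sketch in Python, assuming a hypothetical `DatasetTerms` record that a publisher or curator fills in by hand. The field names and the `openness_failures` and `is_open` helpers are my own illustration and are not part of the OKF definition or any OKF tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetTerms:
    """Hypothetical description of how a data resource is published."""
    bulk_download: bool             # the whole work is obtainable, not just single items
    download_limit: Optional[int]   # maximum entries a user may take; None means no limit
    reuse_allowed: bool             # modification and redistribution are permitted
    requires_specific_client: bool  # e.g. reachable only through a search form
    requires_signin: bool           # any login at all, even a free one
    noncommercial_only: bool        # e.g. a CC-NC style restriction
    labelled_open: bool             # carries an explicit statement of openness


def openness_failures(terms: DatasetTerms) -> List[str]:
    """Return the common failures (from the list above) that apply to `terms`."""
    failures = []
    if not terms.bulk_download:
        failures.append("exposes only part of the data")
    if terms.download_limit is not None:
        failures.append("limits the amount downloaded")
    if not terms.reuse_allowed:
        failures.append("forbids re-use")
    if terms.requires_specific_client:
        failures.append("requires access through specific technology")
    if terms.requires_signin:
        failures.append("requires sign-in")
    if terms.noncommercial_only:
        failures.append("restricts purpose of re-use")
    if not terms.labelled_open:
        failures.append("no clear statement that the data are open")
    return failures


def is_open(terms: DatasetTerms) -> bool:
    """None of the conditions is optional: a single failure means not Open."""
    return not openness_failures(terms)
```

A resource that allows bulk download and re-use but asks for a free sign-in would still come back from `is_open` as `False`, which is the point of the "none of the above conditions are optional" rule.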
This blog authored with ICE + Open Office; thanks to PeterSefton and USQ