Open Source and Open Data

In the last post I commented on some of the limitations of licences to ensure Open Data. This post now compares the situation with Open Source. I am a campaigner for Open Data (see Wikipedia) and on the advisory board for the Open Knowledge Foundation which has produced an Open Data definition (a metalicence). People who wish to make their data Openly available often use a licence (such as CC-BY) which is compatible with the OKFN definition. These licences serve many valuable purposes – they act as a formal statement of the general desires of the author(s) and they provide legal force in certain ways (e.g. protecting against third-party copyright, etc.). They are, however, blunt instruments.
This problem has been recognised for many years in Open Source, where there are many different licences which try to express not just the general freedoms but also add additional freedoms or restrictions. One of the best known is the GNU General Public License (GPL) which – in simple terms – requires all derivative works (even if much larger) to carry the GPL. This has been described as “viral” – it infects every piece of software it is linked with. This was the original motivation and has been policed by the Free Software Foundation.

The GPL additionally states that a distributor may not impose “further restrictions on the rights granted by the GPL”. This forbids activities such as distributing of the software under a non-disclosure agreement or contract. Distributors under the GPL also grant a license for any of their patents practiced by the software, to practice those patents in GPL software.

PMR: I ran into these concerns when I wrote a Chemical Markup Language converter for Open Babel. OB is issued under a GPL licence, so any code added must automatically carry the GPL. My JUMBO program (from which I would convert the code for OB) is Open Source, but issued under the Artistic License. I chose the AL because it allowed some control over the use of the code, while still honouring the OS principles. In particular it states that if someone creates a derivative work independently they must release it under a different name. Because JUMBO is primarily designed to test conformance to the definition of CML I wanted to ensure that derivative versions (which might not conform to CML) were not called “JUMBO”.
In Open Babel I added a paragraph under the licence to say that if anyone edited the code they were required to make it clear to users that this was not necessarily conformant to CML. The FSF audited OB and objected to this statement. I therefore rewrote the “requirement” as a “request” and added it to the in-code documentation. (Because of major rewrites from the OB community it’s no longer in the latest release – I think all my code has been obsoleted – no bad thing!)
It’s clear that a licence only covers some aspects of re-use and redistribution. There are many legal derivative works that are unacceptable. If someone other than the primary author(s) introduces a bug in a derivative it confuses the community, lowers the apparent quality of the code and increases tensions. If someone writes a lightweight wrapper round a piece of code (legally) and then claims “ownership” of the result, that can be unfair to the original author(s). If someone makes extravagant claims for software that the author(s) do not support, that can cause problems. And so on.
Some of these may be thoughtless and could be prevented by a clear indication of what is “reasonable” and “unreasonable” practice – a set of requests or policy. In practice this is probably the best way, since a licence will not protect against someone who uses code unscrupulously (which happens rarely, but happens). Most people in the Open Source arena work by the gift economy and value the authors’ contributions. They would always try to contact the author before considering forking the project.
The same considerations and tensions apply to data. However, many people are coming to Open Data from outside the practice of Open Source. They may have encountered Open Access, but this has little to say about the gift economy. Open Access requires a sound business model and there is sufficient ad hoc evidence to show this is possible. Much Open Source is initially more ad hoc and dependent on a shared gift ethic. This ethic is not wholly altruistic, and some would claim not altruistic at all, but whatever the basis of the ethic there is high consciousness of it in the community – OS contributions earn karma. I suspect this is less important for many authors of OA – they will do it because they believe their articles will be more widely read or more widely used, or because they have been mandated to do it.
Open Data is relatively new and I believe urgently requires an ethic. In some cases it will be mandated but in many cases it will have elements of the gift economy. If so this needs to be protected by an awareness in the community of the value of the gift and that it should not be deliberately or inadvertently mis-used. Some communities – e.g. bioscience – have several years’ experience of (effectively) Open Data and must have encountered some of these problems – misappropriation, analogies to passing off, corruption (of data), etc. It could be useful to have examples and solutions if any.

