Should Open Source code create Open Data?

An important discussion on Ben Brumfield’s blog, “Open Source vs. Open Access”:

I’ve reached a point in my development project at which I’d like to go ahead and release FromThePage [a genealogy program] as Open Source. There are now only two things holding me back. I’d really like to find a project willing to work together with me to fix any deployment problems, rather than posting my source code on GitHub and leaving users to fend for themselves. The other problem is a more serious issue that highlights what I think is a conflict between Open Access and Open Source Software.

[good discussion of BBB omitted] until…

The freedom to run the program, for any purpose.

6. No Discrimination Against Fields of Endeavor

Traditionally this has not been a problem for non-commercial software developers like me. Once you decide not to charge for the editor, game, or compiler you’ve written, who cares how it’s used?

However, if your motivation in writing software is to encourage people to share their data, as mine certainly is, then restrictions on use start to sound pretty attractive. I’d love for someone to run FromThePage as a commercial service, hosting the software and guiding users through posting their manuscripts online. It’s a valuable service, and is worth paying for. However, I want the resulting transcriptions to be freely accessible on the web, so that we all get to read the documents that have been sitting in the basements and file folders of family archivists around the world.

My quandary is this: none of the existing Free or Open Source licenses allow me to require that FromThePage be used in conformance with Open Access. Obviously, that’s because adding such a restriction (requiring users of FromThePage not to charge for people reading the documents hosted on or produced through the software) violates the basic principles of Free Software and Open Source. So where do I find such a license?

Have other Open Access developers run into such a problem? Should I hire a lawyer to write me a sui generis license for FromThePage? Or should I just get over the fear that someone, somewhere will be making money off my software by charging people to read the documents I want them to share?

PMR: The comments are also useful and generally urge Ben to be brave and Open Source his program. I’ve suffered the same concerns myself and looked for ways to protect against use I didn’t like (including applying a curse to the code, which is a great deterrent when it works). However, I add some more suggestions here and, as this is the first time I have aired them, look forward to comments:

  • Community norms. Ben should state in clear text what his wishes are for the code and its output. This has no legal force, but the effect within an Open community can be significant. If others in the genealogy field share his views (or if he can find suggestions or practices that already exist), that helps the community to converge on a generally accepted set of norms.

  • Open Data tags. Modify the software so that it outputs the OKF’s open data buttons in the document by default. This is easy to do; it’s simply a hyperlink. Make this the default when the program is run and add a runtime switch such as -nookfbutton. This allows a user to remove the button, but doing so is a conscious act, rather like clicking through a shareware program’s prompt. The button will advertise the value of OKF buttons.

    Note that this requires the data to be Open and re-usable without hindrance, and this may not be what you want if you wish to restrict commercial use. However, I would urge against such a restriction: NC (non-commercial) is problematic, difficult to define, and difficult to enforce.
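The “on by default, off only by a deliberate switch” idea can be sketched in a few lines. This is only an illustration in Python, not FromThePage’s code: the function names, the page layout, and the example.org addresses are hypothetical placeholders, and only the -nookfbutton switch name comes from the suggestion above.

```python
import argparse
from html import escape

# Placeholder addresses (assumptions): substitute the real button image
# and the page it should link to.
BUTTON_LINK = "https://example.org/open-data"
BUTTON_IMAGE = "https://example.org/open-data-button.png"


def open_data_button(link=BUTTON_LINK, image=BUTTON_IMAGE):
    """Return the HTML for a button image wrapped in a hyperlink."""
    return '<a href="{}"><img src="{}" alt="Open Data"></a>'.format(
        escape(link, quote=True), escape(image, quote=True))


def render_page(transcription, show_button=True):
    """Wrap a transcription in minimal HTML; the button appears unless disabled."""
    parts = ["<html><body>", "<p>{}</p>".format(escape(transcription))]
    if show_button:
        parts.append(open_data_button())
    parts.append("</body></html>")
    return "\n".join(parts)


def build_parser():
    """Command-line options: the button is on unless the user opts out."""
    parser = argparse.ArgumentParser(description="Render a transcription page.")
    parser.add_argument("text", help="transcription text to publish")
    # Removing the button requires typing the switch, a conscious act.
    parser.add_argument("-nookfbutton", action="store_true",
                        help="omit the Open Data button")
    return parser


def main(argv=None):
    """Parse the arguments and print the rendered page."""
    args = build_parser().parse_args(argv)
    print(render_page(args.text, show_button=not args.nookfbutton))
```

Calling main(["A letter home"]) prints a page that includes the button, while main(["A letter home", "-nookfbutton"]) prints the same page without it. Because the button is a default rather than hard-coded, a user with private data can switch it off without patching anything.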


3 Responses to Should Open Source code create Open Data?

  1. Pingback: Unilever Centre for Molecular Informatics, Cambridge - Should Open … | Open Hacking

  2. I really do think that this gets me something like 99% of the way there.
    Fundamentally, the big closed-access bogeyman I’m worried about can, if they like, just build their own version of my app themselves. There are no patents, and it’s really not rocket science to put a few concepts (crowdsourcing, wikilinking, image-based transcription) together in this problem space. That means there’s not much downside to being “brave”.
    Hard-coding either a CC license or an open data tag onto the presentation screens seems like an effective way to both cultivate community norms and promote open access to the data. Those who wish may modify the templates to remove them, which is not a lot of work for them, but reinforces the act of departure from my intent.

  3. Jim Downing says:

    Hi Ben,
    I really disagree with the concept of hard coded Open Data licenses / tags. Make them the default configuration by all means, but hard coded they’re an anti-feature that will irritate valuable developers, whilst providing no more than a chocolate fireguard against the (ab)use you’re actually worried about.
    Let’s say Bob usually works on public data and makes valuable contributions to your OS project. Now let’s say that 5% of Bob’s work involves data that needs to be secret within his department, published on an intranet. Being a true Open Data advocate he knows the tags / licenses can’t appear on this data, as it’s an invitation to redistribute. I suspect Bob will be pretty narked at having to patch the code to prevent them being created, or at hacking them out afterwards. What will you do if Bob submits a patch to make the addition of the tags configurable? If you refuse to apply the patch, how will you react if he publishes a little open source utility to remove the tags automatically?
    There’s a broader issue at stake here, which concerns the balkanisation of the science commons community. This is going to continue unless we establish norms based on tolerance for “non-ideal” approaches rather than norms based on attempting to force people to tread the one true path.
