Open APIs: fundamentals and the cases of KEGG and Wikipedia

It is now urgent and vital that we define what an “Open API” is. The phrase is widely used, usually without any indication of what it offers and what restrictions, if any, it imposes. This blog post is a first pass: I don’t expect to get everything “right” and I hope we have comments that evolve towards something generally workable. Among other things we shall need:

  • An agreement that this matters and that we must strive for OKD-open
  • Tools to help us manage it
  • A period of constructive development in trying to create fully Open APIs and a realisation of the problems and costs involved

I shall also list some additional criteria that I think are important or critical.

Firstly the word “Open” (capitalised as such) is intended to convey the letter and the spirit of the Open Definition:

“A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”.

This is a necessary but not sufficient condition for an “Open API”. What is the “API” bit?

It stands for “Application Programming Interface” (http://en.wikipedia.org/wiki/Application_programming_interface ). In the current context it means a place (usually a website) where specific pieces of data can be obtained on request. It is often called a “service” and hence comes under the Open Service Definition:

“A service is open if its source code is Free/Open Source Software and non-personal data is open as in the Open Knowledge Definition (OKD).”

This is necessary, but not sufficient for what we now need. The rationale for the addition of F/OSS software is explained in
http://www.opendefinition.org/software-service/

The Open Software Service Definition defines ‘open’ in relation to online (software) services.

An online service, also known under the title of Software as a Service (SaaS), is a service provided by a software application running online and making its facilities available to users over the Internet via an interface (be that HTML presented by a web-browser such as Firefox, via a web-API or by any other means).

PMR: generally agreed. This can cover databases, repositories and other services. I shall try to illustrate this.

With an online-service, in contrast to a traditional software application, users no longer need to ‘possess’ (own or license) the software to use it. Instead they can simply interact via a standard client (such as web-browser) and pay, where they do pay, for use of the ‘service’ rather than for ‘owning’ (or licensing) the application itself.

PMR: I don’t fully understand this. I think there has to be an option for gratis access, else how does the system qualify as Open? But we do have to consider costs.

The Definition

An open software service is one:

  1. Whose data is open as defined by the Open Knowledge Definition with the exception that where the data is personal in nature the data need only be made available to the user (i.e. the owner of that account).
  2. Whose source code is:
    1. Free/Open Source Software (that is available under a license in the OSI or FSF approved list — see note 3).
    2. Made available to the users of the service.

I shall revisit “whose data” later, and particularly the need to add a phrase such as “and is made available”.

Notes

  1. The Open Knowledge Definition requires technological openness. Thus, for example, the data shouldn’t be restricted by technological means such as access control and should be available in an open format.

PMR: Agreed. It may also mean that you do not need to buy/licence proprietary tools to access the data. Is a PDF document Open? The software required to READ it is usually closed. An additional concern here is the use of DRM (Digital Rights Management).

  2. The OKD also requires that data should be accessible in some machine automatable manner (e.g. through a standardized open API or via download from a standard specified location).

PMR: This is critical. I read this as “ALL the data”.

  3. The OSI approved list is available at: http://www.opensource.org/licenses/ and the FSF list is at: http://www.gnu.org/philosophy/license-list.html
  4. For an online-service simply using an F/OSS licence is insufficient since the fact that users only interact with the service and never obtain the software renders many traditional F/OSS licences inoperative. Hence the need for the second requirement that the source code is made publicly available.

PMR: Services almost always involve a hotch-potch of code at the server side (e.g. servlets, database wrappers, etc.). This can be a problem.

  5. APIs: all APIs associated with the service will be assumed to be open (that is their form may be copied freely by others). This would naturally follow from the fact that the code and data underlying any of the APIs are open.

PMR: This relates to documentation, I assume.

  6. It is important that the service’s code need only be made available to its users so as not to impose excessive obligations on providers of open software services.

PMR: I read this as “here’s the source code but we are not under any obligation to install it for you or to make it work”. I agree with this.

As examples the OSD cites Google Maps (not Open) and Wikipedia (Open). For Wikipedia:

  • Code: Mediawiki is currently F/OSS (and is made available)
  • Data: Content of Wikipedia is available under an ‘open’ licence.
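The definition above is really a checklist, so it can be written down as one. What follows is only a rough sketch under my own assumptions: the names `ServiceDescription`, `is_open_software_service` and `failed_clauses` are hypothetical, the three flags are my reading of clauses 1, 2.1 and 2.2, and the values have to be filled in by a human who has actually checked the licences. The Wikipedia values follow the two bullets above; the second service is invented purely for contrast.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical record of the facts the definition asks about."""
    name: str
    data_open_per_okd: bool            # clause 1: non-personal data is OKD-open
    licence_osi_or_fsf_approved: bool  # clause 2.1: F/OSS licence
    source_available_to_users: bool    # clause 2.2: code is actually handed over


def failed_clauses(service: ServiceDescription) -> List[str]:
    """Return the clauses of the definition that the service does not meet."""
    checks = [
        ("1 (data open per OKD)", service.data_open_per_okd),
        ("2.1 (F/OSS licence)", service.licence_osi_or_fsf_approved),
        ("2.2 (source made available)", service.source_available_to_users),
    ]
    return [label for label, passed in checks if not passed]


def is_open_software_service(service: ServiceDescription) -> bool:
    """A service is open only if every clause holds."""
    return not failed_clauses(service)


# Values taken from the two bullets above (assumed to cover all clauses).
wikipedia = ServiceDescription("Wikipedia", True, True, True)

# Invented example: open data served by code that is never released.
closed_code = ServiceDescription("hypothetical service", True, False, False)
```

Listing the failed clauses, rather than returning a bare yes/no, makes it visible *which* part of a service is closed, which is usually the more useful thing to know.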

One of the oft-quoted aspects of F/OSS is the “freedom to fork” (http://en.wikipedia.org/wiki/Fork_%28software_development%29 , and http://lwn.net/Articles/282261/ ). Forking is often a “bad idea” but is the ultimate tool in preserving Openness, because it means that if the original knowledge stops being Open (becomes closed, dies, is inoperable) then at least in theory someone can take the copy and continue its existence. I think this is fundamental for Open APIs.

The APIs must provide (implicitly or explicitly) the ability for someone to fork the software and content.

It doesn’t have to be easy and it doesn’t have to be cost-free. It just has to be *possible*.

The case of KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/ ) is a clear example of an Open service being closed (http://www.genome.jp/kegg/docs/plea.html ). In brief, the laboratory running the services used to make everything freely available (and implicitly Open) but now:

 

Starting on July 1, 2011 the KEGG FTP site for academic users will be transferred from GenomeNet at Kyoto University to NPO Bioinformatics Japan, and it will be available only to paid subscribers. The publicly funded portion, the medicus directory, will continue to be freely accessible at GenomeNet. The KEGG FTP site for commercial customers managed by Pathway Solutions will remain unchanged. The new FTP site is available for free trial until the end of June.

I would like to emphasize that the KEGG web services, including the KEGG API, will be unaffected by the new mechanism to be introduced on July 1, 2011. Our policy for the use of the KEGG web site will remain unchanged. The only change will be to FTP access. We have already introduced a “Download KGML” link for the KGML files that used to be available only by FTP, and will continue to improve the functionality of KEGG API. I would be very grateful if you could consider obtaining a KEGG FTP subscription as your contribution to the KEGG project.

I am not passing any moral judgment – you cannot pay people with promises. But the point is that an Open Service has become closed. With the “right-to-fork” it is possible to “clone” all the Open material (possibly with FTP) before the closure date and maintain an Open version. This may or may not be cost-effective, but it’s possible.

So what is the KEGG API mentioned above and is it Open? Almost certainly not. It may be useful but it is clear that neither the software nor the complete contents of the database are available.

By contrast Wikipedia remains an Open API. It’s possible to clone enough of the software that matters and all of the content. Installing the software is probably non-trivial (yes, I can run Mediawiki but there are all sorts of other things: configuration files, quality bots, etc.). And cloning the content means dumping a snapshot at a given time. But at least, if we care enough, it is LEGALLY and technically possible.
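To make “dumping a snapshot” concrete, here is a minimal sketch of fetching the content from a standard, predictable location. It assumes the URL layout used by the Wikimedia dump site (dumps.wikimedia.org) as I understand it; the helper names `dump_url` and `fetch_snapshot` are hypothetical, and the full English dump runs to many gigabytes, so treat this as an illustration rather than a recipe.

```python
import shutil
import urllib.request

# Assumed location of the public Wikimedia dumps.
DUMP_HOST = "https://dumps.wikimedia.org"


def dump_url(wiki: str = "enwiki", snapshot: str = "latest") -> str:
    """Build the URL of the article dump for one wiki and one snapshot.

    `snapshot` is either "latest" or a date stamp such as "20110701".
    """
    filename = f"{wiki}-{snapshot}-pages-articles.xml.bz2"
    return f"{DUMP_HOST}/{wiki}/{snapshot}/{filename}"


def fetch_snapshot(url: str, dest: str) -> str:
    """Stream the dump to a local file without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Usage (not run here, because the download is very large):
#   fetch_snapshot(dump_url(), "enwiki-snapshot.xml.bz2")
```

The point is not the code but that nothing more than a URL and a plain HTTP client is needed: no login, no proprietary tool, no negotiation.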

In the next post I will examine some of our own resources and how close they are to “OKD and OSD-open”. We fall down on details but we succeed in motivation.


3 Responses to Open APIs: fundamentals and the cases of KEGG and Wikipedia

  1. Nick Barnes says:

    Well, I’ll keep using the term ‘API’ to mean what it meant 20 years ago, and ‘Open API’ to mean an open-source one of those. This will lead to me getting confused (for instance, when attending conference workshop sessions headlined as “API”), and feeling like an old fart, but that’s apparently the price of consistency.
    I imagine this traditional term is what “all APIs associated with the service will be assumed to be open (that is their form may be copied freely by others)” is alluding to. Not the documentation (as you suggest), but the names, types, and functionality of the objects (usually functions) exposed in the API. That might be described in documentation, or it might not. Either way, people can, and have, got into trouble for writing code which implements someone else’s API.
    Plucking an example from the air, and without knowing the actual policy and record of the company concerned: if I were to create and put online an open-source library for Linux with an API including functions with names like these http://www.mathworks.com/help/techdoc/apiref/bqoqnz0.html#bqoqoes-1 and with matching calling conventions and functionality, then I would fear a nastygram from MathWorks lawyers. In law, I may or may not have defences to such a nastygram (e.g. interoperability), but the fear of such a nastygram might well dissuade me from such an action in the first place. If the API is open (part of which might be, for example, that the C header files are Open Source) I need have no such fear.

  2. Todd Vision says:

    It might help to separately consider openness of the components of the API: the software that runs it, the data it serves, and access to the service. We already have an understanding of how to determine to what extent the software is open, and same for the data. I suggest we define what openness means for service access orthogonally to the other two. This allows us to see clearly what barriers are put in the way of a user. For instance, one can imagine any person and any machine having free and unfettered access to make requests via a certain API, but access being restricted to underlying data, and access being restricted to the source code. Perhaps it wouldn’t be a terribly useful API, but it would at least be clear what about the system is open and what is closed.
