The Comprehensive Knowledge Archive Network (CKAN) – Open Knowledge Foundation

Rufus Pollock is a tireless campaigner for Openness. He is a graduate student at Cambridge – “writing up”, but still with enormous energy for other activities in the area of Openness. He is a highly competent hacker – and promotes hackerdom locally with meetings in pubs and cafes – which is why the CKAN (below) has echoes of TeX, Perl, Sourceforge, etc. He has set up:

The Open Knowledge Foundation exists [to promote] the openness of knowledge in all its forms, in the belief that freer access to information will have far-reaching social and commercial benefits. In particular, we

  • Promote the idea of open knowledge, for example by running a series of forums.

  • Instigate and support projects related to the creation and distribution of open knowledge.

  • Campaign against restrictions, both legal and non-legal, on open knowledge. See the Open Knowledge Trail to learn more

He did me the honour of inviting me to be on the advisory board – I have done little; my main contribution has been to act as a foil for his debate. He has now announced:

After a year of (off and on) development we are delighted today to announce the official launch of the Comprehensive Knowledge Archive Network (CKAN for short):
CKAN is a registry of open knowledge packages and projects — be that a set of Shakespeare’s works, a global population density database, the voting records of MPs, or 30 years of US patents.
CKAN is the place to search for open knowledge resources as well as register your own. Those familiar with freshmeat (a registry of open source software), CPAN (Perl) or PyPI (python package index) can think of CKAN as providing an analogous service for open knowledge.
CKAN is a key part of our long-term roadmap and completes our work on the first layer of open knowledge tools:

CKAN links in especially closely with our recent discussions of componentization: we envision a future in which open knowledge is provided in a much more componentized form (packages) so as to facilitate greater reuse and recombination similar to what occurs with software today (see the recent XTech presentation for more details). For this to occur we need to make it much easier for people to share, find, download, and ‘plug into’ the open knowledge packages that are produced. An essential first step in achieving this is to have a metadata registry where people can register their work and where relevant metadata (both structured and unstructured) can be gradually added over time.
We also make no bones about the fact that what we have at present is very simple, certainly when compared to the long-term vision; after all, we should remember it has taken software over thirty years to reach its present level of sophistication. Thus, rather than attempting to pre-judge the solution to the open knowledge componentisation question (for example in the choice of metadata attached to each package), this beta version is the simplest possible thing that will provide value, and we look to user feedback (and we include ourselves here as users) to determine the future direction of development of the system.


What kinds of things do you expect people to register in CKAN?

Anything and everything — when we say knowledge we mean any kind of content, data or information. That said there are two main recommendations regarding what you register:

  • First, we are looking for people to register ‘packages’, that is, collections with some kind of structure rather than individual items. So a substantial set of photos, datasets of all kinds, or the writings of Shakespeare, but not an individual blog or your flickr photo collection (unless it is really big!).
  • Second, we’re looking for stuff that’s open: that is material that people are free to use, reuse and redistribute without restriction (other than, perhaps, a requirement to share-alike).

Why Not Just Use the Creative Commons Search Facility in Google/Yahoo/etc?

Two main reasons:

  1. We focus on work that is open. Simply put, the set of open works and the set of CC-licensed works are not identical because (a) not all Creative Commons licensed work is open (for example, works which use the non-commercial provision are not) and (b) there are plenty of open works which do not use CC licenses (e.g. Wikipedia).
  2. The registry is designed to support holding much more metadata than simply whether the work is open or not. In particular we want to be able to support automated installation of knowledge packages in the future (which requires things like dependency and version information).
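
To make these two points concrete, here is a minimal sketch of what a registry entry with license, version and dependency metadata might look like. It is an illustration only and not CKAN's actual schema or code: the `Package` fields, the license identifiers in `OPEN_LICENSES`, the example package names and the `install_order` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical license identifiers counted as "open" for this sketch.
# Note that a non-commercial CC license ("cc-by-nc") is deliberately absent,
# while a non-CC license ("gfdl") is present.
OPEN_LICENSES = {"cc-by", "cc-by-sa", "gfdl", "public-domain"}


@dataclass
class Package:
    """One registry entry (field names are illustrative assumptions)."""
    name: str
    version: str
    license: str
    tags: list = field(default_factory=list)
    depends: list = field(default_factory=list)  # names of other packages

    def is_open(self):
        return self.license in OPEN_LICENSES


def install_order(registry, name, order=None, visiting=None):
    """Return package names so that every dependency precedes its dependent."""
    order = [] if order is None else order
    visiting = set() if visiting is None else visiting
    if name in order:
        return order
    if name in visiting:
        raise ValueError(f"circular dependency at {name!r}")
    visiting.add(name)
    for dep in registry[name].depends:
        install_order(registry, dep, order, visiting)
    visiting.discard(name)
    order.append(name)
    return order


# A toy registry with made-up entries.
registry = {
    "shakespeare": Package("shakespeare", "0.1", "public-domain",
                           tags=["shakespeare", "open", "okfn"]),
    "shakespeare-concordance": Package("shakespeare-concordance", "0.1",
                                       "cc-by-sa", depends=["shakespeare"]),
    "photo-set": Package("photo-set", "1.0", "cc-by-nc"),
}

print(install_order(registry, "shakespeare-concordance"))
print([p.name for p in registry.values() if not p.is_open()])
```

In this toy registry the concordance can only be installed after the texts it depends on, and the non-commercial photo set is CC-licensed but would not count as open.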

Is CKAN itself open?

Of course: both the code that CKAN runs on and the data itself are open; see the license page:

How Can I Get Involved?

Start entering things into CKAN and editing existing entries; you don’t need to be the developer of a particular project or resource to enter it into the registry.
If you want to get more deeply involved, join the okfn-discuss list and introduce yourself, or just drop an email to info [at] okfn [dot] org. If you want to just start hacking with the code, see our development project page (then follow the links to subversion):

So how will this actually work out? My answer is that I have no idea and cannot have at this stage. It certainly shouldn’t be a “dumping ground” for unstructured information that will dilute and pollute the idea. My ideas (and I haven’t discussed them with Rufus – I’m meeting him at lunch) are that it should (at least initially) attract those types of knowledge objects that:
  • do not have a natural home elsewhere. (No point in repositing bioinformatics or astronomy data).
  • have good surface structure. It is important that visitors can immediately see what the objects are and how to navigate them
  • have an obvious virtue in being open – if objects were hidden or closed it would be a serious disadvantage to a community. The community need not be large, but it should have coherence.
  • promote the idea of openness. “Gosh, I never knew we could get information on MPs – perhaps we can also get information on…”
  • have some sense of maintenance. Not dump and forget.
I’m not worried about discovery – the web searches of today (with their petatriple stores) will find things if they are labelled and exposed. (I’m a believer in lowercase semantic web – if Rufus tags the Shakespeare with “shakespeare” and “open” and “okfn” the tagbots will find it. Good use of FOAF, revyu, DBpedia, etc. would enhance many entries.)
Will I put my scientific data and articles in CKAN? Probably not. That’s not because they aren’t Open (except for those with Closed publishers) but because they are catered for by Institutional Repositories. Science also IMHO needs domain repositories and I have been advocating this. My source code goes in Sourceforge.
What would I put? Probably well structured information on my locality – won’t say more here. The idea would be to catalyse others to do the same.
Although in some areas (e.g. Shakespeare) CKAN can aspire to be comprehensive, in others it may hold exemplars (“proof of concept”) which are ready for scale-up. Whether that scale-up takes place in CKAN or seeds YAOS (yet another Open site) none of us can tell.
[And please WordPress preserve Rufus’ material unlike what you did to my last post.]
