Comments on Open Data

It’s great to see the positive response to ideas in Open Data: one from Rich Apodaca (Blue Obelisk) and one from Bill. The really nice thing is that there is a feeling of communal development of ideas. No single person owns this concept – but there are not so many views that it becomes fuzzy. I reproduce much of this verbatim.

Posted by Rich Apodaca 4 hours ago

What happens to an article in an Open Access journal that shuts down? Recently, this question was raised on the Blue Obelisk mailing list about an article published in the Internet Journal of Chemistry (IJC). Because the lights now appear to be out for good at IJC, are its articles lost forever?
The good news is that by retaining copyright, authors of Open Access articles have the right to copy or reprocess their work in any form they see fit. If a traditional subscription-based journal shuts down, the fate of its entire article collection is up to the publisher, who is in nearly all cases the sole copyright holder. It’s remarkable that self-respecting scientists would knowingly allow the fruits of their hard work to meet with such a fate. With Open Access, the author is in control of keeping their article publicly visible.
The bad news is that keeping an article publicly visible is the last thing most scientists want to spend valuable time and energy on. After all, that’s what the journal was there for, wasn’t it? Given the technical barriers to self-archiving Open Access content, who could blame them? First, an author needs to find a server willing to host their content. After that comes learning the software to get the article onto the server. Then comes the need to decide on the archival format, being ever-mindful of the hamburger effect. Of course, authors would probably want some assurance that the location of this article won’t change and will be “permanently” available. Does a DOI need to be re-assigned? And let’s not forget about how the poor reader is supposed to find these articles (some would say that Google is the answer, but I would disagree). Expecting each author to solve these problems on his or her own simply won’t work. There must be a better way.
To my knowledge, there is no solution to the Open Access archiving problem. But if history is any guide, this is a huge opportunity that will soon disappear. Maybe a SourceForge-like repository for Open Access content would work. Perhaps something less structured would be enough. The profit motive would certainly come into play, as the successful solution to this problem would easily have thousands, if not tens of thousands, of regular users. Whatever form the solution might take, it would most likely be a simple system built by a small organization using off-the-shelf components. I would expect nothing less from a disruptive technology like Open Access.
As one or more solutions to the Open Access archival problem begin to gain traction, other opportunities may arise and be exploited by enterprising individuals and small organizations. And so on, until a thriving ecosystem becomes established.
Proponents have been debating the “how” of Open Access for some time now. Maybe it’s time to start thinking about what comes after the Open Access transition.

I’m still getting my thoughts into place after the DCC2006 meeting. I think there will be repositories for Open Data – we are hard at work building part of them under SPECTRa. Jim Downing and I were just discussing how to get data from the experiment into the public repositories so that everyone is happy. The chemist wants as little work as possible but as much exposure. We’ve got a nifty technical idea which should help this a lot.
At present O/DSSL (to use Bill Hooker’s list – Open/Data/Source/Standards/License) is a Cinderella compared with Open Access. But Open Access is complex and fragmented – everything from the single-click-to-self-archive to hybrid journals. Some Open Access schemes are helpful to the cause of Open Data (where the author retains copyright) but other systems cause too much FUD for Open Data. Open Data requires simple concepts: “this publisher allows and encourages you to scrape whatever you like from this website for the purpose of disseminating and reusing science. And we’ve even constructed a robot-friendly license so no-one has to ask permission…” That is my idea of Open Access (or at least one of them – I obviously include direct publication of notebooks…).
And now…
Bill Says:

  1. November 28th, 2006 at 6:24 pm
    Thanks for the link! Is this the meeting you mean? I just downloaded your slides; how I love the internets! I wonder whether Google would be interested in extending Google Scholar to Google Data? That way authors could at least self-archive data before there are even Data Repositories available… or could current IRs accommodate data?
    Come to think of it, I’ve been tagging/labeling folders “oa/os” for “open access/open science”, which is what I’ve been calling the whole field in my head. I wonder whether Open Access/Open Data would do for a “brand”? Given that Source, Licensing and Standards are enablers of Access and Data, OA/OD is really the essence of Open Science.

One problem that has come up with both Open Knowledge discussions and the DCC is that the more the field broadens, the harder it is to contain. I am fairly clear that we should concentrate on Science – if our labours also liberate maps that’s useful but it’s not the prime purpose. So among the words that should go in are
Open Science Data (and I like Notebook)
Open Data IMO implies Open “access” to the data but it doesn’t require “Open Access”. Something like “Open Availability” – and the licence will do the rest – “if you can get it – and we’ll help you – you can have it”. Open Access often says “you can get it (we need not help you find it) but you can’t have it”.

This entry was posted in open issues.

7 Responses to Comments on Open Data

  1. Bill says:

    I really like “if you can get it, and we’ll help you get it, you can have it”. I’d like to see that included in the plain-language version of future licenses.
    I also like Rich’s ueber-repository idea. I’m in the process of finding OA homes for all my publications, all of which are in journals that will allow me to archive at least something. So all I have to do is find a repository — easy, right? Well, not so much. Turns out there is an IR at the university where I did most of my work, but you have to be a current staff member to deposit. I found a current faculty member willing to help, so that should work out for me, but there is as far as I can tell no general repository for which my stuff, or the majority of biomed research, would be suitable. For instance, arXiv will take “quantitative biology”, but unless their definitions are pretty loose that hardly covers, for instance, most of molecular biology. What about chemistry — is there a general repository, not affiliated with a particular institution, available for chem papers?

  2. pm286 says:

    (1) Yes… I was influenced by Michael Kay’s contribution on XML-DEV where they are discussing whether one can GPL Schemas (i.e. specifications). Mike, who is the world guru of XSLT and the Saxon processor, said he’d like to license software as
    “if you promise not to use me, I’ll promise not to sue you”
    It was gently deprecated by those who seem to think the world cannot exist without lawyers.
    =====
    I take this point, as well. This is along the lines of what Rufus Pollock is doing with Knowledge Forge – a similar approach to Sourceforge, but for knowledge. See OKFN link. But perhaps when the world is completely memex-ed then we’ll record everything we did all the time.

  3. Pingback: Pierres Service » Blog Archive » Comments on Open Data

  4. Pingback: Pierres Service » Blog Archive » Comments on Open Data

  5. Pingback: Pierres Service » Blog Archive » Comments on Open Data

  6. pm286 says:

    (4) I think this pingback is conflated with (3) and (5). Not that it’s not interesting…

  7. I just want to let you know that IJC is back alive and serving again (http://www.ijc.com/). It turns out that our campus reallocated IP addresses and that server was simply forgotten about. Oh well.
    Steve
