petermr's blog

A Scientist and the Web

 

Open Access – why we need Open Bibliography

Stevan Harnad has commented on the discussion on publishing Open Access:

  1. December 20, 2010 at 11:15 am

    Why not just publish in your preferred journal and self-archive the peer-reviewed final draft (“Green OA”)?

For those who don’t know, Stevan is one of the pioneers of OA and has been tireless in taking the struggle forward. We agree on many things – the need for Openness of scholarly information and the free (carefully chosen word) access to it. We disagree on the details and on the strategy for achieving those aims.

The Green Road to Open Access should now – I hope – be labeled as “gratis” – “free as in beer”. It’s useful, but I don’t think it’s useful enough in science and I’ll explain why.

But first I’ll commend the Open Access movement on finally coming round to using the terms “gratis” and “libre” (“free as in speech”). For many years the OA movement did not describe how Open Access documents could be used. Obviously, if a document is visible on the web a human can read it – while it is mounted – but there is no guarantee of re-use. For example, I may violate copyright restrictions if I want to use a diagram from a gratis OA document. This is true whether it’s in a repository or on a personal web page. Moreover, repositories are extremely bad (?lazy) at adding formal notices of rights to their contents, and the default is simple: “you cannot re-use anything in this repository for any purpose unless explicitly allowed to do so”. That permission can only be given by adding a formal licence to the documents, such as CC-BY, CC0 or PDDL. The Green Road philosophy, which maintains that anything publicly visible on the web can be text-mined, reused, copied, etc., is counter to legal practice and is no defence against being pursued in the courts by the real or presumed copyright owner. We cannot build semantic certainty on legal quicksands. So, unless the author labels the self-archived copy as libre, I cannot afford to re-use it.

Even if the self-archived documents are libre, they are of little use to data-driven science, which needs a systematic way of discovering them. Randomly archived documents are not systematically searchable, especially when the percentage of self-archiving is very low. Sometimes this is dictated by publishers who forbid self-archiving (guess which I’m talking about) but the very low level of compliance is the real problem. The great majority of scientific articles in closed-access journals are not self-archived. Stevan’s argument is that if we all make the right effort we’ll solve the problem – I simply don’t believe this will happen. Some institutions such as QUT and Soton mandate self-archiving – and are greatly rewarded for doing so – but most universities are incapable of the political effort (I’ll deal with this in later posts).

But let’s assume that everyone DID self-archive their publications. How do we discover them? The journals provide services for searching their own pages but, not surprisingly, do not index the self-archived copies. Google and other search engines may or may not do a comprehensive job of scraping the academic web, but even so you can only use a few of their search results – Google does not provide useful APIs to everyone for free.

The solution is relatively simple to state and to create technically. If we create an Open Bibliography for scientific articles, then any self-archiving author can add their URLs to this bibliography with almost zero effort. Any responsible repository could automatically index its deposits in the Open Bibliography at the moment of self-archiving. By searching the Open Bibliography you then discover all self-archived articles. If we are paying repository managers to support self-archiving then they should be providing an index to the deposited material. Everyone benefits – including a forward-looking publisher.
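To make the idea concrete, here is a minimal sketch in Python of what such an index might look like. It is only an illustration under my own assumptions, not a description of any existing service: the class and method names (`OpenBibliography`, `deposit`, `search`, `libre_copies`), the in-memory storage, the identifier format and the example URLs are all hypothetical. The only thing taken from the discussion above is the idea that each self-archived copy carries a URL and, ideally, an explicit licence such as CC-BY, CC0 or PDDL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Licences treated as "libre" in this sketch (the ones named in the post).
LIBRE_LICENCES = {"CC-BY", "CC0", "PDDL"}


@dataclass
class Entry:
    """One article in the bibliography. All field names are illustrative."""
    title: str
    authors: List[str]
    year: int
    # Self-archived copies: URL -> licence label, or None if no licence is stated.
    copies: Dict[str, Optional[str]] = field(default_factory=dict)


class OpenBibliography:
    """Hypothetical in-memory index of self-archived articles."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def deposit(self, identifier: str, title: str, authors: List[str],
                year: int, url: str, licence: Optional[str] = None) -> Entry:
        """Record a self-archived copy; a repository could call this on each deposit."""
        entry = self._entries.setdefault(identifier, Entry(title, list(authors), year))
        entry.copies[url] = licence
        return entry

    def search(self, text: str) -> List[Entry]:
        """Return entries whose title contains the text (case-insensitive)."""
        needle = text.lower()
        return [e for e in self._entries.values() if needle in e.title.lower()]

    def libre_copies(self, identifier: str) -> List[str]:
        """Return URLs of copies that carry an explicit libre licence."""
        entry = self._entries.get(identifier)
        if entry is None:
            return []
        return [url for url, lic in entry.copies.items() if lic in LIBRE_LICENCES]


if __name__ == "__main__":
    bib = OpenBibliography()
    # Made-up identifier and URLs, for illustration only.
    bib.deposit("doi:10.0000/example", "A Hypothetical Paper on Crystal Structures",
                ["A. Author"], 2010, "https://repo.example.org/123", "CC-BY")
    bib.deposit("doi:10.0000/example", "A Hypothetical Paper on Crystal Structures",
                ["A. Author"], 2010, "https://home.example.org/paper.pdf")
    for entry in bib.search("crystal"):
        print(entry.title, sorted(entry.copies))
    print(bib.libre_copies("doi:10.0000/example"))
```

The point of the sketch is that the author's or repository's burden is a single `deposit` call per archived copy, while a reader gets both discovery (`search`) and a way to tell gratis copies from libre ones (`libre_copies`). A real Open Bibliography would of course need persistent storage, an agreed metadata format and a public API.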

So we have to create an Open Bibliography.

We have the technology.

YOU have to provide the political will.
