Over the last few months a group of us has been working to create a set of principles [below] for asserting the Openness of Bibliographic data. The initiative was sparked and driven by Adrian Pohl (acka47, also http://www.uebertext.org/ ). Adrian took the ideas of the Panton Principles for Open Data in Science (http://pantonprinciples.org/ ) and edited them to apply to Bibliographic Data. Some bits worked, others didn’t. We (not surprisingly) all had different ideas about what Bibliographic Data was. Gradually more people have been brought into the discussion, which has taken place on the OKFN mailing lists, Etherpads, Google Docs, Skype, etc.
It’s worth making it clear that the main discussion has effectively been Open – the Etherpads have recorded much of it. There have been no exclusions – anyone who follows the lists has been able to join the Skype calls and Etherpads. The result is seven people (two librarians, a mathematician, an economist, a computer scientist, a library developer and a chemist) working hard to create something they all feel happy with and believe you will be happy with too.
We believe that the Principles are an important step forward for helping all those involved in managing Bibliography. Currently there is confusion about what Bibliography is, who has what rights and responsibilities, etc. There are many axes – libraries, authors, readers, publishers, arts/humanities, science, etc. and we believe that we have been able to cover all of these. We hope that everyone will be able to agree that Bibliography should be Open and that the principles show the advantages of formally making it so.
We shall be formally launching the principles on January 17th in Cambridge and on the Internet. In the PMR symposium (http://www-pmr.ch.cam.ac.uk/wiki/Visions_of_a_%28Semantic%29_Molecular_Future ) I shall spend time introducing the Principles and the people involved. The programme is packed but we intend to have a Skype session sometime during 1630-1700 UTC, when as many of the authors as possible will be online. Adrian will give a short introduction.
If you are able, we would love to see you at the symposium. If not, we intend that the symposium will be streamed (#pmrsymp and details on the web page) and recorded. There will be a twitterfall so that you can follow the comments.
We’ll get immediate feedback from you (on the Net) and delegates at the symposium. Comments on the Principles are welcome, but we don’t intend changes other than typos in the next week. We’d particularly like to know if you or your organization would be keen to add your support and we’ll see how the OKF could provide an e-page for this…
Principles on Open Bibliographic Data
Producers of bibliographic data such as libraries, publishers, universities, scholars or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made open — that is available for anyone to use and re-use freely for any purpose.
To define the scope of the principles, in this first part the underlying concept of bibliographic data is explained.
Bibliographic data consists of bibliographic descriptions. A bibliographic description describes a bibliographic resource (article, monograph etc. – whether print or electronic) with the purpose of:
- identifying the described resource, i.e. pointing to a unique resource in the universe of all bibliographic resources and
- locating the described resource, i.e. indicating how/where to find the described resource.
Traditionally one description served both purposes at once by delivering information about:
author(s) and editor(s), titles, publisher, publication date and place, identification of parent work (e.g. a journal), page information.
In the web environment identification makes use of Uniform Resource Identifiers (URIs) like a URN, DOI etc. Locating an item is made possible through HTTP-URIs known as Uniform Resource Locators (URLs). All URIs for bibliographic resources thus fall under this narrow concept of bibliographic data.
A bibliographic description may include other information that falls under the concept of bibliographic data, such as non-web identifiers (ISBN, LCCN, OCLC number etc.), rights assertions, administrative data and more*; this data may be produced by libraries, publishers, scholars, online communities of book lovers, social reference management systems, and so on.
Furthermore, libraries and related institutions produce controlled vocabularies for the purpose of bibliographic description, such as name and subject authority files, classifications etc., which also fall under the concept of bibliographic data.
[See addendum for a list of secondary bibliographic data.]
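To make the definition above concrete, a bibliographic description can be sketched as a small record type. This is only an illustration under stated assumptions: the class name, the field names and the example values are hypothetical and are not part of the Principles. The one rule it encodes comes from the text: every URI identifies, and HTTP URIs additionally locate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BibliographicDescription:
    """Hypothetical record holding the core data named in the text."""
    title: str
    authors: List[str] = field(default_factory=list)
    editors: List[str] = field(default_factory=list)
    publisher: Optional[str] = None
    publication_date: Optional[str] = None
    publication_place: Optional[str] = None
    parent_work: Optional[str] = None  # e.g. the journal an article appears in
    pages: Optional[str] = None
    uris: List[str] = field(default_factory=list)  # URN, DOI, URL, ...

    def identifiers(self) -> List[str]:
        """All URIs identify the described resource."""
        return list(self.uris)

    def locators(self) -> List[str]:
        """Only HTTP(S) URIs also say where to find the resource."""
        return [u for u in self.uris if u.startswith(("http://", "https://"))]


# Made-up example values, for illustration only.
record = BibliographicDescription(
    title="An example article",
    authors=["A. N. Author"],
    parent_work="An Example Journal",
    uris=["urn:example:1234", "http://example.org/articles/1234"],
)
print(record.identifiers())
print(record.locators())
```

Keeping identification and location as two views over one list of URIs mirrors the point that a single description traditionally served both purposes at once.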
Formally, we recommend adopting and acting on the following principles:
1. Where bibliographic data or collections of bibliographic data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual bibliographic descriptions, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.
When publishing data make an explicit and robust license statement.
2. Many widely recognized licenses are not intended for, and are not appropriate for, bibliographic data or collections of bibliographic data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described at http://www.opendefinition.org/licenses/#Data.
Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged.
Use a recognized waiver or license that is appropriate for data.
3. The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets.
They furthermore prevent commercial services which add value to bibliographic data or commercial activities which could be used to support data preservation.
If you want your data to be effectively used and added to by others it should be open as defined by the Open Definition (http://opendefinition.org/) – in particular non-commercial and other restrictive clauses should not be used.
4. Furthermore, it is STRONGLY recommended that bibliographic data or collections of bibliographic data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This ethos of sharing and re-use should fit well within the remit of publicly funded cultural heritage institutions.
We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
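One way to act on principles 1 and 4 is to ship an explicit, machine-readable rights statement with every published collection. The sketch below is an assumption about how that might look, not a format prescribed by the Principles: the function name and the dictionary keys are hypothetical, and the two URLs are the public pages for CC0 1.0 and PDDL 1.0.

```python
import json

# The two public-domain instruments recommended in principle 4.
WAIVERS = {
    "CC0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "PDDL": "https://opendatacommons.org/licenses/pddl/1-0/",
}


def with_rights_statement(records, waiver="CC0"):
    """Bundle records with an explicit public-domain statement (hypothetical layout)."""
    if waiver not in WAIVERS:
        # Anything else (e.g. a non-commercial license) is rejected here.
        raise ValueError(f"not a recommended public-domain waiver: {waiver}")
    return {
        "license": {"id": waiver, "url": WAIVERS[waiver]},
        "records": list(records),
    }


# Made-up record, for illustration only.
collection = with_rights_statement([{"title": "An example title"}])
print(json.dumps(collection, indent=2))
```

Putting the statement at the level of the collection covers the whole data set; a publisher who also wants it on individual descriptions or subsets, as principle 1 asks, could repeat the same `license` entry on each record.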
Addendum: a non-comprehensive list of bibliographic data.
Core data: names and identifiers of author(s) and editor(s), titles, publisher information, publication date and place, identification of parent work (e.g. a journal), page information, URIs.
Secondary data: format of work, non-web identifiers (ISBN, LCCN, OCLC number etc.), an indication of rights associated with a work, information on sponsorship (e.g. funding), information about carrier type, extent and size information, administrative data (last modified etc.), relevant links (to Wikipedia, Google Books, Amazon etc.), table of contents, links to digitized parts of a work (tables of contents, registers, bibliographies etc.), addresses and other contact details about the author(s), cover images, abstracts, reviews, summaries, subject headings, assigned keywords, classification notation, user-generated tags, exemplar data (number of holdings, call number), …
Contributors: Karen Coyle, Mark MacGillivray, Peter Murray-Rust, Ben O’Steen, Jim Pitman, Adrian Pohl, Rufus Pollock