Do you love books? Get involved! Bibliography wants to be Open


Books are part of the lifeblood of our culture. Their content, their physical form and their impact continue to entrance us. (Yesterday an Audubon was sold for several million.) You don’t need to be a librarian or an academic to love books. I am sure that many of you have carefully sorted your books by size, domain, condition, etc. and I’d guess that some of you actually have an index. That’s not just an index, it’s a BIBLIOGRAPHY!

We now have a wonderful resource in the British National Bibliography. This is an index of most of the most important books: over 3 million of them. If you love books, here’s your chance to get involved. This is how Mark McGillivray presents the opportunity [1]:

Now that we have a queryable British National Bibliography dataset, we are investigating useful functionality to take advantage of the data.

The team have listed a few development ideas based both on our own interests and on discussion with others in the community:

  1. flagging – attaching notes to bibliographic records highlighting possible updates
  2. ikipedia – link to ikipedia by author / title / ISBN for further information
  3. book crossing – search an ISBN, find where a copy of it is available
  4. public libraries – search by ISBN and find out which local public library it is in
  5. exporting records – for example to bibtex
  6. google scholar lookup

We are moving forward with these, however we know that it is not possible for us to guess all the uses that the community might find for such data, so we would appreciate further comments and new ideas. It would be great to have a list of use cases that are valued by the community, and to enable as many of them as possible by project end.
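Of the ideas in the list above, exporting records to BibTeX is perhaps the easiest to picture. Here is a minimal sketch in Python, not the project's actual code. The record layout (a dict with `authors`, `title`, `publisher`, `year` and `isbn` keys), the function name `to_bibtex` and the example record are all assumptions made up for illustration; the real BNB data will be shaped differently.

```python
def to_bibtex(record: dict) -> str:
    """Render a simple book record as a BibTeX @book entry.

    The record layout is hypothetical: 'authors' is a list of
    "Surname, Forename" strings; 'publisher' and 'isbn' are optional.
    """
    # Citation key: first author's surname plus the year, e.g. "doe2010".
    first_author = record["authors"][0]
    surname = first_author.split(",")[0].strip().lower().replace(" ", "")
    key = f"{surname}{record['year']}"

    fields = {
        "author": " and ".join(record["authors"]),
        "title": record["title"],
        "publisher": record.get("publisher"),
        "year": str(record["year"]),
        "isbn": record.get("isbn"),
    }

    lines = [f"@book{{{key},"]
    for name, value in fields.items():
        if value:  # skip fields this record does not have
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# An invented record, purely for illustration.
example = {
    "authors": ["Doe, Jane"],
    "title": "An Example Book",
    "publisher": "Example Press",
    "year": 2010,
}
print(to_bibtex(example))
```

A real exporter would also need to escape characters that are special in BibTeX (braces, `&`, `%`) and cope with corporate or missing authors, which is exactly where the human variability in the data starts to bite.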

If you are interested in the Semantic Web, linked open data, etc. and are looking for a project, then Open Bibliography is a great place to start. It’s heavily supported by identifier systems, and this is both a good thing and a bad thing. It’s got a lot of excellent bibliographic data and it’s got some not-quite-so-good data. Bibliographic data is entered by humans and humans show variability!
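That variability shows up even in the identifiers. The same book may carry a hyphenated ISBN-10 in one catalogue and a bare ISBN-13 in another, so records have to be normalized before they can be matched. The sketch below is one way this might be done, using the standard ISBN-10 and ISBN-13 check-digit rules; the function names are my own and nothing here is taken from the Open Bibliography code.

```python
from typing import Optional


def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first 12 digits of an ISBN-13 (weights 1,3,1,3,...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a bare 13-digit ISBN, or None if the input is not a valid ISBN."""
    # Keep only ASCII digits and a possible 'X' check character.
    s = "".join(c for c in raw.upper() if c in "0123456789X")

    if len(s) == 10:
        if "X" in s[:9]:  # 'X' is only allowed as the final check character
            return None
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10 rule: sum of digits weighted 10..1 must be divisible by 11.
        if sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 != 0:
            return None
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)

    if len(s) == 13 and s.isdigit():
        return s if isbn13_check_digit(s[:12]) == s[12] else None

    return None


# Two ways of writing the same ISBN normalize to the same key.
print(normalize_isbn("0-306-40615-2"))
print(normalize_isbn("ISBN 978-0-306-40615-7"))
```

Records from two bibliographies that normalize to the same string are candidates for linking, though an ISBN identifies an edition rather than a work, so this is only a first step.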

We are starting to get other Open bibliographies. If you are involved with a library, make your data available.

The point of Open Bibliography is NOT to create one-big-clean-universal-bibliography. It’s to build a system that can relate different bibliographies to each other. Here’s a great post by John Wilkin from Michigan:

The problem with both the arguments OCLC makes and many of the arguments for openness seem to be predicated on the view that bibliographic data are largely inert, lifeless “records” and that these records are the units that should be distributed and consumed.

Nothing could be further from the truth. Good bibliographic data are in a state of fairly constant, even if minor, flux. There are periodic refinements to names and terms (through authority work), corrections to or amplifications of discrete elements (e.g., dates, titles, authors), and constant augmentation of the records through connection with ancillary data (e.g., statements about the copyright status of the specific manifestation of the work).

In fact, bibliographic data are the classic example of data that need to live in the linked data space, where not only constant fixes but constant annotation and augmentation can take place. That fact and the fact that most of the bibliographic data we have has been created through a kind of collaborative paradigm (e.g., in OCLC’s WorldCat) makes the OCLC position all the more offensive.

Locking bibliographic data up, particularly through arguments around community norms, means that they won’t be as used or as useful as they might be, and that we will rarely receive the benefits of community in creating and maintaining them. The way these data are often used when shared, however, makes the hue and cry of the other side, which essentially says “give me a copy of your data,” all the more nonsensical: by disseminating these records all over the networked world, we undermine our collective opportunities.


By walling off the data, we, the members of the OCLC cooperative, lose any possibility of community input around a whole host of problems bigger than the collectivity of libraries: Author death dates? Copyright determination? Unknown authors or places of publication?

These problems can best be solved by linked data and crowd-sourcing. And all of this should happen with a free and generous flow of data. OCLC should define its preeminence not by how big or how strong the walls are, but by how good and how well-integrated the data are. If WorldCat were in the flow of work, with others building services and activities around it, no one would care whether copies of the records existed elsewhere, and most of the legitimate requests for copies of the records would morph into linked data projects.

The role of our library community around the data should not be that we are the only ones privileged to touch the data, but that we play some coordinating management role with a world of very interested users contributing effort to the enterprise.

On the other hand, every time someone says this is a problem that should be solved by having records all over the Internet like so many flower seeds on the wind, I see a “solution” that produces exactly what the metaphor implies, a thousand flowers blooming, each metaphorical flower an instance of the same bibliographic record.

What is being argued is that having bibliographic records move around in this way is the sine qua non and even the purpose of openness. When we do that, instead of the collective action we need, we get dispersed and diluted action. Where we need authority, we get babel.


I wanted to use this blog forum as an opportunity to make this point, and also, seemingly incongruously, to announce the availability of nearly 700,000 records from the University of Michigan catalog with a CC-0 license, records that can also be found in OCLC. They are now available here: (CKAN package for the Michigan records).


That said, I believe having the records out there will stimulate even more discussion about the value of openness and the role of OCLC. I’ll have my staff update the file periodically, and in the next release will add the CC-0 mark to the records themselves. I hope the records prove useful to all sorts of initiatives, but I also hope that their availability and my argument helps spur more collective action around solving these problems through linking and associated strategies of openness, and not through file sharing.

There’s a lot more in John’s post, particularly about the role of OCLC (the “O” used to be Ohio). The problem is common to many fields (and chemistry is a good example). An organization was set up in the 20th C to abstract and manage the world’s data and metadata. The org did a good job, but it needed lots of human input and set up a business model which required charging for products and services. Because this was a large task, only one such organization is usually created (actually chemistry has two). It then effectively creates a monopoly. And, by 1993 (WWW0), it starts to become inefficient and out-of-date. The language in John’s post applies equally to any other abstracting service: books, law, medicine, chemistry, citations, etc. The organization must change, or face increasing bottom-up challenge.

Because in the Internet era we have web democracy. Yesterday in the UK Clay Shirky and (???) were debating on Newsnight the arrest of Julian Assange of Wikileaks. (???) argued that information wanted to be free and Shirky pointed out that Assange was being curtailed by non-democratic and non-legal methods.

So bibliography wants to be free. If OCLC resists that, it will perish in the bottom-up web revolution.

Unless of course the web itself is destroyed. And we all have to be vigilant.

[1] Problem – what has happened to Mark’s “W”s? I have cut and pasted them and they’ve got transformed into invisible characters…



2 Responses to Do you love books? Get involved! Bibliography wants to be Open

  2. Peter Morgan says:

    (???) = Nassim Nicholas Taleb, and the Newsnight interview is now viewable online.
    Peter Morgan
