Open Bibliography at Berkeley; new visions

I am spending 4 wonderful days at Berkeley working with Jim Pitman on Open Bibliography and BibJSON, having met Jim and Karen Coyle IRL for the first time.

Bibliography? Boring…

No. Bibliography is the Map Of Scholarship. It tells us who has created what, when and where. Traditionally it’s been seen as a library subject, managing the books and the catalogue, but in the electronic era it’s relevant for anyone and everyone. Books are changing, and research and its publications / communications are changing. So the way we communicate this is vital.

If you are a scholar or researcher your career is measured by bibliography. Your “H-index” is bibliography. The university’s “REF” or other research assessment is bibliography. It’s the formalization and characterization of scholarly output.

Isn’t it just books and papers / articles?

No longer. There are many people (including me) who feel that there are a whole lot of outputs and inputs that are just as important:

  • Online accesses
  • Collaborations
  • Software
  • Blogs
  • Twitter (What??? – what a lot of rubbish – only citations in peer reviewed papers matter. No, the twitter and blogospheres have excellent records of post-review of papers – finding errors and even forcing retractions and rewrites).
  • Datasets
  • Online resources

For example the Altmetrics group has some very exciting indications of how publications have an impact.

And citations – well, they tell you something about what you did five years ago. Useful for historians of science, perhaps. Although look at the h-index of Galois (it’s 2, as he only published two papers before getting shot).
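
The h-index itself is a simple calculation: the largest h such that h of a researcher's outputs have at least h citations each. A minimal Python sketch follows; the function name and the citation counts are made up for illustration and are not taken from any citation service.

```python
def h_index(citation_counts):
    """Return the largest h such that at least h items have >= h citations each."""
    # Rank outputs from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` outputs all have at least `rank` citations
        else:
            break         # counts only get smaller from here on
    return h


# Invented citation counts, for illustration only.
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([1000, 900]))       # 2
```

Two papers cap the score at 2 however heavily they are cited, which is the Galois problem in one line.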

So we need a new approach to bibliography. Perhaps a new name. Something where we can record our outputs in detail. And where we are able to determine what is important – not some distanced commercial company which “measures” our value.

Didn’t you know that? Your value isn’t measured by your peers in the discipline or the university. It’s determined by what large commercial companies can make money out of. They only measure the easy things.

And they build walled gardens – they control us. The bibliography and metrics need to be Open. So we can actually verify what is being calculated. (Didn’t you know? The algorithms for determining the value of a scholar/researcher are commercial secrets – often created through tricky commercial deals).

What are we doing in Berkeley? We are continuing the JISC project on Open Bibliography. We are creating a universal language – BibJSON. Open source servers (BIBserver). Open tools for editing and creating bibliography. Reading lists. Reports, etc.

Aimed at individual scholars, departments, subject groups. To let them tell the world what they have done, what resources they think others would be interested in.
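
To give a flavour of what a BibJSON record could look like, here is a rough Python sketch. The key names (BibTeX-like keys, `author` as a list of objects with a `name`, plus `identifier` and `link` lists) follow commonly described BibJSON conventions but should be read as assumptions, not as a definitive schema. The helper `make_record` is hypothetical and the example values are invented.

```python
import json


def make_record(title, authors, year, record_type="article", doi=None, url=None):
    """Build a BibJSON-style dict (key names are assumptions, see the text above)."""
    record = {
        "type": record_type,                              # e.g. "article", "book", "misc"
        "title": title,
        "author": [{"name": name} for name in authors],   # one object per author
        "year": str(year),
    }
    # Optional fields are only added when a value is supplied.
    if doi:
        record["identifier"] = [{"type": "doi", "id": doi}]
    if url:
        record["link"] = [{"url": url}]
    return record


# An invented record for a non-traditional output (a dataset), for illustration only.
example = make_record(
    title="An example dataset description",
    authors=["A. Scholar", "B. Researcher"],
    year=2012,
    record_type="misc",
    url="http://example.org/dataset/1",
)
print(json.dumps(example, indent=2))
```

Because the record is plain JSON, the same structure can describe a dataset, a piece of software or a blog post as easily as a journal article.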

It’s all Open. But we are very happy to work with commercial and other organizations. To look at interoperability. To look at making bibliographic data Open.

I’ll be posting regularly on this.


6 Responses to Open Bibliography at Berkeley; new visions

  1. James says:

    The idea of an “Open Bibliography” does make a lot of sense for the reasons outlined. It probably would be a huge benefit for researchers to be able to more quickly estimate the impact of their work and help them make “course corrections” in doing future research using that knowledge. It would be interesting to know the algorithm that is being considered for doing the ranking. Something like Google’s PageRank to highly rank articles with more links?

    • pm286 says:

      Thanks James,
      The main problem is that although an Open Bibliography is conceivable it’s harder to get Open Citations. The problem is:
      * The citations are often in free text and difficult to parse from the original article. (Remember that a major point of current publishing technology is to destroy information so that the publisher retains an advantage over everyone else.)
      * The citations are owned by the publisher, because the authors give them away. So there is a huge and profitable industry in taking our citation information and selling it back to us.
      As a result the academic community has little citation information to do ranking with. We can use the Open subset of PMC but it’s about the only place to find open material other than BMC/PLoS (and they overlap heavily with UKPMC).

  2. Peter Sefton says:

    Good to hear Peter. I’d like to suggest you take a look at aligning BibJSON with the JSON format used by Zotero and (I think) Mendeley – these are very widely used systems and together they provide a lot of bibliographic resources. Both have APIs and with the Zotero one at least you can query it to get JSON data about a resource where the author has sharing turned on.
    Apart from straightforward interop with Zotero, you get access to CSL for citation formatting. The CSL format is possibly a little more complicated than you are after but note that it has evolved out of dealing with millions of real citations that need to be formatted, hence the attention to name and date parts, etc.
    I have raised this with Mark MacGillivray on the jiscHTML5 list, but so far no response.
    See the JSON schema here: https://github.com/citation-style-language/schema/blob/master/csl-data.json
    How close is that to BibJSON?

    • pm286 says:

      Thanks PT,
      Very useful – we were just looking at Zotero today. There is huge synergy as you and I know between BibJSON and ScHTML.

  3. I wonder, is it legally OK to openly publish the references I’ve used in an article that is published in a closed journal?

    • pm286 says:

      IANAL but yes, in that it has been done for two centuries. Anyone can read a “printed” article and abstract facts and bibliographic data and republish – this is central to scholarship. Any public or private abstracting activity does that. What is contested is:
      (a) collections of bibliography – i.e. reproducing someone else’s collection verbatim. This is protected in some jurisdictions (e.g. European database directive). But it’s a grey area.
      (b) contract law. The universities sign contracts with the publishers which forbid readers to use electronic means to read collections of journals. This is not a violation of copyright, it is a condition which purchasing officers/librarians agree to when they rent the journals. The University of Cambridge contract with – say – Elsevier – forbids anyone to spider, crawl or index any of the material we rent. I could create 5 million indexes automatically from the literature but I am banned from doing so – and the university would probably be cut off if I did so. It’s appalling that this right has been surrendered.
      So if it’s done on a few papers, by hand, it’s legal. If it’s done on a lot, by machine, we have signed away our right to work with C21 philosophy and tools.
