Monthly Archives: November 2010

Aix sponsa and the mystery of the ISBN


Our local supermarket is now accessible by a splendid new bridge over the Cam (they paid for part of it – and every little helps). There are lots of exciting things here – the Museum of Technology (a Victorian sewage pumping station – well worth a visit), the boaties, some classic 1960s industrial architecture, our mediaeval church (St Andrews), etc. Here's my mobile phone photo:

But the reason for this post is the duck at (0.5, 0.1) – it's a Wood Duck (Aix sponsa) and they are extremely rare in Britain. (Before you get excited, I suspect it's an escape – see from about 3 years ago). Here's my best photo – confirming the identification, I hope.

So I turned to a wonderful book published 34 years ago. Here's a photo of my copy. (I am claiming "fair use" – not that that is relevant in the UK - for having photographed the cover as I want to make a relevant point).


It's got records for every 10 km square in the UK for all breeding birds (the one shown is a Stonechat (Saxicola torquata)). You can see how it's clustered round the West of the country. So, is this book in the BL's bibliography? Its ISBN is 0 903793 01 6 as printed in the front matter. It's © British Trust for Ornithology and was first published in 1976. Can I confirm the ISBN?

One Wikipedian agrees ( ). I can also find it on this site ( ), which gives this ISBN but also has several others for the same book:

(ISBN: 0856610186 / 0-85661-018-6) [Date 1976] and (ISBN: 9780856610189) [Date 1980] and

    The Atlas of Breeding Birds in Britain and Ireland (ISBN: 9780856610189)

J.T.R. Sharrock

Book Description: British Trust for Ornithology, Tring, 1976.

If I search for it on Bibliographica I get 6 entries all pointing to 0856610186:

The atlas of breeding birds in Britain and Ireland

by Sharrock, J. T. R. (John Timothy Robin)
Includes bibliographies and index.

I can understand that a 1980 reprint could have a different ISBN. (And I am not confusing this with the later study:

by a different author).

So my book has a different ISBN from the one in the BL's entries in JISC Openbib. And there are at least 4 identifiers for the "same book".

Any enlightenment welcomed.


My physical book has a different ISBN from the one in the British National Bibliography. My guess is that the BNB has a pre-publication one. I've also learnt that there are ISBN-10 (10 digits) and ISBN-13 (13 digits). The 10-character string seems similar but not always identical between the two. And, as I am told: welcome to the world of ISBNs.
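The "similar but not identical" effect comes from the check digit. An ISBN-13 for a book that already has an ISBN-10 is formed by putting 978 in front of the first nine digits and recalculating the final check digit with a different rule, so only the last character changes. Here is a minimal Python sketch of both check-digit rules; the function names are my own, not from any library.

```python
def normalise(isbn: str) -> str:
    """Strip the spaces and hyphens that printed ISBNs usually contain."""
    return isbn.replace(" ", "").replace("-", "").upper()


def isbn10_check_digit(first9: str) -> str:
    """Weights 10..2 over the first nine digits; total including check must be 0 mod 11."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Alternating weights 1 and 3 over the first twelve digits; total must be 0 mod 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn10(isbn: str) -> bool:
    s = normalise(isbn)
    return len(s) == 10 and s[:9].isdigit() and isbn10_check_digit(s[:9]) == s[9]


def isbn10_to_isbn13(isbn: str) -> str:
    """Prefix 978 to the first nine digits and recompute the check digit."""
    s = normalise(isbn)
    if not is_valid_isbn10(s):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    body = "978" + s[:9]
    return body + isbn13_check_digit(body)


print(is_valid_isbn10("0 903793 01 6"))   # the ISBN printed in my copy
print(isbn10_to_isbn13("0856610186"))     # the ISBN in the catalogue entries
```

Both of the 10-digit ISBNs above pass the check, so neither looks like a simple misprint; they really are two different identifiers for the "same book".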

Maybe the world can help…



Beyond the PDF: when should we add semantics

BTPDF (here) has a lively debate on adding semantics to scientific publications, and this is a snapshot of some of my own contributions. The site (and the meeting) are – I think – open to anyone. The idea is that people will offer ideas and materials to support the meeting.

** One discipline I think we should adopt before BTPDF is that we should all read a variety of papers in different fields. I suspect the majority of attendees will be bioscientists because they have an excellent record of knowing they need semantics and developing it. I don't think we can become polymaths and if we try to solve all disciplines simultaneously we will get nowhere. The problems of a clinical trial are completely different from those of string theory.

My zeroth law of semantic enhancement (not well phrased but it was late at night). Readers of this blog will know how easy it is to corrupt information:

All discipline-independent syntactic problems are soluble and must be solved

By this I mean that characters must be expressed as characters with encoding (and not pixel images). All images with text should have the text machine processable. All graphs should be accompanied by the raw data that created them (e.g. CSV). All numeric quantities should have units. All maths should use MathML. All chemistry should be in CML. Geolocations should use KML; maps should use polygons. All line graphics should contain scalable vectors. None of this is rocket science - it's purely a question of will. The temperature is not 278, it is 278 K.
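To show how little it takes, here is a minimal Python sketch of a number that cannot be separated from its unit. The Quantity class and the kelvin_to_celsius helper are my own illustrative names, not part of any standard; a real system would use a proper units library or a units ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A numeric value that always travels with its unit."""
    value: float
    unit: str

    def __str__(self) -> str:
        return f"{self.value:g} {self.unit}"


def kelvin_to_celsius(q: Quantity) -> Quantity:
    """Convert only when the unit says the value really is in kelvin."""
    if q.unit != "K":
        raise ValueError(f"expected a temperature in K, got {q.unit!r}")
    return Quantity(q.value - 273.15, "degC")


temperature = Quantity(278, "K")
print(temperature)                      # 278 K
print(kelvin_to_celsius(temperature))
```

A bare 278 gives the conversion function nothing to check; with the unit attached, a wrong assumption fails loudly instead of corrupting the data silently.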

If we do not solve the zeroth law there is little point in aiming for anything higher, because the failure to obey the law corrupts the information irretrievably.

And some types of enhancement.


There are three main places that semantics can be created:

(a) by machines at the time of machine authoring. Where this can be done this is undoubtedly the highest quality as the machine defines the semantics and the consistency. An increasing amount of authoring is now done by simulations or instruments and there is absolutely no reason why the semantics should not be preserved. It is simply laziness to discard machine-produced annotations such as units, errors, etc.

(b) by humans. The earlier that this is done the better. The person best placed to annotate information is the person who created it, although this must be modulated with experience. Semantic information added later may involve guessing (units, errors, conditions, etc.). Annotation at time of conventional publication is almost certain to involve uncertainty. One consequence of this is that we should develop semantic notebooks before we put effort into late authoring tools. Even more of a problem is that annotation introduced by technical editors in publishing houses, who were not involved in the science, is likely to involve errors and misconceptions.

(c) by text/data/imagemining. Of course this is never perfect. Its advantage is that it can be done at any stage (unless prevented by lawyers). The disadvantages are many. A simplistic lexical approach will get many false positives. A human reader aware of this probably won't worry, but people who aren't are likely to dismiss any document with even one false positive (false negatives are not a problem). Hopefully more publishers will allow text/data/imagemining, in which case we can build up a corpus of usage. For example 100 examples of "off the south coast of Iceland" may be correlated with lat/long, maps in the papers, etc., and so may give us a textual density function. In this way we increase the semantic precision of the complete discipline. We can even hope to extract equations and other semantic objects.
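To see why a simplistic lexical approach produces false positives, here is a deliberately naive Python sketch. The tiny word list and the sentence are made up for illustration; this is not how a real chemistry text-miner works.

```python
import re

# A tiny illustrative dictionary of element names (an assumption, not a real lexicon).
ELEMENT_NAMES = {"lead", "iron", "gold", "tin"}


def naive_element_mentions(text: str) -> list:
    """Return every word that matches the dictionary, ignoring all context."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in ELEMENT_NAMES]


sentence = "These results lead us to conclude that the iron catalyst is stable."
print(naive_element_mentions(sentence))  # ['lead', 'iron']
```

The verb "lead" is reported as a metal. A chemist reading the output shrugs; a sceptic sees one wrong annotation and distrusts the whole document. Context from a large corpus is what lets us do better.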

OPSIN on Safari

Very pleased to see an announcement from Chris Swain, who has ported OPSIN, our chemical name2structure processor, to Safari. See

Opsin Extension

The Unilever Centre for Molecular Science Informatics have been at the forefront of developing tools for the creation and curation of molecular data. OSCAR (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers; a key part of this is OPSIN, a name to chemical structure converter. There is now a web interface to the program available and the web services can be accessed by anyone. Whilst this is very useful in its own right the beauty of such services is that others can build tools that access them.
There are a number of Safari Extensions described on this site that access similar services and with the help of
I'm happy to announce a new addition.
The Safari Extension for Opsin (download) allows the user to highlight a chemical name in a web page; a control click then brings up a dropdown menu. Click on "Display ... using Opsin" and a small window will open displaying the chemical structure. What is particularly nice is that in addition to providing the structure in PNG format the same web service also provides the chemical structure in SMILES, InChI and CML format. If you click one of the buttons at the bottom of the structure window the structure will be downloaded in the appropriate format.

The great thing about this is that there is no legal barrier to building Open Source applications on top of other OS ones. OPSIN is actually a great platform for volunteers. Daniel has done very well in building all the major internals and the (relatively small amount of) remaining gaps are mainly filling in vocabulary. So we'd love to have offers of help.

What Chris has shown is how well-designed apps can be put together. There's other ways of getting info into OPSIN besides typing – see if you can guess what they are and whether you have the skills to implement them. (We've built a prototype of one, but the others still await hacking).

Why the British Bibliography is Wow!


I've posted recently on my Wow! Moment on seeing the British National Bibliography released in RDF by JISC and OKF. See which already has about 15 posts in less than a day. But isn't the BNB (BTW do not confuse this with DNB) just a collection of what books the BL contains? Why is that so important? Can't I just ask them whether they have got a first edition of Lady Chatterley's Lover?

This is the first time that I have "searched for a book" other than to buy a specific one or read a specific one. And what you get back from the BNB is not "a book". You get a local graph in scholarship space. Times, places, events, comments, comments on comments… I had NO idea that so many books have been written about LCL. I do remember reading one of Lawrence's works at school where he discussed LCL and he also had paintings of nudes which broke the then convention by having obvious pubic hair. I don't think the paintings are very highly regarded now, but who knows? Anyway the books in the BL have doubtless got Lawrence's illustrations. And the BNB will tell us which books actually have illustrations. But that's just for starters.

Here's some of the traffic:


But is it reliable? I looked up the book I wrote (Presenting XML, SAMS Net, 1997) and find that it claims it was written by Laura Alschuler.

How did that happen?




Richard Light


And a few hours later:




I've investigated this issue. As William points out, this has to do with the source data, not the RDF/XML representation or the conversion.


The record you found was created prior to publication for the British National Bibliography (BNB), on the assumption that it would be published in the UK and would be part of the Library's legal deposit intake. It may be worth pointing out that records created prior to publication, on the basis of information provided by the publisher, sometimes contain inaccurate data.


The item was however published in the United States and was ineligible for the BNB. The BL subsequently purchased this item; this has been catalogued and correctly attributed. When I searched our "Search our catalogue" under the title "Presenting XML", your book was the first hit.



We will take steps to delete the incorrect record from our catalogue but it will remain in our rdf/xml dataset until we update this data. William may want to delete it from his version of our data set.


Hope this helps.





Corine Deliot

Metadata Standards Analyst

The British Library

Boston Spa, Wetherby

West Yorkshire LS23 7BQ

e-mail: corine.deliot at



So there's an error in the BNB! Well, we all know there are errors. All human collections of metadata contain errors. We've found one – and the curators have responded immediately. So we've got an immediate opportunity and some questions:

  • How do we report this?
  • Is there a role for crowdsourcing? Would YOU like to help improve  the country's Bibliography?
  • Should the record be deleted? My own view (see the list) is that one should not delete records but obsolete or annotate them. Deleted records are invisible (but follow the list!)
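The "obsolete or annotate, don't delete" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Record and Catalogue classes and the identifiers "pre-publication" and "corrected" are hypothetical, and have nothing to do with how the BL or Bibliographica actually store their data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    identifier: str                      # hypothetical local id, not a real BNB identifier
    title: str
    author: str
    superseded_by: Optional[str] = None  # points at the replacement record, if any
    notes: List[str] = field(default_factory=list)


class Catalogue:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self._records[record.identifier] = record

    def get(self, identifier: str) -> Record:
        """Return the record exactly as stored, even if it has been superseded."""
        return self._records[identifier]

    def supersede(self, old_id: str, replacement: Record, note: str) -> None:
        """Keep the old record, annotate it, and link it to its replacement."""
        old = self._records[old_id]
        old.superseded_by = replacement.identifier
        old.notes.append(note)
        self.add(replacement)

    def resolve(self, identifier: str) -> Record:
        """Follow superseded_by links until the current record is reached."""
        record = self._records[identifier]
        while record.superseded_by is not None:
            record = self._records[record.superseded_by]
        return record


catalogue = Catalogue()
catalogue.add(Record("pre-publication", "Presenting XML", "Laura Alschuler"))
catalogue.supersede(
    "pre-publication",
    Record("corrected", "Presenting XML", "Richard Light"),
    note="Author came from pre-publication data; corrected by hand.",
)
print(catalogue.resolve("pre-publication").author)  # Richard Light
print(catalogue.get("pre-publication").notes)
```

Anyone holding the old identifier still gets an answer, plus the history of why it changed. A deleted record would simply vanish.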


So Richard replies:

>It may be worth pointing out that records created prior to

>publication, on the basis of information provided by the publisher,

>sometimes contain inaccurate data.


Ah: that was probably the person they _wanted_ to write the book ;-)


Fascinating. Publishers do actually publish catalogue entries for books that have not yet been written. Or which are at a very early stage.


And Will Waites again:


The cleaned record, which I would agree should not be deleted but

superseded, can be retrieved as


So what do we do about this? If it won't appear in further corrected

data from the BL, we should mint a new URI for it. This might be

directly in The identifier/slug shouldn't be used

because that's the BNB identifier. Easiest thing is just to make a



So if you do a search now you'll see two records for that book, the

incorrect one from the original data and a hand-made one based on that

record and what I could easily find with google.


So the new record is at:


(anyone with a suggestion about how to make better identifiers please

pipe up).


And now Kingsley, who runs OpenLink / Virtuoso – they provide the backing triplestore free to the project. In fact I think they provide free triple storage to any open project. So do others like Talis.





Next step is Data Wiki dimension, courtesy of WebID. This enables group

maintenance of data i.e., someone spots an error, makes a change e.g. in

their own named graph, post data space management approval, changes go

to the main graph, if need be.


Anyway, great stuff! The foundation context for Data Wikis is now taking

shape :-) 






Kingsley Idehen    

President & CEO

OpenLink Software



Twitter/ kidehen


So this is much more than a catalogue of what's in the BL. It's a living piece of this century's scholarship. It's global. It's Open. It's dynamic.

Here's Will's new entry at Bibliographica ( ). (Don't be frightened by the UUIDs – you'll come to love them just like URLs.) I can't paste the format into this, but it looks like:


And if I follow the hyperlink at the bottom I find out all about Richard. (I've met Richard – one of the seminal influences in XML.) So I'll show this page – and who knows – you might want to read his other books!

Richard Light

Author of Presenting Xml (Presenting)











Works by Richard Light

Top members (works)

BrianKelly (1), webfocus (1), adman (1), jtauber (1), terryzman (1), zencat (1), EH_curators (1), goobysmootcher (1), travelinlibrarian (1), semantico (1), bookdrop (1), bstiekes (1), tdobias (1) — more

Member favorites

Members: None

So more Wow! I now know that Richard is interested in Museums! So (now) am I. Brian Kelly has a copy of Richard's book. (I might even have one if I rummage). So the world expands at every node. It's not a book collection it's a Hyperbibliography.

POSTSCRIPT: I had hoped to be meeting more people in the Library community at and presenting this development and getting ideas. But the program has been rejigged and I am no longer going. I may be able to create my own library event in January – more later about that.


Don’t mention the War Memorial

#jisc 15/10

You'll have picked up from my posts that I've been going to the Imperial War Museum and talking about War Memorials. Whatever for? Well, JISC ran a day for people to find out about their latest capital call for proposals (Grant 15/10: JISC infrastructure for education and research programme) and they invited anyone down to hear what the scope and goals are. This call had lots of strands; I am impressed by how many different things have to be covered (Identity management, Preservation Tools, "at risk repositories", etc.). It's a great place to meet the programme managers. (BTW for those people who are at the start of their career, funding body programme managers are your friends. It's easy to think they want to tear holes in your grant proposal – quite the opposite – they want great proposals that are going to make a real change in the world. And this session was there to do this – you could sit down and discuss your ideas and get constructive guidance.)


It's also a place for making contacts. I like to talk to people I don't know and so started chatting to the person next to me – Frances. She was from the Imperial War Museum. I didn't know very much about it (other than taking the kids there, and the huge naval guns):

Anyway Frances had come to see if there was anyone who could help with their management of museum metadata (one of the strands). She and her colleague Jane run the UK National Inventory of War Memorials (UKNIWM). There are probably 100,000 war memorials in the UK of which they have records for about 60,000:

War memorials are a familiar sight in the landscape of the United Kingdom.  They provide insight into not only the changing face of commemoration but also military history, social history and art history.

The UK National Inventory of War Memorials is based at the Imperial War Museum and is working to compile a record of all war memorials in the UK and to promote their appreciation, use and preservation.

The UKNIWM website allows people everywhere to discover information about the individuals, places and events recorded on our war memorials. Family historians or school classes can now investigate their local memorials, adding details, perhaps even photographs, to bring them alive. 

Our aim is that the story of each memorial, its building, unveiling and significance to community life will be here.

What's impressive about UKNIWM is their volunteer community:

The UK National Inventory of War Memorials is reliant upon volunteers in its work to collect information about war memorials located in the United Kingdom. Since its foundation in 1989 thousands of people have helped to record information about war memorials in the UK.

We have recorded over 60,000 of the estimated 100,000 memorials so far, but there is still much information to be collected.

Not only do we need more help with fieldwork but we are also looking for researchers to go through their local archives and find information on the background history of war memorials.

Remote data inputting: We are currently in the process of retrieving the First World War names database from Channel 4 following our successful partnership with the Lost Generation series. The database will be accessible on the UK National Inventory of War Memorials website once we have transferred and sorted the data. If you are interested in assisting with editing the current names data and inputting further names do contact us. You can work on the project remotely from home and training is provided to help you carry out the task. You will be given a set of memorial names lists to work with and, once edited, you can return these and come back for more if you wish!

Currently most of the data input is based on paper and Frances had come to see if the JISC community had ways of using the Internet and electronic means that could be helpful. I explained the success of things like Open Street Map (which actually has war memorials on it!). The upshot is that we've talked with the Open Knowledge Foundation and have submitted a proposal for this call to create a "Semantic Toolkit" (STK) to support the technical infrastructure. Whether or not it is funded, it's made a valuable link between the OKF and the IWM/UKNIWM which will certainly help broaden all our horizons.

The British National Bibliography – wow! Try it out

#jiscopenbib #okfn

Something truly wonderful arrived in my email a few minutes ago: It's an announcement by Will Waites – and every word and character is exciting:

Following up on the earlier announcement [1] that the British Library[2] has made the British National Bibliography [3] available under a public domain dedication, the JISC Open Bibliography [4] project has worked to make this data more useable.

The data has been loaded into a Virtuoso store that is queryable through the SPARQL Endpoint [5] and the URIs that we have assigned each record use the ORDF [6] software to make them dereferenceable, supporting content auto-negotiation as well as embedding RDFa in the HTML representation.

The data contains some 3 million individual records and some 173 million triples. Indexing the data was a very CPU intensive process taking approximately three days. Transforming and loading the source data took about five hours.

For more detail see


This is a real milestone. It shows that:

  • Technically it's possible to take legacy records from library collections (which are primarily used for managing the physical book collection, making them available to readers, loans, etc.) and turn them into modern semantic objects. These can be queried (see below)
  • Organizationally (and this comes from the top of the organizations): JISC (programme manager Dave Flanders) has had the vision to support this. The BL (Ben White and Neil Wilson among many others), again from the top, has been very proactive in making the material available to our project as PD. The OKF has a network of volunteers and contractors which is second to none, including Rufus Pollock, Ben O'Steen, Will Waites and Mark McGillivray. They are all ace hackers (Python, RDF, etc.); software design comes out of their fingers.

And very importantly we now know what the scale is. Let's say we have 20 million possible books (a very rough guess as books in major libraries are counted with a tape measure). At the BNB's ratio of roughly 58 triples per record (173 million triples for 3 million records) that comes to about a gigatriple. This shouldn't frighten us today. The main thing is that the web is scaling up for RDF and there are many potential suppliers and providers.

So try it

gives a SPARQL form. DON'T let this frighten you. You get something like:

PREFIX dc: <>

PREFIX bibo: <>

PREFIX foaf: <>

SELECT DISTINCT ?book ?title ?name ?description

WHERE {
?book a bibo:Book .

?book dc:title ?title . ?title bif:contains "Edinburgh" .

OPTIONAL { ?book dc:description ?description } .


?book dc:contributor ?author . ?author foaf:name ?name


} GROUP BY ?book LIMIT 50


Just click "Run Query" and in less than a second it has returned a whole lot of books about Edinburgh. To show that it works for anything, change "Edinburgh" to "Chatterley" and you get the table below. I had no idea there were so many books about LCL. But the BL does! This gives so many opportunities for asking Linked Open Data questions. And this is just the start…
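If you would rather script this than type into the form, here is a minimal Python sketch that builds the same kind of query for any search word and wraps it in a standard SPARQL-protocol GET request. Several things here are my assumptions and not taken from the announcement: the ENDPOINT URL is a placeholder, the three prefix URIs are the commonly used ones for Dublin Core terms, BIBO and FOAF, and I have left out the GROUP BY clause. Note that bif:contains is a Virtuoso extension, so the query is not portable to other stores.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the real SPARQL endpoint of the service.
ENDPOINT = "http://example.org/sparql"

# Prefix URIs are assumptions (the usual namespaces for these vocabularies).
QUERY_TEMPLATE = """\
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?book ?title ?name ?description
WHERE {{
  ?book a bibo:Book .
  ?book dc:title ?title . ?title bif:contains "{term}" .
  OPTIONAL {{ ?book dc:description ?description }} .
  ?book dc:contributor ?author . ?author foaf:name ?name
}} LIMIT {limit}
"""


def build_query(term: str, limit: int = 50) -> str:
    """Fill the search word into the query; only plain words are accepted."""
    if not term.isalnum():
        raise ValueError("search term must be a single alphanumeric word")
    return QUERY_TEMPLATE.format(term=term, limit=int(limit))


def build_request_url(term: str, limit: int = 50, endpoint: str = ENDPOINT) -> str:
    """SPARQL protocol: the query text travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(term, limit)})


print(build_query("Chatterley"))
print(build_request_url("Edinburgh"))
```

Restricting the search term to a single alphanumeric word is a crude way of keeping quotes out of the query string; a real client would escape the term properly.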





Lady Chatterley's lover!

G. R.

Ill on inside covers.

The trial of Lady Chatterley : Regina v. Penguin Books Limited

Penguin Books.

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Squires, Michael

Includes bibliographical references.

The second Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Includes bibliographical references (p. 375).

Lady Chatterley's lover ; A propos of 'Lady Chatterley's lover D.H. Lawrence

Squires, Michael

Includes bibliographical references.

Lady Chatterley's lover : a propos of 'Lady Chatterley's lover'

Squires, Michael

Includes index.

Lady Chatterley's lover ; A propos of 'Lady Chatterley's lover D.H. Lawrence

Squires, Michael

This ed. first published: 1994.

The trial of Lady Chatterley : Regina v. Penguin Books Limited

Rolph, C. H. (Cecil Hewitt)

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Lawrence, D. H. (David Herbert)

Includes bibliographical references.

Lady Chatterley's confession.

Feinstein, Elaine.


Lady Chatterley's lover!

C. A.

Ill on inside covers.

When Marys were kings! : Lady Chatterley, the Beatles, Arnold St. Marys FC and much, much more--

Hawthorn, James.

Includes bibliographical references.

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Lady Chatterley's trial

Rolph, C. H. (Cecil Hewitt)

Extract from: The trial of Lady Chatterley. London : Penguin, 1961.

Lady Chatterley's lover : prefaced by the author's Apropos of Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

D. H. Lawrence's John Thomas and Lady Jane according to Spike Milligan. Part 2, Lady Chatterley's lover.

Milligan, Spike

Originally published: London: Michael Joseph, 1995.

Lady Chatterley's lover : a propos of 'Lady Chatterley's lover'

Lawrence, D. H. (David Herbert)

This ed. originally published: 1993.

John Thomas and Lady Jane : the second version of Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Originally published: London : Heinemann, 1972.

John Thomas and Lady Jane : the second version of 'Lady Chatterley's lover'

Lawrence, D. H. (David Herbert)

This version originally published in an Italian translation in 'Le Tre "Lady Chatterley"'. Milano : Mondadori, 1954. - English text originally published: London : Heinemann, 1972.

The Lady Chatterley's Lover trial : (Regina v. Penguin Books Limited)

Great Britain. Central Criminal Court.

Sons and lovers ; Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

The abolition of Britain : the British cultural revolution from Lady Chatterley to Tony Blair

Hitchens, Peter

Includes bibliographical references.

The trial of Lady Chatterley : Regina v.Penguin Books,Limited

Penguin Books Ltd.

Lady Chatterley's lover : a propos of "Lady Chatterley's lover"

Lawrence, D. H. (David Herbert)

Formerly CIP.

Lady Chatterley's lover : a propos of "Lady Chatterley's lover"

Lawrence, D. H. (David Herbert)

Includes bibliographical references.

Sons and lovers ; Women in love ; Love among the haystacks ; Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Sons and lovers ; Women in love ; Love among the haystacks ; Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Originally published: London : Heinemann, 1983.

Lady Chatterley's lover

C. A.

Cover title: The comic book Lady Chatterley's lover.

Lady Chatterley's lover.

Lawrence, D. H. (David Herbert)

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Lawrence, D. H. (David Herbert)

2nd work originally published: London : Mandrake Press, 1930.

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

This ed. originally published: 1961.

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Lawrence, D. H. (David Herbert)

This edition published with a new chronology, introduction, further reading, and a note on the text.

The comic book Lady Chatterley's lover

Emerson, Hunt.

Previous ed.: published as Lady Chatterley's lover. 1986.

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Originally published, Florence: Privately printed, 1928.

Lady Chatterley's daughter.

Robins, Patricia

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

For subscribers to the series.

Lady Chatterley : the making of the novel

Britton, Derek.

Includes index.

Lady Chatterley's lover.

Lawrence, D. H. (David Herbert)

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Squires, Michael

This edition published with a new chronology, introduction, further reading, and a note on the text.

Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Formerly CIP.

The second Lady Chatterley's lover

Lawrence, D. H. (David Herbert)

Film tie-in.

The abolition of Britain : the British cultural revolution from Lady Chatterley to Tony Blair

Hitchens, Peter

Previous ed.: (i.e. 1st ed.) 1999.

The end of obscenity : the trials of 'Lady Chatterley', 'Tropic of Cancer' and 'Fanny Hill'

Rembar, Charles.

Originally published, New York: Random House, 1968.

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Squires, Michael

Formerly CIP.

Lady Chatterley's lover : a propos of 'Lady Chatterley's lover'

Lawrence, D. H. (David Herbert)

This ed. originally published: 1994.

Lady Chatterley's lover ; A propos of "Lady Chatterley's lover"

Squires, Michael

Originally published: Cambridge University Press, 1993.

The trial of Lady Chatterley : Regina v. Penguin Books Limited

Hogarth, Paul.

Lady Chatterley's lover : a propos of "Lady Chatterley's Lover"

Squires, Michael

2nd work originally published: London : Mandrake Press, 1930.

John Thomas and Lady Jane : the second version of 'Lady Chatterley's lover'

Lawrence, D. H. (David Herbert)

This version originally published in an Italian translation in 'Le tre "Lady Chatterley"'. Milano: Mondadori, 1954.

LUCERO : The Open University + JISC Open Data; mouthwatering

A great announcement from the Open University.

In short, The OU (with the active and welcome involvement of JISC) is exposing its data (which could be anything, but think staff details, courses, research interests, I think) as Linked Open Data. I'll post some snippets and then say why I think this is critical.

The JISC-funded OU's LUCERO (Linking University Content for Education
and Research Online) project has enabled information stored across many
of the university's websites to be brought together in a common, openly
accessible location:

"…members of the public, students,
researchers and organisations will be able to easily search, extract
and, more importantly, reuse The Open University's information and data."

"The data is there, and already visible,
but in many different places, systems and databases. By exposing it as
linked data on, we make it accessible and exploitable,
and open to uses that we don't have to dictate."

"LUCERO is The Open University making the initial step on
behalf of UK universities to contribute to what was the original
intention behind a World Wide Web."

The Open University joins organisations such as the UK, US
and Australian governments, and international media outlets, such as the
BBC and the New York Times.

David Flanders, Programme Manager, Information Environment at JISC,
said: "This new centralised-data-watering-pump is the first launched of
its kind in UK universities and should be celebrated accordingly. (...)
hopefully this is the first of many to come."

For more information visit:


I celebrate this. It is indeed mouthwatering. If every institution exposed simple data and metadata then the rest can be done by machines. The main thing is to get it out there. Let's assume that staff details and research interests are available. Here for example is part of the OU's chemistry research…

Inorganic, materials and coordination chemistry

Research focuses on the synthesis and characterisation (by diffraction, spectroscopic, thermal and computer modelling methods) of solid inorganic materials including magnetic oxides, zeolites, titania pigments, bimetallic catalysts, solid acid catalysts and micro- and meso-porous materials. Research in coordination chemistry includes work on the macrobicyclic hexaimino ligands (azacryptands), the study of magnetic exchange interactions and the characterisation of radical anions in charge transfer processes.

Bingle will not index chemistry (it doesn't know that an azacryptand is similar to a crown ether). But we can extract this material and turn it into semantic chemistry. That then allows us to ask questions like:

"what UK universities contain PhD theses on zeolites?" (A zeolite is a natural or synthetic aluminosilicate that is very widely used in catalysis and other applications). That will be almost trivial to answer under this system. It will be better than Bingle.

But of course it won't be much use for those Universities which:

  • Do not publish their theses
  • Only publish them in PDF (yes we can hack it, but)
  • Hide all their chemistry theses behind a wall of secrecy.

That means that when students from the enlightened institutions publish, the machines will make it much easier for academic or commercial recruiters to discover them and offer them positions. Word of mouth will be replaced by word of web. Managed in large part by machines.

English libel laws are used to suppress scientific debate; please get them changed

Stephen Curry ( ) has alerted us to the need to reform the English (sic, it's better in Scotland) libel laws. Briefly, any organization or person can practise bad or questionable science and defend themselves from criticism by suing the challenger. This is far worse than, say, in the US, though there are other asymmetric laws benefitting the rich and powerful there. From SC:

The British cardiologist Dr Peter Wilmshurst was reported in 2007 to have made remarks critical of a clinical trial involving a medical device made by NMT Medical. He is now being sued for libel.

The case is complex and I have not mastered the detail. I have, however, grasped one essential fact. Dr Wilmshurst's comments were made in the United States. His words were reported on a US medical website, Heartwire ( ). NMT Medical is an American company. But Wilmshurst is not being sued in America, where the right to freedom of expression is robustly enshrined in the constitution. Instead he is being taken to court in England, where the libel laws allow foreign individuals and corporations much freer rein.


The English libel law is particularly dangerous for bloggers, who are generally not backed by publishers, and who can end up being sued in London regardless of where the blog was posted. The internet allows bloggers to reach a global audience, but it also allows the High Court in London to have a global reach.


The good news is that the British Government has made a commitment to draft a bill that will reform libel, but it is essential that bloggers and their readers send a strong signal to politicians so that they follow through on this promise. You can do this by joining me and over 50,000 others who have already signed the libel reform petition at

I've signed. So have 50,000 others. It's not just scientists, though science requires scientists to speak out against what they feel to be bad. The law is a blunt weapon but a powerful one and needs changing in this respect.

Please sign.

Do clipboards support XML?

Henry Rzepa has an important blog post on whether we can expect to copy data (sic) from one environment to another:


For those of us who were around in 1985, an important chemical IT innovation occurred. We could acquire a computer which could be used to draw chemical structures in one application, and via a mysterious and mostly invisible entity called the clipboard, paste it into a word processor (it was called a Macintosh). Perchance even print the result on a laserprinter. Most students of the present age have no idea what we used to do before this innovation! Perhaps not in 1985, but at some stage shortly thereafter, and in effect without most people noticing, the return journey also started working, the so-called round trip. It seemed natural that a chemical structure diagram subjected to this treatment could still be chemically edited, and that it could make the round trip repeatedly. Little did we realise how fragile this round trip might be. Years later, the computer and its clipboard, the chemistry software, and the word processor had all moved on many generations (it is important to flag that three different vendors were involved, all using proprietary formats to weave their magic). And (on a Mac at least) the round-tripping no longer worked. Upon its return to (Chemdraw in this instance), it had been rendered inert, un-editable, and devoid of semantic meaning unless a human intervened. [PMR's emph] By the way, this process of data-loss is easily demonstrated even on this blog. The chemical diagrams you see here are similarly devoid of data, being merely bit-mapped JPG images. Which is why, on many of these posts, I put in the caption Click for 3D, which gives you access to the chemical data proper (in CML or other formats). And I throw in a digital repository identifier for good measure should you want a full dataset.


Times moved on and the limitations of PICT [a graphical format] set in. Apple refocussed on the PDF format. Related, notice, to the Postscript format that Adobe had introduced in order to allow high quality laserprinting. PICT support was abandoned, and the various components no longer carried recognisable data (specifically the clipboard or the ability of Word to recognise the data). Round-tripping broke. Does this matter? Well, one colleague where I work had accumulated more than 1000 chemical diagrams, which he decided to store in Powerpoint (and yes, he threw the original Chemdraw files away). The day came when he wanted to round trip one of them. And of course he could not. He was rather upset I have to say!

Oh dear!

Peter Murray-Rust and his team have produced CML4Word (or as Microsoft call it, Chemistry add-in for Word). At its heart is data integrity. Fantastic! But I wonder if it survives on Microsoft's clipboard (I know it does not on Apple's, since CML4Word is not available on that OS. And is unlikely to ever become so).

There are no legal restrictions from our side to porting it. It's a non-trivial amount of work – I'd estimate 6 months to a year. But if we want it enough, then it's possible. Its likelihood depends on people wanting semantic chemistry on all platforms.

#quixotechem (Open computational chemistry infrastructure) UPDATE

Our bottom-up infrastructure for Computational Chemistry is going very well. We have enthusiastic weekly meetings which show significant progress each week. I'm going to be demo'ing the concept next month in India (more later).

  • The parser strategy seems to be viable. I have parsed a large NWChem file and am now turning to GAMESS-US. I'll be using this as a tutorial in parsing infrastructure and strategy if people want to join in. There is a two-phase strategy:
    • Parse raw text to XML (using CML vocabulary)
    • Semantify the XML to semantic conformant CML
  • Generating dictionaries. Compchem is the easiest of all subjects to create non-controversial ontologies in, and by parsing (say) 10 major codes we build up an excellent picture of the syntactic discourse. We expect each code to have between 100 and 500 entries in the dictionary. This, just by itself, is a useful tool for users of the code. But it's more than that, because it allows those terms and concepts to be mapped onto other codes. That looks very feasible, much more so than in most disciplines.
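As a rough illustration of the two-phase strategy and the role of the dictionary, here is a minimal sketch. It is not the actual parser code: the sample log lines, the regular expression, the `demo:` dictionary prefix and the unit names are all invented for the example, and real output from NWChem or GAMESS-US is far messier. Only the CML element names (`module`, `scalar`), the `dictRef` attribute and the CML namespace are taken from CML itself.

```python
import re
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Invented sample lines, standing in for a real compchem log file.
RAW_LOG = """\
Total energy = -76.419737
Dipole moment = 2.0951
"""

# Matches lines of the form "Some label = 1.234" (an assumption about the layout).
LINE_RE = re.compile(r"^\s*(?P<name>[A-Za-z ]+?)\s*=\s*(?P<value>-?\d+\.\d+)\s*$")

# Hypothetical per-code dictionary: raw label -> (dictRef, units).
DICTIONARY = {
    "Total energy": ("demo:totalEnergy", "units:hartree"),
    "Dipole moment": ("demo:dipoleMoment", "units:debye"),
}


def parse_raw(text):
    """Phase 1: turn raw text into XML using CML element names, with no semantics yet."""
    module = ET.Element(f"{{{CML_NS}}}module")
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            scalar = ET.SubElement(module, f"{{{CML_NS}}}scalar")
            scalar.set("title", match.group("name"))
            scalar.text = match.group("value")
    return module


def semantify(module):
    """Phase 2: attach dictionary references, data types and units to known terms."""
    for scalar in module.findall(f"{{{CML_NS}}}scalar"):
        entry = DICTIONARY.get(scalar.get("title"))
        if entry is None:
            continue  # unknown term: leave it as plain phase-1 XML
        dict_ref, units = entry
        scalar.set("dictRef", dict_ref)
        scalar.set("dataType", "xsd:double")
        scalar.set("units", units)
    return module


module = semantify(parse_raw(RAW_LOG))
print(ET.tostring(module, encoding="unicode"))
```

Keeping the phases separate means the text-scraping part can be written per code, while the dictionary lookup is shared. Mapping one code's terms onto another's then becomes a matter of relating dictionary entries rather than rewriting parsers.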

The next phase is hard grind in writing these parsers and creating dictionaries. But as we do it we get experience and we also generate better tools.

Sam Adams brought me a present from New Mexico. It's the only private distillery. It's palatable. But I'm afraid that my family has me completely seduced on the Scotch Malt Whisky Society (