Update: I am off to CSIRO (AU), eResearch 2012, Open Content Mining, AMI2, PDF hacking, etc.

I haven’t blogged for some time because I have been busy elsewhere: going to #okfest and #odlc (Open Data La conference in Paris), and preparing for a significant stay (~3 months) with CSIRO in Clayton (Melbourne, AU).

I’m in AU at the invitation of Nico Adams and CSIRO as a visiting researcher. When we were daily colleagues, Nico pioneered the use of the semantic web for chemistry and materials. He is ahead of the game, but chemistry is slowly waking up to the need for semantics. We’ll be working on themes such as:

  • Formal semantics and ontologies for materials science
  • Open Content Mining for chemical/materials data (AMI2)

As part of this I intend to create materials for learning and using CML (Chemical Markup Language), in weekly chunks. If anyone is interested, I’m offering to run a weekly-ish series of low-key workshops on Semantic Chemistry and, more generally, Semantic Physical Science (Nico and I ran a day on SPS last year at eResearch Australia 2011). Maybe there will be enough material for a book, and, if you know me, it won’t be a conventional book. It could be a truly open-authored book if there is interest. It will almost certainly be Open Content. I’ve registered for eResearch 2012 in Sydney (28 Oct to 1 Nov), so if anyone is going we shall meet. I’m not doing any workshops this time round.

I’m working hard on Open Content Mining. I’ve developed a generic tool for extracting semantic information from PDFs (yes, PDFs) called AMI2. It is the result of many months of fairly solid hacking and several previous years of exploration. In the initial cases I have been able to get 100% accuracy on some subsets of PDFs, and I’ll be taking you through this in future blog posts. Ross and I are applying it to phylogenetics, and we expect to be able to extract a lot of trees from the literature.

We’ll be confining ourselves to BMC and PLoS material (with BMC being technically easier). I’ve downloaded 1000 potential papers, and Ross Mounce will be annotating 80 of them: whether they contain phylogenetics, where it is, etc. Content mining requires hard, boring graft to create a trustable system, but the effort is worth it.

We can’t use it on Molecular Phylogenetics and Evolution, although it has a lot of trees. Why not? [Regular readers will know the answer.]

And now some recent experiences with Open. #okfest was incredible: there was a real feeling that the world was changing and that we and others were changing it. That is the real sense of “Open”. Open isn’t just a licence or a process; it’s a community and a state of mind. It’s joyful, risk-taking, collaborative, global.

And Open Scholarship? Well, mostly it doesn’t exist, and I’m having difficulty seeing where it’s coming from. Open Scholarship consists of at least Open Access, Authoring, Bibliography, Citations, Data, Science, etc. Of these, only Open Science has a true Open agenda, community and practice (inspired by Joseph Jackson, Mat Todd and others who want to change the world). Open Access is not Open in the modern sense of the word. The initial idealism in 2002 was great, but since then it has become, in large part, factional, cliquey and authoritarian. Open Access is complex and needs serious public discussion, but this is frequently shouted down. [I sat through an hour’s plenary lecture at Digital Research from Stevan Harnad on “why the RCUK is wrong and must change its policy”, with the subtitle “What Peter Murray-Rust thinks and why he is wrong”. The views attributed to me were not mine and his conclusions were erroneous, but he doesn’t listen to me or to many others. He has now mounted a public attack on RCUK. This will help no-one other than reactionary, money-oriented publishers.]

I have been meaning to blog about Open Access for some time, but each time I wonder whether I would do more harm than good. However, I think it is now important to have proper public discussion about the serious issues, and Open Access Week may be an opportunity. As an example of the problem, I find it very hard to find any centre to “Open Access”. Who runs Open Access? What’s its purpose? Is there a consensus? Where can I expect to have a proper discussion without being insulted? If questions like these are not answered, the movement (in so far as it *is* a movement) will surely fracture. And unless new coherent visions emerge, the losers will be academia, but even more so the SCHOLARLY POOR.

