I/we had a great evening on Wednesday at the Open Science meeting run by Jenny Molloy and colleagues (http://blogs.ch.cam.ac.uk/pmr/2013/07/24/hack4ac-content-mining-and-open-science-in-oxford/). I was leading the meeting on “content mining”, and we had about 12 attendees, including bioscientists, librarians, physicists, informaticians and others. It was very informal. We started by talking about our own interests, and then I gave some demos and an introduction to content mining.
Jojo Scobie (@paraphyso) took this picture of Chuff (@okfn_okapi) in the pub.
I was delighted to see the interest and involvement of the group in phylogenetics. At least half could be described as having a significant interest or practice in the area. So we were able to look in depth at the sort of science published and to explore the issues, both technical and organizational. And they were forgiving of my ignorance and spent a long time educating me!
I’ve discussed much of the basis before, but in essence Ross Mounce and I will be extracting data from PDF publications and systematically publishing it. We looked at the things we would like to extract. There was an important discussion on whether extracting the single tree from a paper was valuable: the authors should publish much fuller data, so one tree isn’t always a good representation of the result. But we generally agreed it was a lot better than nothing.
We discussed the value of indexing the literature by species, and here there was strong agreement: if the scientific literature were indexed by species (and possibly by geodata and dates as well), that would be really valuable. It’s also technically about the simplest example of high-quality content mining.
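As a rough illustration of why species indexing is comparatively simple, here is a minimal Python sketch of an index from species names to the papers that mention them. It assumes the papers have already been converted from PDF to plain text and that the list of names to look for is known in advance (in practice it might come from a taxonomic checklist). The function name `build_species_index` and the sample texts are made up for illustration. It only matches full binomials written out exactly, so abbreviations such as “O. johnstoni”, synonyms and misspellings would all be missed.

```python
import re
from collections import defaultdict


def build_species_index(documents, species_names):
    """Hypothetical helper: map each species name to the sorted IDs of
    documents whose text contains that exact name."""
    index = defaultdict(set)
    # One precompiled pattern per name; \b stops partial-word matches.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b")
        for name in species_names
    }
    for doc_id, text in documents.items():
        for name, pattern in patterns.items():
            if pattern.search(text):
                index[name].add(doc_id)
    # Species never mentioned simply do not appear in the index.
    return {name: sorted(ids) for name, ids in index.items()}


# Invented sample texts standing in for mined papers.
documents = {
    "paper1": "Field notes on Okapia johnstoni in the Ituri forest.",
    "paper2": "A phylogeny of Giraffidae including Okapia johnstoni and Giraffa camelopardalis.",
    "paper3": "Grey literature report on Giraffa camelopardalis populations.",
}
species = ["Okapia johnstoni", "Giraffa camelopardalis", "Panthera leo"]

index = build_species_index(documents, species)
print(index)
```

A real index would also need name normalisation and, as mentioned above, geodata and dates, but the core lookup stays this simple.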
Our species is in danger: see http://www.zsl.org/conservation/regions/africa/okapi/, which reports that “The workshop highlighted that the okapi is faring worse than scientists previously thought”. And there are 100+ species on the highly critical list. Finding all the published information on a species is an essential (but not sufficient) activity, and it should be possible for anyone anywhere in the world to get all the peer-reviewed and grey literature on a species. Content mining is a necessary approach.
There is obviously a critical mass of interest and expertise in Oxford, supported by Jenny’s tireless efforts. We proposed holding a hackathon on “species”, where we could make a lot of progress.