We are now describing our workflow for extracting facts from the scientific literature on http://contentmine.org/blog . Yesterday Ross Mounce and I hacked through what was necessary to extract species from PLoSone. Here’s the workflow we came up with:
Ross has described it in detail at http://contentmine.org/blog/AMI-species-workflow and you should read that post for the full picture. The key points are:
- This is an open project. You can join in; be aware it’s alpha in places. There’s a discussion list at https://groups.google.com/forum/#!forum/contentmine-community . Its style and content will be determined by what you post!
- We are soft-launching it. You’ll wake up one day and find that it’s got a critical mass of people and content (e.g. species). No fanfare and no vapourware.
- It’s fluid. The diagram above is our best guess today. It will change. I mentioned in the previous post that we are working with WikiData for part of “where it’s going to be put”. If you have ideas please let us know.