Typed into Arcturus
We are making excellent progress. As always, some things are going faster and some slower.
We now need a second round of volunteers. I’ll detail what we have done and what needs to be done this week. Most of the activity is completely open at http://okfnpad.org/solo10.
Code:
Dan, Mark and Nava have forged ahead and shown that our document extraction framework can be run independently of OS and location. So far it:
- Downloads patents from weekly indexes
- Unzips them
- Converts the images to chemistry
- Restructures the main document
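The unzip stage listed above is the easiest to illustrate. Here is a minimal Python sketch, assuming each weekly download is a zip archive whose patent full texts are stored as `.xml` members. The function name `extract_patent_xml` and the `.xml` filter are hypothetical, not taken from the project's code.

```python
import io
import zipfile


def extract_patent_xml(archive_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every XML member of a zip archive.

    Assumption (hypothetical): patent full texts are .xml members; images
    and other files are skipped here and left for a later conversion stage.
    """
    documents = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                documents[name] = archive.read(name)
    return documents


# Build a tiny archive in memory to show the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example/doc1.xml", "<patent/>")
    archive.writestr("example/fig1.tif", b"\x00\x01")
print(sorted(extract_patent_xml(buffer.getvalue())))  # ['example/doc1.xml']
```

Working on bytes rather than a file path keeps the step independent of where the download landed, which fits running the framework on volunteers' own machines.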
Todo:
- Annotate the experimental sections and link up the chemistry
- Run chemicalTagger (a part-of-speech + chemistry tagger)
- Collect and upload solvent data
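To make "collect solvent data" concrete, here is a deliberately naive Python sketch that pulls solvent names out of an experimental paragraph by dictionary lookup. This is a stand-in technique, not what chemicalTagger does: the tiny `SOLVENTS` list and the function `find_solvents` are hypothetical, and multi-word solvents such as "ethyl acetate" or abbreviations such as "THF" are not handled.

```python
import re

# Hypothetical, deliberately tiny solvent dictionary (single words only).
# A real run would use chemicalTagger's annotations instead of string matching.
SOLVENTS = {"dichloromethane", "tetrahydrofuran", "methanol", "ethanol", "toluene", "water"}


def find_solvents(text: str) -> list[str]:
    """Return dictionary solvents mentioned in `text`, in order of first appearance."""
    found = []
    for match in re.finditer(r"[a-z]+", text.lower()):
        word = match.group()
        if word in SOLVENTS and word not in found:
            found.append(word)
    return found


paragraph = ("The residue was dissolved in dichloromethane, washed with water "
             "and recrystallised from ethanol.")
print(find_solvents(paragraph))  # ['dichloromethane', 'water', 'ethanol']
```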
Data:
At the moment we are concentrating on patent data. The data is messy but tractable. We shall very soon be able to distribute code and data, and we’ll be looking for volunteers who can run the code on their local machines and then upload the results.
Currently we are looking at RDF for managing simple solvent and temperature data on reactions. Something like (pseudocode):
ExperimentURI/UUID
- hasTemperature (hasUnits)
- hasSolvent
- hasDuration (hasUnits)
- hasAmount (hasUnits)
SolventURI/UUID
- hasFormula
- hasWikipediaEntry
- hasPubchemCID
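One way the pseudocode above could become concrete triples is sketched below in plain Python, writing N-Triples lines so no RDF library is needed. Several choices here are our assumptions, not decisions the project has made: the namespace `http://example.org/solo10#` is a placeholder, predicate names are written in camelCase, experiments are identified by `urn:uuid:` IRIs, and each quantity with units becomes a blank node carrying `hasValue` and `hasUnits` (the `hasValue` predicate is hypothetical).

```python
import uuid

# Placeholder namespace (hypothetical); the project had not fixed a vocabulary.
NS = "http://example.org/solo10#"


def iri(term: str) -> str:
    return f"<{term}>"


def literal(value: object) -> str:
    # Escape backslashes, quotes and newlines as N-Triples requires.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def experiment_triples(exp_uuid: uuid.UUID, solvent_iri: str, quantities: dict) -> list[str]:
    """Serialise one experiment as N-Triples lines.

    `quantities` maps a property such as "hasTemperature" to a (value, units)
    pair. Each quantity becomes a blank node whose label embeds the experiment
    UUID, so labels stay unique when many experiments share one file.
    """
    subject = iri(f"urn:uuid:{exp_uuid}")
    lines = [f"{subject} {iri(NS + 'hasSolvent')} {iri(solvent_iri)} ."]
    for index, (prop, (value, units)) in enumerate(sorted(quantities.items())):
        node = f"_:q{exp_uuid.hex}n{index}"
        lines.append(f"{subject} {iri(NS + prop)} {node} .")
        lines.append(f"{node} {iri(NS + 'hasValue')} {literal(value)} .")
        lines.append(f"{node} {iri(NS + 'hasUnits')} {literal(units)} .")
    return lines


def solvent_triples(solvent_iri: str, formula: str, wikipedia_url: str, pubchem_cid: int) -> list[str]:
    """Serialise the solvent description from the pseudocode."""
    s = iri(solvent_iri)
    return [
        f"{s} {iri(NS + 'hasFormula')} {literal(formula)} .",
        f"{s} {iri(NS + 'hasWikipediaEntry')} {iri(wikipedia_url)} .",
        f"{s} {iri(NS + 'hasPubchemCID')} {literal(pubchem_cid)} .",
    ]


water = "http://example.org/solvent/water"  # hypothetical solvent IRI
for line in solvent_triples(water, "H2O", "https://en.wikipedia.org/wiki/Water", 962):
    print(line)
for line in experiment_triples(uuid.uuid4(), water, {"hasTemperature": (25, "degC"), "hasDuration": (2, "h")}):
    print(line)
```

N-Triples is one statement per line, so output from many volunteers can simply be concatenated before upload to a triple server.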
Reaction specification:
Not much progress (blocked on PMR). We can probably analyse solvents without complete semantics for the reaction, but it would be nice to try for a fuller specification. Thanks to Mat and Jean-Claude for their patience.
Making documents open:
Progress in the background between Heather Piwowar and PMR. Currently blocked on PMR.
Resources and help wanted:
- Comments on the above welcomed
- Where can the data be reposited? We’ve had one offer. At present we’d like the repository to include an upload server for triples. Offers from anyone running a triple server would be much appreciated.
- Help with analysing the results. Mainly descriptive stats, we hope.
- Help with running the patent downloads and conversions (at least 10 volunteers wanted)
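As an indication of the "mainly descriptive stats" we have in mind, here is a minimal Python sketch using only the standard library. It assumes each extracted record has already been reduced to a (solvent name, temperature in degrees Celsius) pair; the record layout and the function name `describe` are hypothetical, and the sample records are made up for illustration.

```python
from collections import Counter
from statistics import mean, median


def describe(records: list[tuple[str, float]]) -> dict:
    """Summarise (solvent, temperature in degC) records.

    Returns solvent frequencies (most common first) and the mean and
    median reaction temperature for each solvent.
    """
    counts = Counter(solvent for solvent, _ in records)
    temps: dict[str, list[float]] = {}
    for solvent, temperature in records:
        temps.setdefault(solvent, []).append(temperature)
    return {
        "counts": counts.most_common(),
        "temperature": {s: (mean(t), median(t)) for s, t in temps.items()},
    }


# Made-up records for illustration only.
sample = [("toluene", 110.0), ("water", 25.0), ("toluene", 80.0), ("ethanol", 78.0), ("toluene", 90.0)]
summary = describe(sample)
print(summary["counts"])  # [('toluene', 3), ('water', 1), ('ethanol', 1)]
print(summary["temperature"]["toluene"])
```

Reporting the median alongside the mean is a deliberate choice: extracted temperatures from messy patent text may include outliers or mis-parsed values, and the median is less sensitive to them.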
More later…