Scraped/typed into Arcturus
We have now scoped out the project for the Green Chain Reaction and I am almost certain it can work. How well it works depends largely on the number of volunteers and how much time they can give.
If you are interested in taking part let us know now … it should be fun and it doesn’t matter if any particular person or machine doesn’t succeed. BE BRAVE!
The code now works at the following levels:
- It takes a weekly patent index and downloads all the chemical patents.
- It trawls through these to see which contain experimental sections.
- It analyses the text in these to extract mentions of solvents, including chemical formula and amount (where given).
- It aggregates all the solvent data from a single patent into a summary file (dissolveTotal.html).
I am hoping that we can add company and country information to dissolveTotal.html, but this is not critical, just fun.
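To give a feel for the extraction and aggregation steps listed above, here is a minimal sketch in Python. It is not the project's parser: the solvent list, the regular expression and the function names `extract_solvents` and `aggregate` are all assumptions made for illustration, and it only recognises simple amounts such as "(50 mL)".

```python
import re
from collections import Counter

# Hypothetical solvent vocabulary; a real parser would recognise far more names.
SOLVENTS = ("dichloromethane", "methanol", "ethanol", "toluene", "water")

# Matches e.g. "methanol (50 mL)" or a bare "toluene"; the amount is optional.
MENTION = re.compile(
    r"\b(?P<name>" + "|".join(SOLVENTS) + r")\b"
    r"(?:\s*\((?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mL|L)\))?",
    re.IGNORECASE,
)


def extract_solvents(text):
    """Return a (name, volume in mL or None) pair for each solvent mention."""
    mentions = []
    for m in MENTION.finditer(text):
        volume = None
        if m.group("amount"):
            volume = float(m.group("amount"))
            if m.group("unit").lower() == "l":
                volume *= 1000.0  # convert litres to millilitres
        mentions.append((m.group("name").lower(), volume))
    return mentions


def aggregate(mentions):
    """Count mentions and sum the stated volumes (mL) per solvent."""
    counts = Counter()
    volumes = Counter()
    for name, volume in mentions:
        counts[name] += 1
        if volume is not None:
            volumes[name] += volume
    return counts, volumes
```

Calling `aggregate(extract_solvents(text))` on the text of one experimental section gives per-solvent totals of the kind a summary file could hold. Mentions without a simple stated amount are counted but add nothing to the volume total.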
Volunteers should read http://okfnpad.org/openPatents and email me (pm286 a t cam dot ac dot uk). I will add them to a communal mailing group so they can ask for help. Each volunteer will select a patent index (there are about 1500: 30 years at 50 weeks a year). It takes about 30-60 minutes to download, unzip and analyse a week’s material, which generates about 200 MB and 12000 files. Only about 50 of these files will need to be uploaded, and that happens automatically.
So for the whole group that comes to about 1000 hours’ work and, at 12000 files for each of the 1500 indexes, roughly 18 million files. It’s this job we want YOU to help with.
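The overall estimate can be checked from the per-index figures quoted earlier (about 1500 indexes, 30-60 minutes, 12000 files and 200 MB each). The sketch below simply multiplies those round numbers out; the constant and function names are made up for the example.

```python
INDEXES = 1500                # about 30 years at 50 weekly indexes
MINUTES_PER_INDEX = (30, 60)  # low and high estimates per index
FILES_PER_INDEX = 12_000
MB_PER_INDEX = 200


def totals(indexes=INDEXES):
    """Scale the per-index estimates up to the whole collection."""
    low, high = MINUTES_PER_INDEX
    return {
        "hours_low": indexes * low / 60,
        "hours_high": indexes * high / 60,
        "files": indexes * FILES_PER_INDEX,
        "gigabytes": indexes * MB_PER_INDEX / 1000,
    }
```

With these inputs `totals()` gives 750 to 1500 hours, 18 million files and about 300 GB of intermediate data across all volunteers. Almost none of that needs uploading.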
The dissolveTotal.html files are quite small, a few KB each, and there are perhaps 20 per patent index. It’s these that you will upload to our server. Once we have a significant number of them we can start using our new software to analyse and display the results.
We’ve currently got about 8 people who can run these jobs. That’s quite a lot of effort per person, about 200 jobs each, so more volunteers will lighten the load and make it more fun. Of course we don’t have to do the whole lot, but it’s a fun challenge.
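To see how the load per person falls as people join, here is a small sketch that splits the roughly 1500 indexes evenly. The function name is hypothetical, and an even split is an assumption; in practice people will take on different amounts.

```python
import math


def jobs_per_volunteer(indexes, volunteers):
    """Number of indexes each person handles if the work is split evenly."""
    return math.ceil(indexes / volunteers)


# Per-person share for a few group sizes.
for n in (8, 20, 50):
    print(n, "volunteers:", jobs_per_volunteer(1500, n), "jobs each")
```

With 8 people that is 188 jobs each, with 20 it drops to 75, and with 50 to 30.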
We’ll probably make two runs at the data, because the parser will need tuning in the light of what we see the first time.
Please join in…