#solo10: Green Chain Reaction – cleaning the data and next steps

Scraped/typed into Arcturus

Reactions to the Reaction.

I have now normalized the chemical names of the solvents in the Green Chain Reaction and will be posting results for each of the years. There are some exciting and important points:

  • I believe that everything we produce can be distributed under an OKD-compliant licence – we shan’t distribute actual patents.
  • The names have been normalized and sanitized in a two-step process. This is because we have two world-class chemistry resources which are Open – Wikipedia and Pubchem (http://pubchem.ncbi.nlm.nih.gov/). Pubchem was initially under great threat from vested interests and some of us fought publicly for its future – as a result it’s a thriving collection of essentially all known chemical compounds. By contrast Wikipedia has compounds “of interest” but the information is very detailed and usually very clean. This means that we can tell whether a term is a chemical or not and what its properties are.
  • This means that the process can be completely automatic. The results below are the solvents mentioned in a subset of patents published in 2000. You can see there are effectively no false positives. (Some solids will be suffixed with “solution”, as in “urea solution”.) You’ll also see the linguistic variants that have been used by the authors – for example the first solvent, dichloromethane, is often referred to by its chemical formula.
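To make the idea of collapsing linguistic variants concrete, here is a minimal sketch in Python. It is not the code used for the Green Chain Reaction: the lookup table `NAME_TO_CID` is a hypothetical stand-in, hand-copied from a few of the 2000 results in this post, for what a real lookup against Wikipedia and Pubchem would return. The function names and the handling of the “solution” suffix are likewise assumptions made for illustration.

```python
from collections import Counter

# Hypothetical lookup table: surface form -> PubChem ID.
# The values are copied from the 2000 results listed in this post;
# a real pipeline would query Wikipedia/Pubchem instead.
NAME_TO_CID = {
    "ch2cl2": 6344,
    "methylene chloride": 6344,
    "dichloromethane": 6344,
    "methanol": 887,
    "meoh": 887,
    "ethanol": 702,
    "etoh": 702,
    "urea": 1176,
}


def normalize(term):
    """Return the PubChem ID for a term, or None if it is not a known chemical."""
    # Lower-case and collapse whitespace so trivial variants compare equal.
    key = " ".join(term.lower().split())
    # Assumption: "urea solution" should be looked up as "urea".
    if key.endswith(" solution"):
        key = key[: -len(" solution")]
    return NAME_TO_CID.get(key)


def count_solvents(terms):
    """Count occurrences per PubChem ID, skipping terms that are not chemicals."""
    counts = Counter()
    for term in terms:
        cid = normalize(term)
        if cid is not None:
            counts[cid] += 1
    return counts


if __name__ == "__main__":
    extracted = ["CH2Cl2", "Dichloromethane", "MeOH", "stirred", "urea solution"]
    print(count_solvents(extracted))
```

Anything the table does not recognise (such as “stirred” above) is simply dropped, which is one way to keep false positives out of the counts.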

It’s clear that we now have a useful resource, “Open Solvents”, and I will be creating tables for all of the years. However it is now a good time for us to think about collecting more data if we are going to answer the main question of the Green Chain Reaction: is solvent use becoming greener? For that we will need the amounts of solvent used as well.


<compounds>

<compound pubchemID="6344" wikipediaUrl="CH2Cl2" count="115"><name count="62">CH2Cl2</name><name count="29">methylene chloride</name><name count="24">dichloromethane</name></compound>

<compound pubchemID="887" wikipediaUrl="methanol" count="44"><name count="36">methanol</name><name count="8">MeOH</name></compound>

<compound pubchemID="962" wikipediaUrl="H2O" count="36"><name count="6">H2O</name><name count="28">water</name><name count="2">hydrates</name></compound>

<compound pubchemID="702" wikipediaUrl="ethanol" count="33"><name count="29">ethanol</name><name count="4">EtOH</name></compound>

<compound pubchemID="180" wikipediaUrl="acetone" count="19"><name count="19">acetone</name></compound>

<compound pubchemID="679" wikipediaUrl="dimethyl_sulfoxide" count="19"><name count="12">dimethyl sulfoxide</name><name count="7">DMSO</name></compound>

<compound pubchemID="176" wikipediaUrl="acetic_acid" count="14"><name count="14">acetic acid</name></compound>

<compound pubchemID="1049" wikipediaUrl="pyridine" count="14"><name count="14">pyridine</name></compound>

<compound pubchemID="3283" wikipediaUrl="diethyl_ether" count="7"><name count="6">diethyl ether</name><name count="1">Et2O</name></compound>

<compound pubchemID="8058" wikipediaUrl="hexane" count="7"><name count="7">hexane</name></compound>

<compound pubchemID="6212" wikipediaUrl="chloroform" count="7"><name count="7">chloroform</name></compound>

<compound pubchemID="8174" wikipediaUrl="1-decanol" count="6"><name count="6">1-decanol</name></compound>

<compound pubchemID="6342" wikipediaUrl="acetonitrile" count="5"><name count="5">acetonitrile</name></compound>

<compound pubchemID="6328" wikipediaUrl="methyl_iodide" count="5"><name count="4">methyl iodide</name><name count="1">iodomethane</name></compound>

<compound pubchemID="24458" wikipediaUrl="FeCl2" count="4"><name count="4">FeCl2</name></compound>

<compound pubchemID="6134" wikipediaUrl="lactose" count="3"><name count="3">lactose</name></compound>

<compound pubchemID="6342" wikipediaUrl="MeCN" count="3"><name count="3">MeCN</name></compound>

<compound pubchemID="8761" wikipediaUrl="bicine" count="2"><name count="2">bicine</name></compound>

<compound pubchemID="7964" wikipediaUrl="chlorobenzene" count="2"><name count="2">chlorobenzene</name></compound>

<compound pubchemID="944" wikipediaUrl="HNO3" count="2"><name count="2">HNO3</name></compound>

<compound pubchemID="6569" wikipediaUrl="methylethyl_ketone" count="2"><name count="1">methylethyl ketone</name><name count="1">ethyl methyl ketone</name></compound>

<compound pubchemID="280" wikipediaUrl="carbon_dioxide" count="2"><name count="2">carbon dioxide</name></compound>

<compound pubchemID="3776" wikipediaUrl="isopropanol" count="2"><name count="2">isopropanol</name></compound>

<compound pubchemID="957" wikipediaUrl="1-octanol" count="1"><name count="1">1-octanol</name></compound>

<compound pubchemID="1176" wikipediaUrl="urea" count="1"><name count="1">urea</name></compound>

<compound pubchemID="313" wikipediaUrl="hydrochloric_acid" count="1"><name count="1">hydrochloric acid</name></compound>

<compound pubchemID="14798" wikipediaUrl="sodium_hydroxide" count="1"><name count="1">sodium hydroxide</name></compound>

<compound pubchemID="24854" wikipediaUrl="CaCl2" count="1"><name count="1">CaCl2</name></compound>

<compound pubchemID="516892" wikipediaUrl="sodium_bicarbonate" count="1"><name count="1">sodium bicarbonate</name></compound>

<compound pubchemID="222" wikipediaUrl="ammonia" count="1"><name count="1">ammonia</name></compound>

<compound pubchemID="284" wikipediaUrl="formic_acid" count="1"><name count="1">formic acid</name></compound>

<compound pubchemID="5943" wikipediaUrl="carbon_tetrachloride" count="1"><name count="1">carbon tetrachloride</name></compound>

<compound pubchemID="176" wikipediaUrl="AcOH" count="1"><name count="1">AcOH</name></compound>

<compound pubchemID="679" wikipediaUrl="dimethylsulfoxide" count="1"><name count="1">dimethylsulfoxide</name></compound>

<compound pubchemID="516892" wikipediaUrl="sodium_hydrogen_carbonate" count="1"><name count="1">sodium hydrogen carbonate</name></compound>

<compound pubchemID="6456" wikipediaUrl="trityl_chloride" count="1"><name count="1">trityl chloride</name></compound>

<compound pubchemID="3496" wikipediaUrl="glyphosate" count="1"><name count="1">glyphosate</name></compound>

<compound pubchemID="24480" wikipediaUrl="MnCl2" count="1"><name count="1">MnCl2</name></compound>

</compounds>
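Several entries in the list above share a Pubchem ID (for example, acetonitrile and MeCN both map to 6342, and acetic acid and AcOH both map to 176). The following is a minimal sketch of how a reader might merge such entries using Python's standard `xml.etree.ElementTree`. The `SAMPLE` string is a small excerpt of the list retyped with plain ASCII quotes, and `merge_by_pubchem_id` is a hypothetical helper, not part of the Green Chain Reaction code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Excerpt of the 2000 results, retyped with plain ASCII quotes so it parses.
SAMPLE = """<compounds>
<compound pubchemID="6342" wikipediaUrl="acetonitrile" count="5"><name count="5">acetonitrile</name></compound>
<compound pubchemID="6342" wikipediaUrl="MeCN" count="3"><name count="3">MeCN</name></compound>
<compound pubchemID="176" wikipediaUrl="acetic_acid" count="14"><name count="14">acetic acid</name></compound>
<compound pubchemID="176" wikipediaUrl="AcOH" count="1"><name count="1">AcOH</name></compound>
<compound pubchemID="180" wikipediaUrl="acetone" count="19"><name count="19">acetone</name></compound>
</compounds>"""


def merge_by_pubchem_id(xml_text):
    """Sum compound and name counts for entries that share a pubchemID."""
    root = ET.fromstring(xml_text)
    merged = defaultdict(lambda: {"count": 0, "names": {}})
    for compound in root.iter("compound"):
        entry = merged[compound.get("pubchemID")]
        entry["count"] += int(compound.get("count"))
        for name in compound.iter("name"):
            names = entry["names"]
            names[name.text] = names.get(name.text, 0) + int(name.get("count"))
    return dict(merged)


merged = merge_by_pubchem_id(SAMPLE)
# Print the merged entries, most frequent first.
for cid, entry in sorted(merged.items(), key=lambda kv: -kv[1]["count"]):
    print(cid, entry["count"], entry["names"])
```

Grouping on the Pubchem ID rather than on the `wikipediaUrl` attribute is a design choice for this sketch: in the list above the ID is the one field that stays the same across name variants.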


3 Responses to #solo10: Green Chain Reaction – cleaning the data and next steps

  1. What about adding the InChI too?

    • pm286 says:

      I have put the minimal information on to make it easy for human readers of the blog. The InChI, etc can all be got from Wikipedia once the URLs have been discovered and checked.

  2. Indeed. The reason I was asking, and earlier asking where I can get the code to create XHTML+RDFa output, is described in my blog:
    http://chem-bla-ics.blogspot.com/2010/09/pulling-out-data-as-json-from-xhtmlrdfa.html
