Chemistry and Wikipedia

PMR: Recent comments on this blog about derivative works of Wikipedia

  1. Physchim62 Says:
    May 15th, 2008 at 6:38 pm
    I would be very interested in having a chat with you about this project, either here or by email. Although I am somewhat more pessimistic than you about the possibilities of avoiding human input, I might be able to help you avoid rediscovering some of the problems we have had on Wikipedia!

PMR: I am certainly not against human input – it’s essential. But I think machines can help maintain semantic integrity.

  1. Martin Walker Says:
    May 15th, 2008 at 6:47 pm
    As you know, we (the Wikipedia chemists) are organising our lists as I write this, so things are in a state of flux. There are two lists currently available. One list, of 4743, contains nearly all the organics plus a few inorganics, and then we created another smaller one. There is some duplication, which will disappear once they are merged. Missing from the list are a few, such as skatole (unknown reasons!), plus other articles like 2-methylpyridine that are new. If you would like this information in database format, we are now in a position to provide it, though it will lack the validated CAS numbers at this time. As you know, we welcome mashups and other collaborations.
    Martin A. Walker (walkerma on Wikipedia)

PMR: Many thanks. I’d be grateful for it in database format. (Some of you may not realise how much effort Martin, Physchim and others put into Wikipedia.) More comments below.

  1. Physchim62 Says:
    May 15th, 2008 at 5:55 pm
    At Wikipedia, we are currently involved in a VERY similar process to the one you’re describing here, except that we are less optimistic about the possibilities of automation. Given our somewhat eclectic range of compounds, we are more than used to the fact that many fundamental data are simply not known. To take one (extreme) example, have a look at , where we give virtually all of the publicly available information on this compound. While I would not wish to discourage your group, I must say that, at Wikipedia, we have found that the most valuable “semantic chemical authoring tool” is a human chemist: personally, I charge less for consultancy than CAS charges for access to its databases (but maybe that’s my mistake!). Much chemical information, on the web and on paper, is false, and most of it lacks the metadata needed to judge its veracity. THAT, I feel, is the real problem!

PMR: I fully agree – and although I haven’t contributed compounds, I have created chemical entries in WP. But as the numbers grow, Wikipedia will benefit from semantic tools.
I’m very keen to use DBPedia, but at the moment it suffers from inconsistent semantics. Here are a few examples of <http://dbpedia.org/property/molarmass>:

:%2528Benzylideneacetone%2529iron_tricarbonyl/section2/Chembox_Properties    286.06
:1%252C1%2527-Bi-2-naphthol/section2/Chembox_Properties    "286.32 g/mol"@en
:1-Aminocyclopropane-1-carboxylic_acid/section2/Chembox_Properties    "101.1 {{Ref_N"@en
:2%252C2%252C2-Trichloroethanol/section2/Chembox_Properties    "149.40 g mol<sup>-1</sup>""@en
:3%252C3%2527-Diaminobenzidine_tetrahydrochloride/section2/Chembox_Properties    "360.11 g/mol<BR> 396.14 g/mol (dihydrate)""@en
:Alginic_acid/section2/Chembox_Properties    "10000 – 600000"@en
:Aluminium_chloride/section2/Chembox_Properties    "133.34 g mol<sup>-1</sup> (anhydrous) 241.432 g mol<sup>-1</sup> (hexahydrate)""@en
:Aluminium_sulfate/section2/Chembox_Properties    "342.15 g/mol as anhydrous salt"@en
:CS_gas_%2528data_page%2529/section2/Chembox_Properties    "188.6 g/mol Documents the chemistry of CS and its effects on the body."@en
:Calcium_chloride/section2/Chembox_Properties    "110.99 g/mol, anhydrous 147.02 g/mol, dihydrate 182.04 g/mol, tetrahydrate 219.08 g/mol, hexahydrate""@en

PMR: Here we can see the variability in how the physical quantity and its units are reported. Once these values are normalised, Wikipedia/DBPedia will become a stunning chemical resource. Our offer is to see whether we can normalise this data outside WP so that it can be re-inserted.
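To give a feel for what normalising these molar-mass strings might involve, here is a minimal Python sketch. It is hypothetical, not an existing tool: the names clean, parse_molar_mass and MolarMass, and all the regular expressions, are assumptions that only cover the spellings seen in the examples (g/mol, g mol<sup>-1</sup>, hydrate qualifiers, stray wiki templates such as {{Ref_N). Every number is assumed to be in g/mol, and anything it cannot interpret with confidence, such as the range given for alginic acid, comes back as an empty list so that a human chemist can decide.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

NUMBER = r"\d+(?:\.\d+)?"
# Assumed unit spellings: "g/mol", "g mol-1", "g·mol−1" (after HTML tags are stripped).
UNIT = re.compile(r"(?<![A-Za-z])g\s*(?:/\s*mol|·?\s*mol\s*[-−]\s*1(?!\d))")
# Assumed qualifiers that distinguish several values packed into one field.
FORM = re.compile(r"\b(anhydrous|(?:mono|di|tri|tetra|penta|hexa)?hydrate)\b", re.IGNORECASE)
# Two numbers joined by a hyphen or en dash are treated as a range.
RANGE = re.compile(rf"{NUMBER}\s*[-–]\s*{NUMBER}")


@dataclass
class MolarMass:
    value: float                # assumed to be in g/mol
    form: Optional[str] = None  # e.g. "anhydrous", "dihydrate"


def clean(raw: str) -> str:
    """Strip HTML tags (<sup>, <BR>) and wiki templates, even unterminated ones like '{{Ref_N'."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = re.sub(r"\{\{[^}]*(?:\}\})?", " ", text)
    return " ".join(text.split())


def parse_molar_mass(raw: str) -> List[MolarMass]:
    """Return every molar mass found in a free-text value, or [] if it needs a human."""
    # Rewrite every recognised unit spelling as "g/mol" so that the "1" in
    # "mol-1" is not later mistaken for a value.
    text = UNIT.sub("g/mol", clean(raw))
    if RANGE.search(text):
        # A range (e.g. a polymer) has no single molar mass; leave it for review.
        return []
    numbers = list(re.finditer(NUMBER, text))
    results = []
    for i, match in enumerate(numbers):
        # Any qualifier sits between this number and the next one.
        end = numbers[i + 1].start() if i + 1 < len(numbers) else len(text)
        form = FORM.search(text[match.end():end])
        results.append(MolarMass(float(match.group()), form.group(1).lower() if form else None))
    return results


if __name__ == "__main__":
    for raw in ["149.40 g mol<sup>-1</sup>", "10000 – 600000",
                "133.34 g mol<sup>-1</sup> (anhydrous) 241.432 g mol<sup>-1</sup> (hexahydrate)"]:
        print(repr(raw), "->", parse_molar_mass(raw))
```

Returning a list rather than a single number keeps the hydrate forms (as in the aluminium chloride and calcium chloride entries) as separate, labelled values instead of silently picking one. The patterns would certainly need extending before being run over the whole of DBpedia.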
