Molecules in Wikipedia and RDF

I’ve just arrived in Mountain View for the Nature Google O’Reilly Foo Camp. Expect either silence (i.e. swamped) or gushings. I doubt I’ll have much time to blog. To keep readers happy here’s Egon making things happen in the world of RDF molecules.

Molecules in Wikipedia

I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a nice unique molecular identifier (for example http://en.wikipedia.org/wiki/Lactose) given certain conditions, and many bloggers are using it as such. But it is only a useful identifier if one (and only one) InChI is stated on the wiki page.
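
The "one and only one InChI" condition can be checked mechanically. What follows is only a minimal sketch: it assumes the page text is already in hand as a string and that InChIs appear in it literally as `InChI=1/...` (or `InChI=1S/...`). The function names and the regular expression are illustrative assumptions, not anything Wikipedia or dbpedia provides.

```python
import re

# Assumption: an InChI appears literally in the text and ends at whitespace
# or at wiki/HTML markup characters. Real pages may format it differently.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s|{}<>]+")


def distinct_inchis(page_text):
    """Return the set of distinct InChI strings found in the page text."""
    return set(INCHI_PATTERN.findall(page_text))


def is_unambiguous_identifier(page_text):
    """True only if exactly one distinct InChI is stated on the page."""
    return len(distinct_inchis(page_text)) == 1
```

A page that repeats the same InChI several times still counts as unambiguous here, since only distinct strings are compared.
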
Now that I am RDF-ing molecular space, I was again interested in dbpedia, an RDF version of Wikipedia. See these two blog items and Peter’s very nice dbpedia, RDF and SPARQL – for chemistry item. Christian is picking this up, and extending dbpedia to support the various chemical boxes.
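
To make the link between a Wikipedia URL and its dbpedia counterpart concrete, here is a small sketch. It assumes the usual convention that an English Wikipedia page title maps to `http://dbpedia.org/resource/<title>` and that the public endpoint lives at `http://dbpedia.org/sparql`. The function names are hypothetical, and the query simply lists whatever properties exist rather than assuming any chemistry-specific property names.

```python
from urllib.parse import urlencode, urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"  # assumed public endpoint


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia page URL to the matching dbpedia resource URI."""
    parts = urlsplit(url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia page URL: " + url)
    title = parts.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE + title


def describe_query(resource_uri, limit=50):
    """Build a SPARQL query listing every property/value pair of a resource."""
    return (
        f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value }} "
        f"LIMIT {limit}"
    )


def query_url(resource_uri):
    """Build a GET URL for the endpoint; nothing is fetched here."""
    return DBPEDIA_SPARQL + "?" + urlencode({"query": describe_query(resource_uri)})


if __name__ == "__main__":
    uri = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Lactose")
    print(uri)
    print(describe_query(uri))
```

The sketch stops at building the request, so it runs without network access; fetching and parsing the results is left out deliberately.
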
Wikipedia Templates
I have spotted a couple of templates: Drugbox, Chembox, and Chembox new, of which the last seems to be the most recent and has extensions for explosives and drugs. The WikiProject Chemicals does not mention it, though. Does anyone know the status? Is chembox new the way forward, and is it going to replace the older chembox? I hope so, because only the newer one has InChI in the list of official fields. Or is chembox new simply an extension of chembox itself?
Somewhere between 1000 and 1500 entries use chembox new, and another 1000 to 1500 use chembox, but I assume there is considerable overlap. Additionally, Christian noted that there still seem to be molecules in Wikipedia that do not use a template at all; he counted some 1900 molecules using various lists. If you want to keep a closer eye on chemistry in dbpedia, you should subscribe to the dbpedia-discussion mailing list.

I think we are getting to critical mass in WP-Chem. There is a lot of good material and the templates/boxes are beginning to get formalised. Then it is fairly trivial to add more properties. (I disagree slightly with Egon – I think WP is the first place I would go to for properties of common substances.)
Here’s what you get if you search for the boiling point of barium. (I thank Jonathan Goodman for this adventure.) The guild of barium boilers has been active over the centuries and has given us, in Google order:
========== NobleMind (“more facts than you can handle”) ==========

The boiling point of Barium is 1640 °C

================WebElements================

Physical properties of barium

Melting point [/K]: 1000 [or 727 °C (1341 °F)]
Boiling point [/K]: 2143 [or 1870 °C (3398 °F)] (liquid range: 1143 K)

============ www.chemicalelements.com===========

Basic Information


Name: Barium
Symbol: Ba
Atomic Number: 56
Atomic Mass: 137.327 amu
Melting Point: 725.0 °C (998.15 K, 1337.0 °F)
Boiling Point: 1140.0 °C (1413.15 K, 2084.0 °F)
Number of Protons/Electrons: 56
Number of Neutrons: 81
Classification: Alkaline Earth
Crystal Structure: Cubic
Density @ 293 K: 3.51 g/cm3
Color: Silver
=========Periodic Table of Elements: Barium – Ba (EnvironmentalChemistry.com)======

Physical Properties of Barium

PMR: We’re not doing badly. Boiling barium is an eclectic art and it would never do to let outsiders in on the process. A range of 700 degrees is reasonable obfuscation.
So what does Wikipedia say? Is it authoritative? Or does it simply copy the stuff above? I don’t know – it is meant to cite its references. It mentions Webelements, but it doesn’t have their value… it goes with the environmentalists:

Boiling point 2170 K
(1897 °C, 3447 °F)

========================================================
So where would I look for authority? The NIST Webbook:
Phase change data
Data compilation copyright
by the U.S. Secretary of Commerce on behalf of the U.S.A.
All rights reserved.
Quantity  Value  Units  Method  Reference              Comment
Tboil     1913.  K      N/A     Strem Chemicals, 1999  dendritic phase
Tboil     1913.  K      N/A     Strem Chemicals, 1999
======================================
And – read carefully – this supports ONE of the values above. Which?
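
For readers who would rather let a script do the careful reading, here is a minimal sketch that puts the quoted boiling points on the Kelvin scale and compares them with the NIST figure. The numbers are copied from the snippets above; the 1 K tolerance and the function names are my own assumptions.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


# Boiling points as quoted above, all expressed in kelvin.
QUOTED_K = {
    "NobleMind": celsius_to_kelvin(1640.0),
    "WebElements": 2143.0,
    "chemicalelements.com": celsius_to_kelvin(1140.0),
    "Wikipedia": 2170.0,
}

NIST_K = 1913.0  # Tboil from the NIST Webbook entry quoted above


def sources_matching(reference_k, quoted, tolerance_k=1.0):
    """Return the sources whose value lies within tolerance_k of the reference."""
    return [name for name, k in quoted.items() if abs(k - reference_k) <= tolerance_k]


def spread_k(quoted):
    """Return the difference between the highest and lowest quoted value."""
    return max(quoted.values()) - min(quoted.values())


if __name__ == "__main__":
    print("Agrees with NIST:", sources_matching(NIST_K, QUOTED_K))
    print("Spread of quoted values (K):", round(spread_k(QUOTED_K), 2))
```

The tolerance only absorbs rounding between the Celsius and Kelvin figures; it is not a statement about experimental uncertainty.
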


One Response to Molecules in Wikipedia and RDF

  1. Tony Hammond says:

    Hi Peter:
    I just posted a comment on Egon’s blog re the post that you’re citing here. He seemed to be unaware that InChI had a URI representation (and this is key for talking about molecules in RDF). Well, it does and I hope that this message is getting out to all chemists who need to bring chemistry onto the semantic web. There are a couple of references here:
    1. At Last! URIs for InChI
    http://www.crossref.org/CrossTech/2007/02/at_last_uris_for_inchi.html
    Also, as an example of InChI URIs in use, see this post from RSC:
    2. RSC’s Project Prospect v1.1
    http://www.crossref.org/CrossTech/2007/05/rscs_project_prospect_v11.html
    Cheers,
    Tony
