Chemical textmining – 2

In a previous post (Text-mining at ERBI : Nothing is 100%) I asked the readership to suggest how many chemical entities there were in a given paragraph. I intended your replies, and comments, to help clarify some of the issues in chemical textmining and to make clear what some of the essential procedures are. (We don’t believe everybody does it sufficiently rigorously, though it’s difficult to tell when the methods aren’t public.)
There are no tricks. Just to clarify, we are describing the sort of activity that a bio-/chemical indexing organisation might undertake. So the following sentences contain one chemical entity each:

  • all spectra were run in CDCl3.
  • benzene melts at 5 deg. C
  • the flask was flushed with He.

and the following do not:

  • She fed the dog
  • He fed the dog
  • The cat is neither alive nor dead
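To make the distinction concrete, here is a minimal sketch of a counter that gets all six sentences above right. It is not the method we use: the lexicon, the list of ambiguous tokens and the context cues are all hypothetical, hand-picked for these examples, and a real indexer would need far more than this.

```python
import re

# Hypothetical, hand-picked lexicon (illustration only, not a real dictionary).
UNAMBIGUOUS = {"cdcl3", "benzene"}
# Tokens that are element symbols but also ordinary English words.
AMBIGUOUS = {"He"}
# Hypothetical context cues: an ambiguous token is counted as a chemical
# only when the word immediately before it is one of these.
CHEMICAL_CUES = {"with", "in", "under", "of"}


def count_entities(sentence):
    """Count chemical entities in a sentence using the toy rules above."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    count = 0
    for i, tok in enumerate(tokens):
        if tok.lower() in UNAMBIGUOUS:
            count += 1
        elif tok in AMBIGUOUS and i > 0 and tokens[i - 1].lower() in CHEMICAL_CUES:
            count += 1
    return count


for s in [
    "all spectra were run in CDCl3.",
    "benzene melts at 5 deg. C",
    "the flask was flushed with He.",
    "She fed the dog",
    "He fed the dog",
    "The cat is neither alive nor dead",
]:
    print(count_entities(s), s)
```

The point of the sketch is that a plain dictionary lookup on "He" would count the pronoun as helium; even this crude context rule is enough to separate the two cases here, though it would fail on many real sentences.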

So please revisit the post (Text-mining at ERBI : Nothing is 100%) and add a comment, minimally a number between 4 and 11. You don’t have to give your name.
Update:
Some people have problems commenting on this blog.
Chemspider has posted (How Many Chemical Entities in a Paragraph of Text) a detailed and thoughtful estimate (10) on his blog. Do you agree?


2 Responses to Chemical textmining – 2

  1. Martin Griffies says:

    Peter –
    http://www.nactem.ac.uk/workshops/lrec08_ws/slides/Kolarik_et_al.pdf
    This is from the LREC conference on text-mining, a week or so ago, and may be of interest.
    SCAI / Fraunhofer will also host a symposium on scientific TM later this year (August?), although it’s somewhat of a cover for SCAI’s commercial activities with InfoChem and Temis, with other commercial organisations banned.
