The value of text-mining

In response to my concern about access to the full text in PubMed Central, the blog Suicyte Notes questions the value of text-mining:

I cannot think of a single example where text-mining has ever made a major contribution to solving any real-life biomedical problem. Even if there are such examples, their number will be small. If we compare the health benefits from text-mining efforts to those provided by real (human) scientists reading the literature, I have no doubt that the latter would prevail by a big margin.
There should be no doubt about it: it would clearly be a good thing to enable text-mining on PMC. However, describing the current situation of free access to PMC papers for scientists as useless without added text-mining capabilities appears to be, well, kind of biased.

PMR: I actually said “desperately impoverished”, not “useless”, and I stand by that. The post has generated a series of comments on Suicyte which are worth reading and are generally highly supportive.
From my own experience, the average bioscience paper is incredibly difficult to read. The terminology is arcane and in places bizarre. What does “hedgehog” mean? You and I might think it is a spiky mammal, but actually it is a gene and a signaling pathway. (The Drosophila community delights in giving its genes amusing names such as “clueless”; other communities use opaque abbreviations and acronyms such as BRCA1 and RAD5.) And then there is the chemistry: how many readers know what “epibatidine” is? So a very simple, quick and extremely valuable lightweight use of text-mining is to annotate papers for easy reading.
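To make the idea of lightweight annotation concrete, here is a minimal sketch of a dictionary-based annotator that appends a short gloss after each recognised term. It is not how OSCAR works: the glossary, the `annotate` function and its exact, case-sensitive whole-word matching are all hypothetical simplifications. Real systems also have to handle synonyms, spelling variants and ambiguity (hedgehog the gene versus hedgehog the animal).

```python
import re

# Hypothetical glossary for illustration only; a real annotator such as OSCAR
# relies on far richer dictionaries and recognisers than a hand-written dict.
GLOSSARY: dict[str, str] = {
    "hedgehog": "gene and signaling pathway, not the mammal",
    "BRCA1": "human tumour-suppressor gene",
    "epibatidine": "alkaloid first isolated from frog skin",
}


def annotate(text: str, glossary: dict[str, str]) -> str:
    """Append a bracketed gloss after every exact, whole-word glossary match."""
    if not glossary:
        # An empty alternation would match the empty string everywhere.
        return text
    # Try longer terms first so a longer entry wins over a shorter entry
    # that happens to be a prefix of it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    # Keep the matched term and add its gloss in square brackets after it.
    return pattern.sub(lambda m: f"{m.group(0)} [{glossary[m.group(0)]}]", text)


if __name__ == "__main__":
    sentence = "Mutations in hedgehog and BRCA1 were studied."
    print(annotate(sentence, GLOSSARY))
```

Because matching is whole-word, plurals such as "hedgehogs" are deliberately left alone in this sketch; a production annotator would need proper tokenisation and morphology rather than a single regular expression.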
Most publishers forbid this kind of annotation and republication. Many have no interest in making papers easy to read or use, because it costs money. With our OSCAR software we could, if we were allowed, annotate the chemistry in most of the world’s literature. We are forbidden to do so.
Even this lightweight annotation would be an enormous boon to science. But full-text mining goes way beyond that. The bioscience literature is full of observations that are either not explained or are later revised. Machines can play a major role in helping us understand this mass of science. I’m not claiming that machines can replace humans: the human-to-human communication in most papers is so esoteric and unsemantic that this is currently impossible. [If we had semantic authoring, things would be different.] But when machines do the bits that humans hate (searching, linking, resolving synonyms, etc.), human productivity increases vastly, to the point where, freed from the boring stuff, we can undertake new things. One example from the comments (Lars Jensen):

3) There are actually cases where text data mining was used to make discoveries of direct medical relevance. The most famous examples are the links between Raynaud syndrome and fish oil and between migraine and magnesium deficiency.


2 Responses to The value of text-mining

  1. Glen Newton says:

    Peter,
    The blog Suicyte Notes is so completely clued-out. One good example (of many), reported in the Economist and in one of my blog entries (FREE THE ARTICLES! (full-text for researchers & scientists and their machines)), shows how researchers were able to figure out the biochemical pathway of addiction just by going through the literature (~1000 studies).
    -Glen

  2. pm286 says:

    (1) Many thanks, Glen. This is exactly the sort of thing I need.
