In the next few posts I shall address some common myths about Content Mining (TDM). Many are implicitly or explicitly put forward by Toll-Access Publishers (TAPublishers).
The most serious myth is that it’s not important.
Actually it’s important to everyone. The two major information successes of the first decade of this century were both content-mining:
- Google has systematically mined the Open Web using machines and added its own semantics
- Wikipedia has systematically mined the infosphere using humans and added its own semantics.
If you have ever used Google or ever used Wikipedia then you have used the results of content-mining.
Wikipedia is beyond criticism – if you are unhappy about it, get involved and change it. But what about Google?
Well, Google doesn’t do science.
If I want to know what species was recorded in this place at that date, or what chemical reaction occurred under these conditions, then Google doesn’t help. For that I need a semantic scientific search engine.
Discipline-based Semantic content mining is the most important development in applied information science. If you want to build the library of the future you should be doing this – not paying rent to third parties. If you want to do multidisciplinary research you need the results of content-mining.
If we were allowed to do it, then I wouldn’t be writing this blog post. As it is, the TAPublishers are fighting tooth-and-nail to stop us content-mining. People are doing it, but in secret, because if they do it in public they will be cut off or sued. It’s not surprising that we don’t yet have high visibility.
But that’s going to change. And change rapidly. We have literally billions of dollars of information locked up in the current scholarly literature. And 10000 papers come out each day. We need content mining to manage these – read them for us. Organize them. Let us search after we’ve read them. Do some of our routine thinking for us.
On our own terms for our own needs.
It can happen, just as Wikipedia happened.
So don’t turn away – believe that Content Mining matters – matters massively.
It is beyond my comprehension how scientists and academia in general do not realize that content mining is THE key information technology. How could anyone not want access to the many perks, which you’ve laid out often enough in this blog? It’s like saying you don’t want the internet, because you have the Encyclopedia Britannica right there sitting on your shelf (or shelves). Oh yes, fear. Fear of losing funding or standing, or of having to adapt to change. Protectionism, as above, so below. “It doesn’t matter to me” equates to “I don’t want this to matter to me”. Why? Because remaining in a blissful state of ignorance and denial is usually much more comfortable than acknowledging truths that collide with dearly held illusions.
Well, not forever. When the time comes, I believe it will spark an era of unprecedented technological advances. After a while, no one will want to go back to the olden days. And the reactionaries? Dust, for the history lessons of future generations.
Thanks – it’s great to have others who share my thoughts.
Is Wikipedia crowd-sourced content mining, as you imply, or is it more like crowd-sourced editorialising? Personally, I’m not really sure, but attributing the success of Wikipedia to content mining somehow doesn’t seem right.
When I write (contributions to) pages on WP I go and find authoritative sites and extract info: maths equations, chemistry, references, etc. This is content mining. Then comes the annotation and editing.