Why do we continue to use Citations?

I have just received the following email from BioMed Central about an article we published earlier this year (edited to remove marketing spiel, etc.):

Dear Dr Murray-Rust,

We thought you might be interested to know how many people have read your article:

ChemicalTagger: A tool for semantic text-mining in chemistry
Lezan Hawizy, David M. Jessop, Nico Adams and Peter Murray-Rust
Journal of Cheminformatics, 3:17 (16 May 2011)

Total accesses to this article since publication: 2117

This figure includes accesses to the full text, abstract and PDF of the article on the Journal of Cheminformatics website. It does not include accesses from PubMed Central or other archive sites (see http://www.biomedcentral.com/info/libraries/archive). The total access statistics for your article are therefore likely to be significantly higher.

Your article is ‘Highly accessed’ relative to age. See http://www.biomedcentral.com/info/about/mostviewed/ for more information about the ‘Highly accessed’ designation.

These high access statistics demonstrate the high visibility that is achieved by open access publication.

I agree. It does not, of course, mean that 2117 people have read the whole article. I imagine the count excludes obvious bots. Of course, there could be something very compelling in the words of the title. After all (http://blogs.ch.cam.ac.uk/pmr/2011/07/08/plos-one-text-mining-metrics-and-bats/), one PLoS ONE paper with the word “bats” in its title got 200,000 accesses (or it might have been “fellatio” – I wouldn’t like to guess). So I looked up “tagger” in Urban Dictionary, and its main meaning is a graffiti writer. Maybe some of them could use a “chemicaltagger”? But let’s assume that’s noise.

So “ChemicalTagger” has been heavily accessed and probably even read by some of those accessing it. Let’s assume that 10% of them – ca. 200 – have read at least part of the paper. That suggests the paper is worth something. But not to the lords of the assessment exercise. Only the holy citation matters. So how many citations? Google Scholar (using its impenetrable but at least free-if-not-open system) gives 3. Where from? Well, from our prepublication manuscripts in DSpace. If we regard these as self-citations (disallowed by some metricmeisters), we get a Humpty sum:

3 – 3 = 0

So the paper is worthless.

If we wait 5 years, maybe we’ll get 20 citations (I don’t know). But it’s a funny world where you have to wait 5 years to find out whether something electronic is valued.

So aren’t accesses better than citations? After all, don’t we use box office receipts to tell us how good films are? Or viewing figures to tell us the value of a program? ["good" and "value" having special meanings, of course]. So why this absurd reliance on citations? After all, Wakefield got 80 citations for his (retracted) paper on the MMR vaccine and autism. Many were highly critical. But they still push up the index!

The reason we use citations as a metric is not that they are good – they are awful – but that they are easy. Before online journals, the only way we could find out whether anyone had noticed a paper was through reference lists. Of course, references can be there for many reasons – some positive, some neutral, some negative and many completely ritual. They weren’t devised as a way of measuring value but as a way of helping readers understand the context of the paper and of giving due credit (positive and negative) to others.

But, because academia is largely incapable of developing its own system of measuring value, it now relies on others to gather figures. And pays them lots of money. Citations are big business – probably 200 to 1000 million USD per year. So academia finds it easier to hand its precious funds to others. And all parties have a vested interest in keeping this absurd system going. Not because it’s good, but because it saves trouble. And of course the vendors of citation data will want to preserve the market.

This directly stifles academic research in text-mining of typed citation data (i.e. trying to understand WHY a citation was provided). Big businesses with lawyers (e.g. Google) are allowed to mine data from academic papers. Researchers such as us are forbidden. Because bibliometrics is a massive business. And any disruptive technology (e.g. ChemicalTagger, which could also be used for citations) must be prevented by legal means. And we have to deprecate access data because it threatens the holy cow and the holy income of citation sales.

The sooner we get academic texts safely minable – in bulk – the sooner we shall be able to have believable information. But I think there are many vested interests who will try to prevent this. After all, what does objectivity matter?
