MOTSI: What is a citation?

We are all now judged by citations. But what *IS* a citation? It’s not easy to answer… and it may not be quite what you think. http://en.wikipedia.org/wiki/Citation gives:

Broadly, a citation is a reference to a published or unpublished source (not always the original source). More precisely, a citation is an abbreviated alphanumeric expression (e.g. [Newell84]) embedded in the body of an intellectual work that denotes an entry in the bibliographic references section of the work for the purpose of acknowledging the relevance of the works of others to the topic of discussion at the spot where the citation appears. Generally the combination of both the in-body citation and the bibliographic entry constitutes what is commonly thought of as a citation (whereas bibliographic entries by themselves are not).[PMR’s emphasis]

A prime purpose of a citation is intellectual honesty: to attribute prior or unoriginal work and ideas to the correct sources, and to allow the reader to determine independently whether the referenced material supports the author’s argument in the claimed way.

And

Bibliographies, and other list-like compilations of references, are generally not considered citations because they do not fulfill the true spirit of the term: deliberate acknowledgment by other authors of the priority of one’s ideas.

By the definition of this Wikipedia article it is clear that a citation consists of not only the bibliographic entry reference, but also the in-text context. Using our own paper http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020181 (Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry, by BalaKrishna Kolluru, Lezan Hawizy, Peter Murray-Rust, Junichi Tsujii and Sophia Ananiadou) we find a typical context in which the citation occurs.

 

  1. Different aspects [of text-mining] such as named entity recognition (NER), tokenisation and acronym detection require bespoke approaches because of the complex nature of such texts [1]–[5].

The [1]–[5] represents five pieces of prior work whose role is defined by the surrounding language. This “sentiment” is very difficult to analyse exactly by machine and requires a human to describe the type of the citation. In this case it is prior work in the field. Here are the resolved bibliographic references:

Kemp N, Lynch M (1998) Extraction of information from the text of chemical patents. 1. Identification of specific chemical names. Journal of Chemical Information and Computer Sciences 4: 544–551.

Murray-Rust P, Rzepa H (1999) Chemical markup, XML, and the worldwide web. 1. Basic principles. Journal of Chemical Information and Computer Sciences 39: 928–942.

Murray-Rust P, Mitchell J, Rzepa H (2005) Chemistry in bioinformatics. BMC Bioinformatics 6: 141.

Banville D (2006) Mining chemical structural information from the drug literature. Drug Discovery Today 11: 35–42.

Kolrik C, Hofmann-Apitius M, Zimmermann M, Fluck J (2007) Identification of new drug classification terms in textual resources. Bioinformatics 13: 264–272.

Note that the references themselves are NOT citations; it is the combination of each of them with the context (sentence (1)) that defines the citation. This is important as most “citations” do not fulfil this criterion. Note also that two of the references are to works authored in part by one of the authors (PMR); these, when expressed as citations, are sometimes called “self-citations”. (But note that several authors are involved in each case.) These citations – in the first paragraph of the paper – can be assumed to reference fairly important prior work on which the paper probably builds.
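As an aside, locating the in-text markers is the easy, mechanical part. Here is a minimal sketch (my own illustration, not from the paper) of finding markers such as [1] or [1]–[5] and pairing them with reference numbers:

```python
import re

# Match a numeric in-text citation marker such as [1], optionally
# followed by a range continuation such as -[5] or –[5].
MARKER = re.compile(r"\[(\d+)\](?:\s*[-\u2013]\s*\[(\d+)\])?")

def cited_refs(sentence):
    """Return the list of reference numbers cited in a sentence."""
    refs = []
    for m in MARKER.finditer(sentence):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        refs.extend(range(start, end + 1))  # expand [1]-[5] to 1..5
    return refs

sentence = ("Different aspects such as named entity recognition (NER), "
            "tokenisation and acronym detection require bespoke approaches "
            "because of the complex nature of such texts [1]-[5].")
print(cited_refs(sentence))  # [1, 2, 3, 4, 5]
```

The hard part – the sentiment, what the citation is *for* – is exactly what a regex cannot give you; that still needs a human.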

These citations can be seen to lend some merit to the work described by the references. But it is because of the complete citation that we lend the merit – not just because the references are in the paper. In using citations, therefore, we should always include the sentiment. There are many types of sentiment – and we looked at this in our Sciborg project. (http://acl.ldc.upenn.edu/W/W06/W06-1312.pdf )

Each citation is labelled with exactly one category. The following top-level four-way distinction applies:


  • Weakness: Authors point out a weakness in cited Work

  • Contrast: Authors make contrast/comparison with cited work (4 categories)

  • Positive: Authors agree with/make use of/show compatibility or similarity with cited work (6 categories)

  • Neutral: Function of citation is either neutral, or weakly signalled, or different from the three functions stated above.
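To see why machines struggle with this, here is a deliberately naive cue-phrase classifier for the four top-level categories. The cue lists are invented for illustration – real systems, like those in the Sciborg work, need far richer features, which is exactly why humans still do better:

```python
# Invented cue phrases per category; anything unmatched falls to Neutral.
CUES = {
    "Weakness": ["fails to", "does not account", "limitation of"],
    "Contrast": ["in contrast to", "unlike", "compared with"],
    "Positive": ["we use", "following", "builds on", "as shown by"],
}

def classify_citation_context(sentence):
    """Assign one top-level category to a citing sentence (naively)."""
    lowered = sentence.lower()
    for category, cues in CUES.items():
        if any(cue in lowered for cue in cues):
            return category
    return "Neutral"  # weakly signalled, or none of the above

print(classify_citation_context("We use the tokeniser of [2]."))  # Positive
print(classify_citation_context("Such texts need bespoke approaches [1]-[5]."))  # Neutral
```

Even this toy shows the problem: the sentence from our own paper carries a clear "prior work" sentiment to a human reader, but no surface cue a machine could latch onto, so it falls through to Neutral.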

 

Some of these might count positively towards an author’s reputation; others would count negatively.

 

Here’s a similar assessment in botanical systems http://image.sciencenet.cn/olddata/kexue.com.cn/upload/blog/file/2010/12/2010128124335464925.pdf . Typical extract:

In 2007, Stephen McLaughlin published “Tundra to Tropics: The Floristic Plant Geography of North America” in Sida, Botanical Miscellany. McLaughlin is one of the few authors who included data sources in his work. He stated that “The 245 local floras selected for this study are listed in Appendix A” (p. 3). Although listed in an appendix, they are not included in the bibliography. Instead, his bibliography consists of 28 other publications, mostly books and articles in books, but also articles in Thomson Reuters-monitored journals.

 

In other words floras (which are citable) may be excluded from “citation analysis” because the authors put them elsewhere in the document. Automated methods of “citation analysis” cannot pick this up. In some of my own work many references may be in tables.

 

So a true citation carries sentiment and describes the purpose of citing. In #jiscopencite (sister to #jiscopenbib) David Shotton and colleagues are developing a Citation Typing Ontology (CiTO).

 

But, unless you tell me different, the “citation” used in current metrics and in the Science Citation Index and its modern descendants is in fact only a bibliographic reference. It carries no sentiment. Which is why citation counts are skewed by negative sentiment. And some types of common citation (methods or software) can achieve very large counts.

 

None of this is included in the Journal Impact Factor, which AFAIK simply extracts bibliographic references. A person can get increased “citations” for being criticized. Being controversial may increase your metrics. None of this is surprising, other than to those who think the evaluation of research can be automated.

 

There are many other reasons why “citations” (bibliographic references) are seriously flawed. I’m not the first to bring these up:

  • Errors in the bibliographic references themselves. http://www.dlib.org/dlib/march09/canos/03canos.html shows that “an author named ‘I. INTRODUCTION’ has published hundreds of papers. Similarly, according to CiteSeer, the first and third authors of this article have a coauthor named ‘I. Introducción’”. This makes us laugh, but sickly, in that our careers are based on this level of inaccuracy. There are many other problems, such as author identification and disambiguation (ORCID may solve some, but not all, of this). #jiscopencite reveals that many bibliographic references are simply inaccurate. José H. Canós Cerdá, Eduardo Mena Nieto, and Manuel Llavador Campos continue: “Citation analysis needs an in-depth transformation. Current systems have been long criticized due to shortcomings such as lack of coverage of publications and low accuracy of the citation data. Surprisingly, incomplete or incorrect data are used to make important decisions about researchers’ careers. We argue that a new approach based on the collection of citation data at the time the papers are created can overcome current limitations, and we propose a new framework in which the research community is the owner of a Global Citation Registry characterized by high quality citation data handled automatically. We envision a registry that will be accessible by all the interested parties and will be the source from which the different impact models can be applied.”
  • Lack of transparency in what sources are used. What is, and is not, citable – the target of a bibliographic reference? Web pages? Broadcast talks? Our media change; if we wish to use them for evaluation, WE, not unaccountable commercial organizations, should be in control.
  • Lack of sentiment (above). Without knowing why something is cited we cannot attribute motivation and value.

 

For me the biggest problem is the lack of transparency – if this problem is addressed, the other two follow.

In http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/522/443 (14 years ago) Cameron suggests:

 

A universal citation database has significant potential to act as a catalyst for reform in scholarly communication by leveling the playing field between alternative forms of scholarly publication. This would happen in two important ways. First, the citation database would ensure that publications in any form are equally visible (but not necessarily equally accessible) to the literature research process. Regardless of which publication venue an author chooses, all that she/he needs to do to make her/his work visible is to cite appropriate previous works. Publication venues would then compete on the important values that they bring to the publication process, such as refereeing standards, editorial control, quality of presentation, timeliness of dissemination and so forth. Publications would no longer enjoy an unfair competitive advantage simply by virtue of being indexed in a particular literature database.

The second way that a universal citation database would promote fairer competition among publication venues is by providing a method for evaluating the significance of individual papers independent of the publication venue chosen. University faculty members are often critically concerned with the recognition that their work receives because of its importance to the evaluation of their academic careers. Because the significance of papers is often judged solely by the perceived quality of the venues in which they are published, this encourages a very conservative approach to choice of publication venues. By providing citation data as an independent means of demonstrating the significance of a particular work, a universal citation database has the potential to encourage authors to choose publication venues for other qualities.

And the implication was clear – academia could and should have initiated this. As I have implied elsewhere, academia has sleepwalked past this opportunity and has now generated a mass of unregulated bibliographic reference collections. Their quality and coverage are not transparent so I cannot judge them – other than to say that non-transparency generally has little value.

So – as in so much else – IF we created semantic publications, and IF we made them Open, many of the problems would be solved. We should use semantic bibliography (as we have developed in our Open Bibliography project). We should label that with sentiment to create true citations. These should be semantic and published Openly at time of first publication. This would solve most of our problems of bad data, missing data, control by unaccountable third parties, etc.

But it would require the author to change their habits. To adopt new and better ways of authoring papers. And there is an established industry – including academia – who benefit from the low quality processes we currently have.

We are constantly told

“Authors will never do that”

And that’s true IF, but only IF, academia doesn’t care.

Let’s try the following:

“scientists will never assess the safety of their reactions. It’s too much trouble”

“scientists will never bother to report experiments on animals. It’s too much trouble”

So IF academia required scholarly publications to have semantic, accurate citations (not just bibliographies) we could solve this in months. The technology is not the problem.

Academics are the problem.

We are in the grip of our own creations, which we cannot control. In this case our Monster of the Scholarly Id is the “citation”. Let’s tame it.

Posted in Uncategorized | Leave a comment

Journal review system: a reviewer’s perspective

Quite by chance I have just received an update of a review I did for [a gold open access scientific journal]. I omit all confidential info:

Dear Dr. Murray-Rust,

Thank you for your review of this manuscript.  The Editor has made a decision on this paper and a copy of the decision letter can be found below.
You can also access your review comments and the decision letter by logging onto the Editorial Manager as a Reviewer.

[Dear Author… ]

Before your manuscript can be formally accepted, your files will be checked by the [publisher’s] production staff. Once they have completed these checks, they will return your manuscript to you so that you may attend to their requests and make any changes that you feel necessary.

To speed the publication of your paper you should look very closely at the PDF of your manuscript. You should consider this text to have the status of a production proof. Your paper will be tagged and laid out to produce professional PDF and online versions. However, the text you have supplied will be faithfully represented in your published manuscript exactly as you have supplied it.

So as far as the author and reviewer are concerned everything is driven by PDF (confirming Cameron’s experience). PDF is a well-known destroyer of semantic information. This, of course, is common to all publishers. We have allowed them to create this monster and force it on us.

PDF holds back the development of semantically supported science.

Posted in Uncategorized | 5 Comments

The destruction of semantic data: The PLoS community replies

I posted yesterday about an article in PLoS ONE where I criticized the author/editor/publisher for destroying semantic data. /pmr/2011/07/16/how-to-share-data-and-how-not-to/ It has generated 11 replies and you should read them before this post so as to get all points of view.

It turns out it is the PLoS system that carries out this transformation. There are several vigorous defences of PLoS so I will try to be objective.

The first, and IMO fundamental, concern is that this is a system which (however good or bad) is developed by the publisher and thrust on authors, reviewers and readers/re-users. That is true of almost all publishers and it is one of the MOTSI – that we have handed over to publishers the representation of our knowledge. In the print era this might have been acceptable but in the century of the semantic web I find it inexcusable. PLoS is no worse than others and, because it’s Open, it exposes its XML (closed publishers, of course, do not do this).

What I have objected to is that the information submitted by the authors is transformed (in my case without my knowledge or consent) to a dumbed-down version. Here is an example (from our paper on text-mining http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020181#pone-0020181-t001 ):

What we submitted (I don’t like submitting images but sometimes it is the only way):

It’s quite readable to young sighted humans. Now here is the “Powerpoint-friendly” version:

Cameron (below) argues that most “readers” will want to display the included material as a slideshow. But this is systematic destruction of the information (by reducing the resolution when it wasn’t necessary).

I’ll take Cameron’s and others’ replies and comment – as objectively as I can. BTW I use “PLoS” because we can expect some answers, but the arguments below are generic (they may differ in detail from publisher to publisher).

>>> FWIW the review process is done entirely via PDFs so it is not straightforward to tell what the native format of any part of the paper is for the reviewer. I would agree that this is bad but its consistent with most of the journal systems I’ve worked with. Obviously this makes reviewing for data formats difficult or impossible but what’s probably worse is that it discourages you from asking the question. In some ways the BMC system is better in this respect because the files are there in front of you with big icons saying what the format is.

PMR>> exactly so. The reviewers have to accept what the journal thrusts on them. It’s impossible for them to get close to the data. So this is a publisher-enforced policy that hinders the publication of semantic information. Not what the data *is* but what it looks like. How many reviewers might like to cut and paste (or better) the authors’ data into their own data analysis tool? This is true for most publishers – they send reviewers the PDFs because it’s easier for the publishers. IMO this leads to poorer science because it’s impossible for the reviewers to have access to the data (even if submitted).

>>>However I think I disagree with Peter about the destruction element here. The html version of the paper is explicitly designed in the PLoS system for human reading (admittedly by sighted people). I actually find that floating window and the ability to click through figures very useful and I’d imagine that it makes that process simple if everything, figures and tables, are the same format. Given that the tabular data is available in the XML [PMR I’ll address that later], which is where you’d go to dig out data, I don’t think its a question of destruction but of differing priorities.

PMR>> OK, but who determines the priorities? Not the authors, although they pay PLoS for the publication. The reviewers? The editors? Or PLoS management?

>>>The person who wants to cut and paste the numbers from the table is going to be annoyed but the person who wants to grab the figure and drop it into a presentation is going to be happy. And I suspect the latter may be the more common re-use case.

PMR>> I am surprised. I would have thought that many readers actually want access to the data – in data form – on their machine. I don’t spend my time presenting other people’s published material as parts of slide shows, but maybe I am the exception. Where I do, I would not drag-n-drop an unreadable Powerpoint-friendly table – I would create it in a form where the audience could read the most important bits. Maybe I would have to do some editing and cropping…

>>> The ideal would obviously be to have both, contextually presented depending on what the user (human or machine) wants. PLoS have focussed very hard on making their html rendering attractive to human readers and have as a result pulled into a situation where html downloads are much greater than pdf downloads which I would see as a good thing.

PMR>> I would agree that HTML is far better than PDF and HTML5 is better than HTML and Scholarly HTML should be what we aim for.

>>> The price, with limited resources, is things like this which are suboptimal obviously.

PMR: This I fail to see. If you already have tables marked up in XML it’s trivial, yes really trivial, to convert them to HTML tables. It would take me 30 minutes to write a stylesheet to extract the tables and translate them to HTML (they are effectively that already). And putting the links into the HTML shouldn’t be rocket science.
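To back up that claim, here is the sketch in Python rather than XSLT. The embedded XML is a made-up, JATS-like fragment of my own (not PLoS’s actual schema – adjust element names for the real DTD), but the principle is the whole point: table markup is effectively HTML already.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS-like article fragment with an embedded table.
ARTICLE_XML = """
<article>
  <table-wrap id="pone-0020181-t001">
    <table>
      <tr><th>Corpus</th><th>F-score</th></tr>
      <tr><td>Patents</td><td>0.87</td></tr>
    </table>
  </table-wrap>
</article>
"""

def extract_tables(xml_text):
    """Pull every <table> element out of article XML and
    re-serialise each one as a standalone HTML string."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(t, encoding="unicode") for t in root.iter("table")]

for html in extract_tables(ARTICLE_XML):
    print(html)
```

A dozen lines, no rocket science – the serialised `<table>` can be dropped straight into an HTML page, and cut-and-pasted as data rather than pixels.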

>>>What would a system look like that achieved all of these goals – presenting the easily cut and pasted whole for those who wanted it, plus the cut and pasted data for the humans who want that, plus the marked up data for those who want that? At least there’s a DOI for each element so a content negotiation scheme would in principle be possible. It also re-raises the question of standardising the form in which a paper points to its data on an external service such as Dryad – how should that link be made machine discoverable in a general way?

PMR>> exactly. My concern was that by turning semantic tables into images the publisher(s) give the impression they don’t care about data. BMC (to pick another Open Access publisher) does care about data. So should PLoS.

Andy Turner>>> It is easy to find the XML for the table in the article XML

PMR>> yes – and I found it. It gives no explanation of what it is, how to use it, whether you need special tools, etc.

AT>>>and it has an XLink so there is perhaps really very little to find issue with about this.

PMR>> The XLink in the XML points to an IMAGE (see the mimetype). And that’s what I take issue with.

AT>>> Perhaps the enhancement wanted is to add buttons for XML (small, medium and large) i.e. XML Table Values Only, XML Table with Metadata and context links, XML for the article. Perhaps also there could be a download package for all this as zip, tar.gz etc…

PMR>> Exactly. This would be a big enhancement. And if PLoS and BMC and EGU and IUCr and… (maybe even some closed access publishers) all used the same approach it would solve the problem. Because which reader or re-user wants a different approach to each publisher?

QUESTION. Yes, I can find the XML and – because I understand XML – I can locate the tables and I can write a stylesheet to extract them. But most people can’t. Is there something I’m overlooking? An open set of tools that everyone except me has access to? Or is it actually cutting and pasting each individual field out of the XML?


Posted in Uncategorized | 4 Comments

How to share data and how not to

I have been pointed to a paper in PLoS ONE on data sharing http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0021101

I haven’t read the text but I am afraid I have to comment adversely on the way the data are presented. Because it illustrates a fundamental reason why data cannot be shared. This is data from Table 6 in the paper:

This is described as “a Powerpoint-friendly Image”. It’s unreadable to a human (though if you step away it gets possible).

What does it represent? The paper describes a survey carried out by questionnaire (survey instrument) and the results are presented in 29 tables. The entries in the tables are small amounts of text and numbers. Here’s Table 6:

So this is a table of data. And it is transmitted as a TIFF. And if you want a “powerpoint friendly image” it appears to be a PNG.

In simple terms this completely destroys data.

Now I know some of the people involved – Carol, Cameron Neylon (who edits this, and has a journal on reproducible computing) and the folks at PLoS.

Something has gone terribly wrong here. Maybe in the authoring, maybe in the reviewing, maybe in production.

Open Access by itself solves a bit, but not enough. We also need to make our Open products BETTER than previously. Any of CSV, HTML or even XLS would be possible for the tables. Then the data could be shared.
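To show how little is needed, here is the CSV route using only the Python standard library. The rows below are invented placeholders of my own, not the paper’s actual data – the point is that a survey table of short text and numbers costs a few lines to share machine-readably:

```python
import csv
import io

# Hypothetical survey-table rows: short text plus numbers,
# exactly the shape of the tables in the paper.
rows = [
    ["Question", "Yes", "No"],
    ["Do you share data?", 42, 58],
]

# Write the rows as CSV (here to an in-memory buffer; a file works
# the same way with open("table6.csv", "w", newline="")).
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Any spreadsheet, statistics package or script can then read the result directly – no retyping numbers out of a TIFF.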

We have to move towards fully interoperable semantic Open data.

Posted in Uncategorized | 17 Comments

What’s wrong with scholarly publishing? The MOTSI

NOTE: You may find my allegory of “Monsters of the Id” irrelevant to scholarly publishing. If so, skip this. But do not doubt that scholarly publishing needs changing – drastically and soon – and that I, at least, am committed to finding ways for that to happen. Before it changes outwith our control.

I have used the term “Monsters of the Scholarly Id” (MOTSI) to describe the dysfunctionalities in scholarly publishing created unconsciously by academia, driven by its innate need for self-glorification. This may seem OTT so I’ll ramble through the background and the idea.

I start with my perception, shared by many, that scholarly publishing is increasingly dysfunctional. Obviously not everyone will agree. A CEO of a publishing company which sees revenues increase over the decade by 9% or so is not going to complain. I’ve blogged before on Richard Poynder interviewing Springer’s CEO (http://www.infotoday.com/it/jan11/Interview-with-Derk-Haank.shtml ) Read it – it chills me that this is purely about revenue – not any sense of providing useful goods in response to a market demand. A senior editor of a “successful” closed access journal isn’t going to complain – s/he probably gets paid expenses at least and lots of brownie points. A researcher with lots of citations and H-index karma isn’t going to complain. The 1-in-a-hundred researcher who has got a paper into NatSciCell may be able to get a job on the strength of it.

But many, many feel severe dysfunction. I’ll come to the causes later – they may not be so different from performing arts, or authors of fiction – the system does not allow everyone to succeed. But science is different. If we simply strive for the “excellent” (whatever that is) we neglect the good on which science is built. We have to separate the good from the unacceptable.

At a SciFoo camp about 3 years ago we had a discussion about scientific publishing (this has been a common theme at SciFoo). Two young attendees felt that the situation was so bad they were going to write an article for NatSci, but this never got written. But it’s a common theme on the blogosphere.

So what’s the Id? I grew up in an era when – I think – Freudian theory was almost regarded as proven fact. I believed in the id, ego and superego and I’ll replay them here using Wikipedia (http://en.wikipedia.org/wiki/Id,_ego_and_super-ego ).

Id, ego and super-ego are the three parts of the psychic apparatus defined in Sigmund Freud’s structural model of the psyche; they are the three theoretical constructs in terms of whose activity and interaction mental life is described. According to this model of the psyche, the id is the set of uncoordinated instinctual trends; the ego is the organised, realistic part; and the super-ego plays the critical and moralising role.[1]

And

The id comprises the unorganised part of the personality structure that contains the basic drives. The id acts according to the “pleasure principle“, seeking to avoid pain or unpleasure aroused by increases in instinctual tension.[2]

The id is unconscious by definition:

“It is the dark, inaccessible part of our personality, what little we know of it we have learned from our study of the dream-work and of the construction of neurotic symptoms, and most of that is of a negative character and can be described only as a contrast to the ego. We approach the id with analogies: we call it a chaos, a cauldron full of seething excitations… It is filled with energy reaching it from the instincts, but it has no organisation, produces no collective will, but only a striving to bring about the satisfaction of the instinctual needs subject to the observance of the pleasure principle.”[3]

And this article specifically references my inspiration:

  • In the classic 1956 movie Forbidden Planet, the destructive forces at large on the planet Altair IV are finally revealed to be “monsters from the id” — destructive psychological urges unleashed upon the outside world through the operation of the Krells’ “mind-materialisation machine”. The example is of significance because of the unusual degree of insight it demonstrates: the creature eventually revealed follows classical psychoanalytic theory in being literally a dream-like primary process “condensation” of different animal parts. The plaster cast of its footprint, for example, reveals a feline pad combined with an avian claw. As a crew member observes, “Anywhere in the galaxy this is a nightmare”.

 

So my allegorical approach is to see the dysfunctions of scholarly publishing as arising from the subconscious of academia. The drive to achieve, the drive to be recognised and glorified. The need for gratification. And where uncontrolled, the id triumphs at the cost of rational behaviour.

Ultimately in Forbidden Planet the only solution is to destroy the planet, at a cost of destroying the good that the Krell have bequeathed. I’m not suggesting that we destroy scholarly publishing. But I think it possible that the monsters it has created will, if untackled, lead to catastrophic changes.

There is no super-ego of academia. Indeed it is not clear whether the uncoordinated behaviour of 10,000 institutions can have a super-ego – a controlling intelligence. For me it is tragic that Universities are not collectively addressing their role – in public – and getting feedback. Maybe they do this in closed national sessions with the great-and-the-good of government. Politicians have blogs and tweet. Stephen Fry tweets. Where is the vice-chancellor who reaches out to today’s world? Where, indeed, are the senior academics? There are a few – a very few – and we may meet them at ScienceOnline in September in London. But academia does not care about the common wo/man. It looks inward, not outward. Where is the world leadership? And that is one of the causes of the problems.

So, while academia gazes inwards, the planet needs it more than ever. The sleepwalking has consequences outside scholarly publication. Where is the communal action to address climate change, resistance to disease, ageing, hunger and many other predictable problems? Why shouldn’t universities work together? But they are set up to compete, to generate their own feeling that they are better than their neighbours. And so the MOTSI are rife in scholarly publishing.

What are the MOTSI? I probably haven’t thought of them all and I’m hoping for your input as well. The MOTSI are things that we have created unconsciously. They are not Frankenstein monsters or Mr Hydes which we have deliberately created and been unable to control. Because in those cases the creator is often aware of the dysfunction even if they cannot control it. The MOTSI have emerged during our sleep. Some are, in principle, controllable if we woke to the need to do so. They include, in no particular order:

  • The revenue-oriented publisher (which includes scholarly societies)
  • The “citation” and citation metrics (which I will set as homework)
  • Journal branding and the journal impact factor
  • New Journal SPAM
  • The PDF and the monoculture of publishing technology

I’d value your input on:

“What is a citation?”

This is not trivial. I do not know the answer. But if we are using this as a measure of a person’s worth (and hence their institution) we owe ourselves the responsibility of defining it.

Posted in Uncategorized | Leave a comment

What’s wrong with Scholarly Publishing? New Journal Spam and “Open Access”

I got the following SPAM (unsolicited bulk mail) today. (There seems to be an assumption that SPAM for conferences, journals, etc. is OK. It’s not. It wastes my time and leads to errors. If I get (say) 5 invitations a day to “speak” at conferences whose acronyms I don’t know, I miss those few which genuinely want me to attend. It’s irresponsible and unacademic.) From today:

Dear Prof. Murray-Rust:
Greetings! I hope you are well. On behalf of IGI Global, I would like to invite you to share your current research interests in the form of an editorship capacity. As you may know, IGI Global is an internationally-recognized publisher of high quality scholarly reference books, journals and teaching cases.


Introducing “International Research Journal of Library, Information and Archival Studies”

The International Research Journal of Library, Information and Archival Studies is a multidisciplinary peer-reviewed journal that will be published monthly by International Research Journals (http://interesjournals.org/IRJLIAS). IRJLIAS is dedicated to increasing the depth of the subject across disciplines with the ultimate aim of expanding knowledge of the subject.

Call for Research Articles


We invite you to submit a paper /abstract /poster /workshop to the 4th Qualitative and Quantitative Methods in Libraries International Conference (QQML2012), 22 – 25 May 2012, Limerick, Ireland.

    …

2011 the 2nd International Conference on system science, engineering design and manufacturing informatization (ICSEM 2011 )

On behalf of the Scientific and Organizing Committees it is our great pleasure to invite you, together with accompanying persons, to attend the 2011 the 2nd International Conference on system science, engineering design and manufacturing informatization (ICSEM 2011 ),

This is all simple SPAM and I have to filter it out by hand (all conferences look so similar no machine learning will work). But because I am blogging on scholarly publication I stopped to look at the following – and it’s an excellent illustration of New Journal SPAM. Firstly it is, of course, simple SPAM because I didn’t ask for it in my mail box. But it’s more instructive than that.

 

Dear Researcher,

Greetings from the Modern Economy(ME) ,which is published by Scientific Research Publishing ( SRP ), USA.The aim of the International Journal of Modern Economy is to provide a forum for scientists and social workers to present and discuss issues in international economics.

Normally this goes in the SPAM bin immediately but I thought I’d follow this up. Needless to say I haven’t heard of SRP. So I went to their home page (http://www.scirp.org/ ). 150 Open Access journals. Wow! This must be a GOOD THING…

Hang on… "Open Access" does not equal "good". Open Access can be good or bad or in between. Open Access means only one thing – anyone can read it without payment. "Open" is now frequently used in the same way as "Healthy" or "Green" – more a marketing term than a precise description. "Open" does not always mean Open Definition compliant (I'll leave this as a surprise…). And even if it does, that is all it means: "Free to use, re-use and redistribute for any purpose and without restriction save for acknowledgement". That does not mean good or bad, useful or useless. Be very clear on that, because there are a large number of new Open Access journals and IMO some of them use Open as a marketing term.

So, anyway, SCIRP publishes chemistry. There are very few Open chemistry journals (the only non-specialist one is the Beilstein Journal of Organic Chemistry – PLoS doesn't do chemistry). So a new one is welcome – in principle. Let's have a look at the "International Journal of Organic Chemistry" (http://www.scirp.org/journal/ijoc/).

It’s got an ISSN – that simply requires payment. SCIRP is a member of CrossRef. I do take some assurance from that – I know the Crossref people and I assume they have some minimal barrier to entry. They have rules for membership (http://www.crossref.org/02publishers/59pub_rules.html ). These mainly relate to the management of metadata and DOIs (which is Crossref’s business). To continue, let’s look at the mission of the journal…

International Journal of Organic Chemistry (IJOC) is an international, specialized, English-language journal devoted to publication of original contributions concerning all field of organic chemistry.

It is an open-access, peer-reviewed journal describing the synthetic approached and the design of new molecules for applications in medicinal chemistry, but also for the synthesis of precursors for new materials (solid or liquid) that have an interest in materials sciences and nanotechnology, homogeneous and heterogeneous catalysis.

Contributions that concerns with analytical characterization, advanced techniques and the studies of properties for new synthesized molecules will also be highlighted. All manuscripts must be prepared in English, and are subject to a rigorous and fair peer-review process. Accepted papers will immediately appear online followed by printed hard copy. The journal publishes original papers including but not limited to the following fields:

    Fluorescent Molecules and Dyes,     Organo-metallics,     Polymers,     Surfactants Among Others,     Synthesis of Reagents

The journal publishes the highest quality original full articles, communications, notes, reviews, special issues and books, covering both the experimental and theoretical aspects of organic chemistry.

Papers are acceptable provided they report important findings, novel insights or useful techniques within the scope of the journal. All manuscript must be prepared in English, and are subjected to a rigorous and fair peer-review process. Accepted papers will immediately appear online followed by prints in hard copy. It will be available through http://www.scirp.org/journal/ijoc.

(There are some apparent illiteracies, but I’ll pass …) So far so good. Let’s look at the editorial board. (http://www.scirp.org/journal/EditorialBoard.aspx?JournalID=527 ). Wow! 50 names (I’ve heard of one, but I don’t move much in synthetic organic circles, so don’t count that). Now the papers:

http://www.scirp.org/journal/PaperInformation.aspx?paperID=4764

“One-Pot Three-Component Synthesis of Imidazo[1,5-a]pyridines”

I understand what that means. I know what an Imidazo[1,5-a]pyridine is. Assuming this is factually correct, this is solid chemical science – potentially useful to other chemists who want to know how to make this type of compound. The bedrock of factual laboratory science.

I can text-mine this! It's Open, isn't it? Let's find their definition of Open Access… Can't find it… What's the copyright? The paper carries:

Copyright © 2011 SciRes

This is NOT OK

This is NOT OKD (Open Knowledge Definition) compliant. It might be regarded as Green Open Access but it's not Gold. And I can't text-mine it.

Lesson: “Open” means almost nothing unless defined.

But, at least it’s readable by anyone. So I am intrigued. I haven’t heard of SCIRP so I’ll look in Wikipedia.

What, Wikipedia? All academics know that is unregulated junk. Well, the stuff that I (PM-R) wrote in Wikipedia is correct to the best of my ability. And I am coming to believe in the correctness of Wikipedia in the sciences to a greater extent than many other conventional sources. Anyway, maybe Wikipedia can tell us how old SCIRP is. From http://en.wikipedia.org/wiki/Scientific_Research_Publishing

Scientific Research Publishing is an academic publisher of open access electronic journals. The company created a controversy when it was found that its journals duplicated papers which had already been published elsewhere, without notification of or permission from the original author. In addition, some of these journals had listed academics on their editorial boards without their permission or even knowledge, sometimes in fields very different from their own. A spokesperson for the company commented that these issues had been "information-technology mistakes", which would be corrected.[1]

Well it’s a stub entry. From a single source. But from the various edits it seems likely that the company started in 2008 and added several new journals each month. The original stub seems to have been catalysed by a Nature article:

Sanderson, Katharine. “Two new journals copy the old”. Nature 463, 148 (2010)

Let’s have a read of that. After all Nature brands itself as “The world’s best science and medicine on your desktop”. I assume that is agreed by the whole publishing community. (But then what does “best” mean? I can brand my science as the “best” approach to semantic chemistry. ) Let’s have a look:

Log in

Sign up for free access to this article, or log in with your nature.com account

Why do they want me to sign up? Probably because they want to add me to their "direct mailing list". Well, sorry Nature, I'm not going to. I'll go to Pubmed http://www.ncbi.nlm.nih.gov/pubmed/20075892 – not much there. Pubmed are scrupulously careful not to violate the "rights" of publishers. Which means we don't get to read things. UKPMC (on whose advisory board I sit) cannot help either. So finally to the blogosphere (through Google), where I find (quite by chance) a recent post: http://possibleexperience.blogspot.com/2011/07/adventures-in-fake-academic-publishing.html I'm copying it in full as it has useful links:

Adventures in fake academic publishing: SCIRP

Here's a new 'journal' in philosophy published by the disreputable SCIRP. [PMR: possibleexperience's phrase, of course, not mine, as I have an open mind] The business model of this outfit is to charge authors for 'publication' in their online 'journals' (rather than charging readers for access to the articles) – $300 for the first ten pages, $50 for each page thereafter, as stated in the 'author's guidelines':

Your paper should not have been previously published or be currently under consideration for publication anywhere else. Papers should be submitted electronically through the Open Journal of Philosophy (OJPP) Submission System. All papers are subject to peer review. After a paper is accepted, its author must sign a copyright transfer agreement with OJPP. Papers accepted for publication will be made available free online. The modest open access publication costs are usually covered by the author’s institution or research funds ($300 for each paper within ten printed pages, and $50 for each additional page). Scientific Research Publishing may grant discounts on paper-processing fees for papers from lower income countries, or by students, or authors in financial difficulty. The amount of discount will depend on a variety of factors such as country of origin, quality of the work, originality of the article, and whether this particular article was submitted at the invitation of the editor-in-chief. Since only about 20% of papers published in each issue will receive the discounts, there is no guarantee that a discount will be granted to every author who meets the requirements.

SCIRP created a stir last year when at least two of its journals were caught republishing papers without permission.

Two new journals copy the old At least two journals recently launched by the same publisher have duplicated papers online that had been published elsewhere. Late last year, an organization called Scientific Research Publishing reproduced the papers in what its website (www.scirp.org) billed as the first issues of the new journals Journal of Modern Physics and Psychology. Huai-Bei Zhou, a physicist from Wuhan University in China who says he helps to run Scientific Research’s journals in a volunteer capacity, says that the reproductions were a mistake…


  What is the quality of their publications? Here’s one from Advances in Bioscience and Biotechnology, another of their journals:

"Molecular genetic program (genome) contrasted against non-molecular invisible biosoftware in the light of the Quran and the Bible," Pallacken Abdul Wahid, Advances in Bioscience and Biotechnology, vol. 1, no. 4, 2010, pp. 338-47.

“[The] most striking one is that a living cell and its dead counterpart are materially identical, i.e., in both of them all the structures including genome are intact. But yet the dead cell does not show any sign of bioactivity. This clearly shows that the genome does not constitute the biological program of an organism (a biocomputer or a biorobot) and is hence not the cause of “life”. The molecular gene and genome concepts are therefore wrong and scientifically untenable. On the other hand, the Scriptural revelation of the non-molecular biosoftware (the soul) explains the phenomenon of life in its entirety.”

PMR. Well at least a reputable journal from a reputable publisher would never publish an article that mixed science with religion in this way, would they? You would never get an article about proteomics and creationism from a reputable journal, would you?

Unless you know different…

Posted in Uncategorized | Leave a comment

What’s wrong with Scholarly Publishing? Your feedback

I asked a simple question:

“What is the single raison d’etre of the Journal Impact Factor in 2011?”

And have had two useful answers:

Zen Faulkes says:

July 15, 2011 at 12:19 pm 

For me, it’s to ensure that the journal I submit to is a real scholarly publication. There are a lot of new online journals opening up. Some of them are not credible. For me, that a journal has an Impact Factor lets me know that sending a manuscript there is not just the equivalent of burying the paper in my backyard.

and

Laura Smart says:

July 15, 2011 at 4:29 pm 

Ultimately it boils down to evaluating academics. As Zen Faulkes says, academics do use it as a measure of quality for journals where they may choose to publish, however flawed it may be. It's an easy shorthand. Everybody within the current academic publishing system uses it in this fashion, whether it be grant reviewers, hiring committees, tenure committees, peer reviewers, or faculty considering where to volunteer their effort as editors/editorial board members. Grant providing bodies use it when evaluating the publications produced from awards. Publishers may use it slightly differently: as a marketing tool for selling value. But who are they marketing to? The academics who are using the journal impact factor to evaluate one another's worthiness.

It’s been said for 15 years (or more) that the responsibility for changing the scholarly publishing system rests with changing the organizational behavior of the institutions producing the scholarship. People have to stop using journal impact factor as a judgment tool. This won’t happen until there is incentive to change. The serials pricing crisis and usage rights issues haven’t yet proved to be incentive enough, despite lots of outreach by librarians and the adoption of Open Access mandates by many institutions.

Scholars won’t change their behavior until the current system affects their ability to get funding, get tenure, and advance their careers.

These are valuable comments and I'll use them to introduce why I think we have created Monsters of the Scholarly Id. The JIF is probably the worst, as it is not only flawed but its use shows that academia does not really care about measuring quality. The JIF was not created by academia; it was created by publishers as a branding instrument. And that is precisely what it is – a branding tool, created by the manufacturer. It was neither designed nor requested by academia but, as Laura says, it has been adopted by them. They do not control it and so they are in its grip (more later).

Brands can be valuable. http://en.wikipedia.org/wiki/Brand gives: The American Marketing Association defines a brand as a "name, term, design, symbol, or any other feature that identifies one seller's good or service as distinct from those of other sellers". The branding of household products in the 19th century by pioneers such as William Hesketh Lever (http://en.wikipedia.org/wiki/Lever_Brothers ) – whose "resulting soap was a good, free-lathering soap, at first named Honey Soap then later named 'Sunlight Soap'" – allowed customers to associate a brand with consistent and high quality; until that time soap had been of highly variable quality. Many other businesses followed suit.

It is fairly easy to determine whether soap is of good quality or substandard. Whether a car is reliable or breaks down. For many other products (beer, clothes, fragrances, …) the association depends on subjective judgments including a large amount of personal preference. Which brings us to the branding of journals.

I am going to argue later that we do not need journals and that they are increasingly counterproductive. However, assuming that we do need them, is branding useful? Branding is now common – the journal carries the publisher’s name and may have a consistent visual look-and-feel. But visual consistency does not mean valuable or even consistent science.

Journals are – unless you tell me otherwise – unregulated. And that's how it should be. Anyone can set up a journal. http://en.wikipedia.org/wiki/Liebig founded a journal: "The volumes from his lifetime are often referenced just as Liebigs Annalen; and following his death the title was officially changed to Justus Liebigs Annalen der Chemie." Many blogs have the effective status of journals and many contain high-quality scientific content. (Certainly I would encourage anyone who had something to communicate about scholarly publishing to blog it rather than using a scholarly journal such as Serials Review.) So Zen quite rightly asks (implicitly) about journal regulation.

I think he is right to ask for it, though the JIF is not regulation – it's a branding sought by the publisher for the benefit of the publisher. Nothing specifically wrong with that, but to assume it acts in the interests of academia is to misunderstand branding. Its primary purpose is to give a single, apparently objective and regulated number giving an apparent indication of quality. I use the word "apparent" because that is what academia consumes; since the process of JIF creation is not transparent, it is not objective. (That is separate from whether it measures anything useful – which I believe it does not.)
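The arithmetic of the standard two-year JIF is itself trivial and public; what is closed is the underlying citation data and the negotiation over what counts as a "citable item" in the denominator. A toy sketch (all counts invented for illustration):

```python
# Toy Journal Impact Factor: the standard two-year formula.
# All numbers here are invented for illustration only.

def impact_factor(citations_this_year, citable_items_prev_two_years):
    """Citations received this year to items published in the previous
    two years, divided by the number of 'citable items' from those years."""
    return citations_this_year / citable_items_prev_two_years

# Suppose a journal published 200 'citable' articles in 2009-2010,
# and those articles were cited 500 times during 2011:
jif_2011 = impact_factor(500, 200)
print(jif_2011)  # 2.5

# Same citations, smaller denominator: reclassify 50 articles as
# 'front matter' (not citable) and the JIF jumps to ~3.33.
print(impact_factor(500, 150))
```

The point of the sketch: the number depends entirely on how the denominator is classified, which is exactly the part of the process that is not open to inspection.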

Because the number of journals has risen so rapidly it is impossible, even within a field, to determine the standing of any particular one. (Why it has risen I’ll try to deal with later, but it’s not because readers are asking for more journals). So presumably we can rely on the reputation of a publisher justifying the quality of a journal.

Unfortunately not. (This news is 2 years old…) See http://classic.the-scientist.com/blog/display/55679/ where The Scientist in 2009 revealed that

Elsevier published 6 fake journals

Scientific publishing giant Elsevier put out a total of six publications between 2000 and 2005 that were sponsored by unnamed pharmaceutical companies and looked like peer reviewed medical journals, but did not disclose sponsorship, the company has admitted.


Read more: Elsevier published 6 fake journals – The Scientist – Magazine of the Life Sciences
http://www.the-scientist.com/blog/display/55679/#ixzz1SH3VUqmM

 

This is not denied by Elsevier who stated:

“We are currently conducting an internal review but believe this was an isolated practice from a past period in time,” Hansen continued in the Elsevier statement. “It does not reflect the way we operate today. The individuals involved in the project have long since left the company. I have affirmed our business practices as they relate to what defines a journal and the proper use of disclosure language with our employees to ensure this does not happen again.”


It is gratifying that Elsevier have indicated that there was – in 2011 language – a "single rotten apple", that the problem has been cleaned up, and that we can relax for the future. And I am sure they are grateful to The Scientist for discovering the problem, which lay undetected for several years. Nonetheless it shows the commercial pressure to publish journals. Unlike the journals I grew up with, which were the outputs of learned societies and existed to promote science, the primary purpose of most (not all) of today's journals is to make money. (In the MGS we subsidised the journal from the membership – tempora mutantur.) About 4 years ago I talked to someone whose business was creating new journals. His recipe:

  • Find an area (his was medical) where he could create a niche demand. The demand didn’t have to exist, it just had to be creatable.
  • Create a journal, with luminary editorial board. Find the senior editor. Academics like to be on boards. It makes them look good on their CV. Sometimes they even get jollies. (Disclaimer: I have had one free jolly – a (working) breakfast from the J. Cheminformatics: Coffee, fruit, donuts – probably 10USD).
  • Get a reasonable number to submit papers for the first issue. They won't be critically reviewed, will they? After all, it's the editorial board. And we need it to look good. Doesn't really matter if you take an old paper and rework it a bit as a review with some new work. And get the grad student to do the hard work of the references and some pretty pictures.
  • Get academic libraries to subscribe (this was closed access, reader pays). Most very large universities would do this.
  • Wait two years and sell the journal to a major publisher for ca 100K GBP

 

Everyone benefits.

Except academia, which has subscribed to yet another albatross. But there's lots of money in the system. And anyway the researchers don't pay, the library does. And we need the freedom to publish, don't we?

 

Checklist of monsters (MOTSI) so far:

  • The branded journal
  • The new journal
  • The journal impact factor

 

(there’s more to come). But since this is already a long post, let’s have a separate post on the worst of the new journal…

 

So, Zen, we do need an independent reviewer of scholarly publishing – a consumer magazine, "Which Journal?". But the impact factor, negotiated by publishers with unaccountable commercial companies in a closed process, does not provide it. http://en.wikipedia.org/wiki/Quis_custodiet_ipsos_custodes%3F . It should be academia, but it doesn't seem to be. After all, there are more urgent things to do than monitor our own quality. We'll do the research and let the commercial sector tell us how. (And in "commercial" I include the major non-profit societies, which have become unbalanced and use publishing to fund their activities rather than the other way around.)

 

So your next assignment (after all we rely on citations so much)

 

“What is a citation?”

 

Answers within 24 hours welcome

Posted in Uncategorized | 2 Comments

What’s wrong with Scholarly publishing? Measuring quality

I'm starting all these posts with "What's wrong with Scholarly publishing?". That's because I am getting feedback, which includes young researchers who are following them, and libraries/academics who wish to use them as resource material. I'll note that I do not put enough effort into creating hyperlinks – it takes a surprising amount of effort and I'd like to see better tools (e.g. Google or Wikipedia searches for concepts).

Blogs have a mind of their own – I didn’t know a week ago I would be writing this post – and this topic has grown larger than I anticipated. That’s partly because I think it takes us slightly closer to the tipping point – when we see a radical new approach to scholarly publishing. I’m not expecting that anything is directly attributable to this blog. But it all adds up and acts as an exponential multiplier for change.

I’ll be writing later on the dysfunctionalities of the publishing system – “Monsters of the Scholarly Id” – that we academics have unwittingly (but not blamelessly) created. These MOTSI are set to destroy us and I’ll look at each in detail and also ask for y/our input. If you want to anticipate, try today’s homework:

“What is the single raison d’etre of the Journal Impact Factor in 2011?”

Feel free to comment on this blog – I’ll give my analysis later – perhaps in 2 days.

Meanwhile, since we shall come later and in depth to measurement of quality in SchPub, let’s see how we measure quality and utility objectively.

For those of you who don't know him, read Ben Goldacre's column (http://www.badscience.net/ ). It's more than an attack on bad science – it's also a simple and compelling account of how to measure things accurately and well. How to show whether product X has an effect. Whether A is better than B (and what "better" means). Whether government policies work.

From a Kryptonite ad: "70% of readers of WonderWomanWeekly say that Kryptonite gave their hair more life and volume". You'll all recognize this as marketing crap. Almost everything is untestable. How was the survey carried out (if indeed it was)? Did Kryptonite sponsor the survey? What does "volume" mean? (It's not determined by sticking your head in a measuring cylinder.) It's a subjective "market performance indicator". What does "life" mean (for tissue that is dead)?

This couldn’t happen in scholarship, because it is run by respectable academics who care about the accuracy of statements and how data is measured. To which we return later.

Is X a better scientist than Y? Is Dawn French more beautiful than Jennifer Aniston? Is she a better actress?

There are two main ways to answer these questions objectively:

  • Ask human beings in a controlled trial. This means double-blinding i.e. giving the assessors material whose context has been removed (not easy for actresses) so that the assessors do not know what they are looking at and making sure those who manage the trial are ignorant of the details which could sway their judgment. The choice of questions and the management of the trial are difficult and cost money and effort
  • Creating a metric which is open, agreed by all parties, and which can be reproduced by anyone. Thus we might measure well-being by GDP per head and average life expectancy. These quantities are well defined and can be found in the CIA Factbook and elsewhere. (The association of well-being with these measures is, of course, subjective, and many would challenge it.) Dawn French and Jennifer Aniston can be differentiated by their moments of inertia.

Metrics cause many problems and trials cause many problems. This is because the whole task is extremely difficult and there is no simpler way of doing it.

Is scientist X better than scientist Y? Ultimately this is the sum of human judgments – and it should never be otherwise. What are the ten best films of all time? This type of analysis is gentle fun, and IMDB carries it out by collecting votes – http://www.imdb.com/chart/top – where The Shawshank Redemption tops the list (9.2/10). Everyone will argue the list endlessly – are modern films over-represented? Are the judgements made by film enthusiasts? Should they be? And so on.

Here’s a top ten scientist:

http://listverse.com/2009/02/24/top-10-most-influential-scientists/

and another

http://mooni.fccj.org/~ethall/top10/top10.htm

and another

http://www.biographyonline.net/scientists/top-10-scientists.html

None agree completely… and if you felt like it you could do a meta-analysis – analysing all the lists and looking for consistent choices. A meta-analysis might well discard some studies as not sufficiently controlled. I'd be surprised to see a meta-analysis that didn't have Newton in it, for example. Note that the meta-analysis is not analysing scientists, it's analysing analysers of scientists – it makes no independent judgment.
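A crude version of that meta-analysis is just vote-counting across the lists – it measures agreement among the list-makers, nothing more. A sketch, with invented lists:

```python
# Vote-count names across several hypothetical top-10 lists.
# The result judges the lists' agreement, not the scientists themselves.
from collections import Counter

lists = [
    ["Newton", "Einstein", "Darwin", "Curie"],
    ["Newton", "Darwin", "Galileo", "Tesla"],
    ["Einstein", "Newton", "Curie", "Pasteur"],
]

# Tally how many lists each name appears on.
votes = Counter(name for top10 in lists for name in top10)

# Names appearing on every list are the 'consistent choices'.
consistent = [name for name, n in votes.items() if n == len(lists)]
print(consistent)  # ['Newton']
```

Note that a name missing from every list can never surface here – the meta-analysis inherits all the biases of its inputs.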

Let’s assume that a vice-chancellor or dean wishes to decide whether X or Y or Z should be appointed. Or whether A should be given tenure. These are NOT objective choices. They depend on what the goals and rules of the organization are. One criterion might be “how much money can we expect X or Y or Z to bring in grants”. We might try to answer this by asking “how much money has XYZ brought in so far?”. And use this as a predictor.

If grant income is the sole indicator of the value of a human to an institution, then the institution is likely to be seriously flawed, as it will let money override judgments. Ben Goldacre gives examples of universities which have made highly dubious appointments on the basis of fame and money. But someone who brings in no grant income may be a liability. They would (IMO) have to show that their other scholarly qualities were exceptional. That's possible, but it's hard to judge.

And that’s the rub. Assessing people is hard and subjective. I think most scholars try hard to be objective. I’ve sat on review panels for appointments, reviews of institutions and departments. A competent review will provide the reviewer with a large volume of objective material – papers, grants, collaborations, teaching, engagement, etc. And the reviewer may well say things like:

“This is a good publication record but the last five years appear to have been handle-turning rather than breaking new ground. They will continue to bring in grants for a few years”

“If the department wishes to go into [Quantum Animatronics] then this candidate shows they have the potential to create a world-class centre. But only if you agree to support a new laboratory.”

“This candidate has a valuable new technique which can be applied to a number of fields. If the department wishes to become multidisciplinary then you should appoint them”

And so forth. None of these are context-free.

I understand that there are some US institutions that appoint chemists solely on the number of publications they have in J.Am.Chem.Soc. (Even Nature and Science don’t count). This has the advantage that it is delightfully simple and easy to administer. Given a set of candidates even a 5-year old could do the appointing. And it saves so much money. My comment is unspoken.

 

Posted in Uncategorized | 5 Comments

What’s wrong with scholarly publishing? Those who are disadvantaged speak

 

I publish in full an unsolicited comment, which expresses exactly why closed access publishing has become unacceptable.

 

Bill Roberts says:

July 14, 2011 at 8:00 am 

As a non-academic but occasional reader of published academic papers, the current system of publishing actively deters me from reading the best work of scientists. If I was a researcher working in an institution and needing to read papers every week, then I suppose the journal subscription systems are workable. But for me, every time I hit a $30/article paywall, I simply go back to google and look for blog posts or preprints (or the one or two open journals) instead.

As a researcher, then clearly there is advantage in professional status and an advantage for the institution in getting papers into prestigious journals, but this is at the cost of *actively preventing* a proportion of the potential audience for the paper from ever seeing it.

I’m sure I could get some kind of ‘affiliate membership’ of a university library and so get access that way, but the marginal benefit each time isn’t big enough to make me do that.

The web ought to be the ideal medium for coping with ‘long tail’ people like me, but as you have so clearly pointed out on several occasions, the current system of academic publishing has conspicuously failed to take advantage of the possibilities offered by the web.

Bill (who I don’t know) expresses precisely the inequity of the current system. His taxes (wherever) go to support research, go to support university libraries, but he cannot have access to the results. I am not arguing that the system should be cost-free, but that all parties should be rapidly working towards a sustainable business model. One that allows Bill to have access to the literature.

If you are an academic reading the literature, next time you celebrate another paper in NatSciCell, think of Bill. Think of the people suffering from the disease that you might, in years to come, have some comfort for. Think of the patient groups who have collected on the streets, and given legacies, to fund your research. Who cannot even read what they have worked to support. It is the arrogance of academics which is fuelling this system.

And publishers (and I have been sparing of criticism so far), think whether charging $40 to read a 1-page article for 1 day (Serials Review) is advancing the cause of science. Think what the effect actually is. You are alienating Bill. The service of communication is replaced by the tyranny of gatekeeping. Bill doesn't pay your prices and I suspect very few do. You are simply advertising that you don't care about Bill. You can buy popular science magazines weekly for $5 – I'm not an economist but that doesn't upset me. But $40 for 1 page for 1 day is inexcusable. And, as I shall comment later, charging by the article ignores the fact that many readers now never read articles all the way through.

If both parties (academics and publishers) keep on in their narrow world where only the privileged exist the scholarly world will fracture. That may be sooner than we think. Murdoch had zero public support here, and has crashed. If you need public support, you won’t find it from Bill.

Posted in Uncategorized | 2 Comments

What’s wrong with scholarly publishing? How it used to be

While waiting for feedback (and there’s a good discussion on Friendfeed) here’s a (probably rosy-tinted) ramble through history…

I started my research almost 50 years ago and did my doctorate in 2 years (required for chemistry, as we had a fourth year of research in my first degree). During that I did several crystal structures (it was becoming slightly mechanised, but I had to measure up to 50,000 diffraction intensities by eyeballing photographic films *and typing them up*). I created a thesis (which did not impress my (very famous) examiners, and I am told I passed on the strength of my viva). That thesis is in the Bodleian and – thanks to Sally Rumsey – should be being digitized RSN. You will then be able to judge its quality (it is really not too bad, and would now simply require corrections).

The process of publication was technically much harder. It took 2 days to create a picture of a crystal structure. The coordinates had to be calculated with sine tables and worse, then the drawing done with a Rapidograph. When I got it wrong I had to scrape the errors off with a razor. All data had to be punched up and included. I went straight into an assistant lectureship at the then new University of Stirling. Straightforward benevolent nepotism – I was invited to apply by my college tutor, Ronnie Bell, who took up the chair of chemistry. I was in the right place at the right time.

Anyway, during my DPhil I published my first paper – a rather fun structure – in Chemical Communications. This was a new (and I felt exciting) journal where you wrote a brief account of the work and (at that stage, I think) were expected to write it up fully later (e.g. with the data). There was a feeling of competition – only interesting chemistry was accepted. Not sure I got much feedback. After getting to Stirling and recovering from the DPhil viva I wrote up the other structures and sent them off to J.Chem.Soc. The Chem Soc – now the RSC – had only one main journal (I think there was also Faraday Discussions) – it was really a single national journal.

There was a clear feeling that you published in your national journal – UK=> JCS, Scandinavia=>Acta Chem Scand, US=>JACS, CH=>Helvetica Chimica, etc. If you had particularly specialist crystallographic material it was Acta Crystallographica (or possibly Zeitschrift fuer Kristallographie). In 1970 the world was very simple.

And then it changed. I remember in about 1972 getting a transfer of copyright form (I think from Acta Cryst, but it might have been JCS). I had no idea what it was. It was explained that this was to protect the authors from having their papers ripped off – that unless we gave copyright to the publisher they couldn’t act on our behalf.

At that stage we trusted publishers implicitly. Because they weren’t publishers. They were learned societies that we belonged to – paid membership to. They represented our interests because they were composed of us. Why would you not trust them?

Turning over our copyright was the biggest mistake that academia has made in the last 50 years. Because we handed over our soul. We didn’t even sell our soul – we gave it away. Was there an ulterior motive then? I’d like someone to tell me – I honestly don’t know whether it was a genuine idea or whether it was a con.

If we had the internet we would never have been ignorant of the issues. OKF or ORG etc. would have made it immediately clear that this was not necessary and could lead to disaster (as it has). But there was little communication – where do you look? The THES? No email, no blogs, no twitter…

And then, in about 1974 IIRC, I remember seeing Tetrahedron (or maybe Tetrahedron Letters) – a Pergamon Press journal. Pergamon was run by Robert Maxwell. It had an appealing visual quality – higher than the society journals. And it concentrated on one subject only – organic chemistry (whereas JCS and the other society journals covered all subdisciplines of chemistry). It was irrelevant to me that it was commercial – I didn’t pay the bills, and anyway universities had lots of money and could buy almost any journals they wanted. I’ve published in Tetrahedron and TetLetts. Why not?

And I remember going to Switzerland and, when I got interesting and important results, finding that the convention was to publish them in J. Am. Chem. Soc., not Helvetica. This was the first time that the choice of journal mattered. But that was because more people would read JACS than Helvetica. I didn’t feel any sense of choosing JACS because it was “better”, just that it would be better for my work.

The 1970s and 1980s had a strange step forwards and backwards – camera-ready copy. Not in most journals but in many monograph chapters. It was a quick, and I think honest, approach. We could say and draw roughly what we wanted as long as it kept within a blue rectangle. You were responsible for your own diagrams and spelling. It wasn’t pretty but it was rapid (relatively).

And I was involved in setting up a new society (the Molecular Graphics Society, 1981) which had its own journal. It was free to members. The society subsidized the journal through membership fees. And yes, we made money out of meetings by charging fees for exhibitors. I was treasurer. We were financially viable.

And then the web came – 1993. I thought it would transform publishing. It was an opportunity for the universities to show what their publishing houses could do. It was an unparalleled opportunity for a new type of scholarship. I ran the first multimedia course on the Internet (Principles of Protein Structure). They were heady days. A few people believed – for me Birkbeck and Nottingham. But generally academia was totally uninterested in the new opportunities. Why? Please tell me.

They could and should have taken charge of scholarly publishing. Instead they let (and encouraged) commercial publishers dictate to them what publishing was and was to become.

  • Who asked for PDF? Not me and no-one I have talked to.
  • Who asked for double-column PDF? Not me and no-one I have talked to.
  • Who asked for the “paper” to remain fossilized as a paper image?
  • Who asked for the printing bill to be transferred from the publisher to the department laserjet?
  • Who asked for manuscripts to be submitted through inhuman forms and grotesque procedures?

No-one. Academia has supinely accepted anything that the publishers have offered them. And paid whatever they have asked. (Yes, you may occasionally think you have saved money, but look at the publishers’ revenue – a monotonically increasing function, maths-speak for something that increases year by year, inexorably.) In most industries innovation and scale have cut prices. …

…stop. I was meant to reminisce, not rant. I’ll fondly remember up to about 1990. Then it all goes wrong.
