What is wrong with Scientific Publishing: an illustrative “true” story

Yesterday I abandoned my coding to write about scientific publishing:

http://blogs.ch.cam.ac.uk/pmr/2011/07/09/what-is-wrong-with-scientific-publishing-and-can-we-put-it-right-before-it-is-too-late/

and I now have to continue in a hopefully logical, somewhat exploratory vein. I don't have all the answers – I don't even have all the questions – and writing these posts is taking me to new areas where I shall put forward half-formed ideas and await feedback ("peer-review") from the community. The act of trying to express my ideas formally, for a critical audience, is helping to refine them. And I am hoping that where I am struggling for facts or prior scholarship you will help. That's not an excuse for laziness; it's a realization that one person cannot address this problem alone.

This blog post *is* a scholarly publication. It addresses all the points that I feel are important – priority (not that this is critical), peer review, communication, re-use (if you want to), and archiving (not perhaps formal, but this blog is sufficiently prominent that it gets cached – this may horrify librarians, but it's good enough for me).

The only thing it doesn't have is an ISI impact factor, and I'll return to that. It does have measures of impact (Technorati, Feedburner, etc.) which measure readership and crawlership. (These are inaccurate – they recently dropped by a factor of 5 when the blog was renamed.) I'd be interested to hear from anyone who cannot receive this blog for technical reasons (timeout, etc.). Feedburner suggests that a few hundred people "read" this blog. There's also Friendfeed (http://friendfeed.com/petermr ) where people (mainly well-aligned) comment and "like" posts; and Twitter, where I have 650 followers (Glyn Moody has 10 times that) – a tweet linking to yesterday's post has just appeared.

So the blog post fulfils the role of communication – two way communication - and has mechanisms for detecting and measuring this. As I write this I imagine the community for whom I am preparing these ideas and from whom I am hoping for feedback. Ambitiously I am hoping that this could become a communal activity – where there are several authors. (We do this all the time in the OKF – Etherpads, Wikis, etc.) And who knows, this document might end up as part of a Panton Paper. As you can tell I am somewhat enjoying this, though writing is often painful in itself.

I am going to describe new ideas (at least for me) about scholarly publishing. I am going to use "scholarly" as inclusive of "STM" and extending to other fields – because in many cases the translation is direct; where there are differences I will explicitly use STM. I like the word "scholarly" because it highlights the importance of the author (which is one of the current symptoms of the malaise – the commoditization of authorship). It also maps onto our ideas of ScholarlyHTML as one of the examples of how publication should be done.

Before my analysis I'll give an example of the symptoms of the dystopia. This has reinforced me in my determination never to publish my ideas in a traditional "paper" for a conventional journal. Details are slightly hazy. I was invited – I think in 2007 – to write an article as part of an Issue on the progress of Open Access. Here it is

http://www.sciencedirect.com/science/article/pii/S009879130800004X

Serials Review
Volume 34, Issue 1, March 2008, Pages 52-64

Open Data in Science

Peter Murray-Rust

Murray-Rust is Reader in Molecular Informatics, Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK

 

It will cost you 40 USD to rent it for ONE DAY. You are allowed to print it for personal use during this period.

*I* cannot read my own article and I do not have a copy.

The whole submission process was Gormenghastian and I have ended up being embittered by it. I asked for the article to be Open Access (Green) and believed that it would be available indefinitely so that I would not have to take a "PDF copy" (which is why I don't have one). When I discovered that I could not read my own article I contacted the publishers and was told that I had agreed to it being Open for a year after which it would be closed. Maybe – I don't remember this but there were 100+ emails and it may have slipped my unconscious mind. If I had been conscious of it, I would never have acquiesced. It's a bizarre condition – let people read something and then cut them off for ever. It has no place in "scholarly communication" – more in the burning of the libraries.

I took the invitation as an exciting opportunity to develop new ideas and to get feedback, so I wrote to the editor (whom I only know through email) and explained my ideas. (If I appear critical of her anywhere it is because I am critical of the whole system.) I explained that "Open data" was an exciting topic where text was inadequate, and to show this I would create an interactive paper (a datument) with intelligent objects. It would give readers an opportunity to see the potential and challenges of data. This was agreed, and I would deliver my manuscript as HTML. I also started the conversation on the Openness of the resulting article. The only positive thing was that I established that I could post my pre-submission manuscript independently of Elsevier. (I cannot do this with publishers such as the American Chemical Society – they would immediately refuse the article.) I decided to deposit it in "Nature Precedings" – an imaginative gratis service from NPG: http://precedings.nature.com/documents/1526/version/1 . This manuscript still exists and you can copy it under CC-BY and do whatever you want with it. (No, there is no interactive HTML, for reasons we'll come on to.)

I put a LOT of work into the manuscript. The images that you see are mainly interactive (applets, SVG, etc.). Making sure they all work is hard. And, I'll admit, I was late on the deadline. But I finally got it all together and mailed it off.

Disallowed. It wasn't *.doc. Of course it wasn't *.doc – it was interactive HTML. The Elsevier publication process refused to allow anything except DOC. In a rush, therefore,

I destroyed my work so it could be "published"

I deleted all the applets, SVG, etc. and put an emasculated version into the system and turned to my "day" job – chemical informatics – where I am at least partially in control of my own output.

I have never heard anything more. I got no reviews (I think the editor accepted it as-is). I have no idea whether I got proofs. The paper was published along with seven others some months later. I have never read the other papers, and it would now cost me 320 USD to read them (including mine). There is an editorial (1-2 pages, which also costs 40 USD). I have never read it, so I have no idea whether the editor had any comments.

Why have I never read any of these papers? Because this is a non-communication process. If I have to wait months for something to happen I forget. *I* am not going to click on Serials Review masthead every day watching to see whether my paper has got "printed". So the process guarantees a large degree of obscurity.

Have I had any informal feedback? Someone reading the article and mailing me?

No.

Has anyone read the article? (I include the editor). I have no idea. There are no figures for readership.

Has anyone cited the article?

YES – four people have cited the article! And I don't have to pay to see the citation metadata:

http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=2-s2.0-43149086423&src=s&imp=t&sid=Mt3luOQ49JTT7H5OHiBim3F%3a140&sot=cite&sdt=a&sl=0&origin=inward&txGid=Mt3luOQ49JTT7H5OHiBim3F%3a13

The dereferenced metadata (I am probably breaking copyright) is:

1. Akmon, D. (2011). Moving beyond sharing vs. withholding to understand how scientists share data through large-scale, open access databases. ACM International Conference Proceeding Series, pp. 634-635. [cited by 0]

2. Kind, T., Fiehn, O. (2010). Advances in structure elucidation of small molecules using mass spectrometry. Bioanalytical Reviews 2 (1), pp. 23-60. [cited by 2]

3. Apostolakis, J. (2010). An introduction to data mining. Structure and Bonding 134. [cited by 0]

4. Hofmann, D.W.M. (2010). Data mining in organic crystallography. Structure and Bonding 134. [cited by 0]
I cannot read 3 of these (it would cost ca. 70 USD just to see what the authors said), but #2 is Open. Thank you, Thomas (I imagine you had to pay to allow me to read it) [Thomas and I know each other well in cyberspace]. It is clear that you have read my article – or enough of it for your purposes. Thomas writes:

that once data and exchange standards are established, no human interaction is needed anymore to collect spectral data [525]. The CrystalEye project (http://wwmm.ch.cam.ac.uk/crystaleye/) shows that the aggregation of crystal structures can be totally robotized using modern web technologies. The only requirement is that the spectral data must be available under open-data licenses (http://www.opendefinition.org/) [544].

The other three may have read it (two are crystallography publications) or they may simply have copied the reference. It's interesting (though not unusual) to see that the citations came two years post-publication.

So in summary, the conventional publication system consists of:

  • Author expends a great deal of effort to create manuscript
  • Publisher "publishes" it through an inhuman, mechanistic process; no useful feedback is given
  • Publisher ensures that no-one can read the work unless…
  • University libraries pay a large sum (probably thousands of dollars/annum each) to allow "free" access to an extremely small number of people (those in rich universities perhaps 0.0001% of the literate world – how many of you can read these articles sitting where you are?)
  • No one actually reads it

     

In any terms this is dysfunctional – a hacked-off author who has probably upset an academic editor, the two of them having jointly ensured that the work is read by almost no-one. Can anyone give me a reason why "Serials Review" should not be closed down and something better put in its place? And the same goes for zillions of other journals.

Hang on, I've forgotten the holy impact factor… (http://www.elsevier.com/wps/find/journaldescription.cws_home/620213/description )

Impact Factor: 0.707

Yup, roughly the square root of a half.

What will my colleagues say?

My academic colleagues will (unfortunately) say that I should not publish in journals with an IF of less than (??) 5.0 (J Cheminfo is about 3). That in itself is an appalling indictment – they should be saying "Open data is an important scholarly topic – you make some good points about A, B, C and I have built on them; you get X, Y wrong; and you have completely failed to pay attention to Z."

My Open Knowledge and Blue Obelisk colleagues will say – "this is a great start to understanding and defining Open Data".

And I can point to feedback from the gratis Nature Precedings: (http://precedings.nature.com/documents/1526/version/1 )

This has:

  • 11 votes (whatever that means, but it probably means at least 11 people have glanced at the paper)
  • A useful and insightful comment
  • And cited by 13 (Google; Scopus showed only 4). These are non-self citations.

So from N=1 I conclude:

  • Closed access kills scholarly communication
  • Conventional publication is dysfunctional

If I had relied on journals like Serials Review to develop the ideas of Open Data we would have got nowhere.

In fact the discussion, the creativity, the formalism has come through creating a Wikipedia page on "Open data" and inviting comment. Google "Open Data" and you'll find http://en.wikipedia.org/wiki/Open_data at the top. Google "Open data in science" (http://www.google.co.uk/search?q=open+data+in+science&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a ) and the gratis manuscript comes top (the Elsevier article is nowhere to be seen).

As a result of all this open activity I and others have helped to create the Panton Principles (http://pantonprinciples.org/ ). As you will have guessed by now, I get no academic credit for this – and my colleagues will regard it as a waste of time for a chemist to be involved in. For me it's true scholarship; for them it has zero "impact".

In closing I should make it clear that Open Access in its formal sense is only a small advance. More people can read "it", but "it" is an outdated, twentieth century object. It's outlived its time. The value of Wikipedia and Nature Precedings for me is that this has enabled a communal journey. It's an n<->n communication process rooted in the current century.

Unless "journals" change their nature (I shall explore this and I think the most valuable thing is for them to disappear completely) then the tectonic plates in scholarly publishing will create an earthquake.

So this *is* a scholarly publication – it hasn't ended up where I intended to go, but it's a useful first draft. Not quite sure of what – perhaps a Panton Paper? And if my academic colleagues think it's a waste, that is their problem and, unfortunately, our problem.

[And yes – I know publishers read this blog. The Marketing director of the RSC rang me up on Friday as a result of my earlier post. So please comment.]

[OH, AND IN CASE THERE IS ANYONE ON THE PLANET WHO DOESN'T KNOW – I DON'T GET PAID A CENT FOR THIS ARTICLE. I DON'T GET REIMBURSEMENT FOR MATERIALS. I DON'T KNOW WHETHER THE EDITOR GETS PAID. THE JOURNAL TAKES ALL THE MONEY FOR "PUBLISHING MY WORK".]

 

 

Comments

  1. Luis Ibanez

    It is such a pleasure to read your blog post.

    Your conclusions are quite straightforward, and I fully agree with you:

    * "Closed access kills scholarly communication"
    * "Conventional publication is dysfunctional"

    Many years ago I heard a saying about Japanese culture and the nature of service (it is probably an urban legend, but it still carries some wisdom). It said something along the lines of:

    "Japanese people do not complain about bad service,
    They just don't use it."

    Hopefully a Japanese reader will clarify the truthiness of that sentence at some point, but in the meantime we can apply that drop of wisdom to the current scientific publishing challenge.

    I think that we are at the tipping point where what is needed is a simple, straightforward boycott of closed-access, decadent Journals and Conferences. I signed up for such action in 2006, when the IEEE was lobbying against the NIH Public Access Policy. Out of disappointment, I cancelled my IEEE membership and stopped submitting papers to the IEEE, reviewing for them, or citing papers from any of their Journals and Conferences. Since then, I have been encouraging others to do the same, particularly young researchers who are not yet corrupted by the "publish or perish" addiction.

    It was surprising to see how easy it was to "stop collaborating". After all, everything we do for publishers, we do for free. We give away our papers, we review for them for free, we cite for free, and... we then use portions of our funding to pay for high-priced subscriptions. Sometimes just to be able to read our own papers, as you clearly illustrated in your blog post.

    Once we stop collaborating with the Journals that do not serve the goals of the scientific community, we should then simply get behind the progressive Journals that are doing the right thing.

    * PLoS
    * BiomedCentral

    and the thousands of other Open Access journals that are available today.

    The illusion of the "Impact Factor" is one of the most shameful chapters of scientific publishing history. An entire population of educated researchers makes some of the most important decisions of their careers (where to publish their hard-won papers) based on a number whose computation has never been reproduced by anybody outside the company that produces "the number". For all we know, "the number" could be computed by gerbils jumping on the keys of a random number generator circuit.

    Granted, there is a public disclosure of what the "Impact Factor" algorithm is, but nobody has ever shown that the number is actually computed that way. Moreover, it has been shown in multiple instances that the supposed algorithm used to compute the "Impact Factor" is biased and strongly driven by a few papers with a large number of citations.

    The whole "Impact Factor" construct is actually an illustration of the decadent narcissism of scientific publishing:

    "The importance of a paper is measured
    by how many other papers cite it".

    There is no mention of the social impact,
    there is no notion of real "breakthroughs" in that field...

    Our scientific community works on the misguided notion that:

    "The purpose of scientific research is to publish papers".

    Any researcher who brings up the "Impact Factor" as the criterion for choosing where to publish should be sent back home to re-read Karl Popper's "Conjectures and Refutations" and "The Logic of Scientific Discovery". Hopefully that will refresh their knowledge of the scientific method and the basic mechanisms that we use to gather trustworthy data.

    Journals are entirely constructed on the basis of reputation (what we as a collective think of them). It is time to re-calibrate our notions of what a "good" Journal is. Starting with "A good Journal must":

    1) Be Open Access
    2) Encourage (if not enforce) verification of reproducibility
    3) Be agile (take less than a year to publish a paper)
    4) Stimulate two-way communication between authors and readers

    We could start by using a public "grading card" for Journals, based on public (and reproducible) measures such as the ones listed above.

    1. Adrian Pohl

      The "Criteria for the journal of the future" by Daniel Mietchen, Heinz Pampel and Lambert Heller will be of interest to you: http://beyondthejournal.net/2011/06/20/criteria-for-the-journal-of-the-future/
      (Another act of scholarly communication that gets no academic credit and has "zero impact". ;-) )

      Peter, you probably know these criteria. It might make sense to collaborate with Daniel, Heinz and Lambert on further developing them.

      All the best
      Adrian

    1. pm286 Post author

      I shall develop this theme. I shall try to get some publishers to argue why they should survive.

  2. Steven Bachrach

    I think what we want is barrier-free access to information. So that implies the two dimensions of free – libre and gratis.

    In terms of libre, I am totally with you. Scientific articles and data should come with no restrictions on their use. We should be able to to reuse data (and even text). We should be able to mine text and data.

    When it comes to gratis, here I am probably a bit more conservative than you. Making information available has a cost, and this cost needs to be carried by someone or some organization. OA tips in favor of the reader – the reader should bear no burden in paying the cost of information storage, archiving, and transmission. Rather, the author (as in the majority of OA journals) or some organization covers the costs of operation. It is not apparent to me that this is the reasonable way of doing things. The recipient of the data and text (the reader) accrues a great benefit from access to this information. Why should he/she be removed from its cost?

    The entire issue of cost is really, in my mind, much more complicated than either side has been willing to face up to. Publishers are acting to protect their revenue streams. Readers are upset at not being able to afford access to important journals. The subscription model shifts too far in favor of the publisher, while the OA model shifts too far in the direction of the reader.

    I would really like to see us focus on the libre problem, where I think we can make significant progress and really effect a dramatic change in publishing.

    1. pm286 Post author

      I have never argued that publishing is zero-cost. You are right that we need to explore different funding models. Although it is not a zero-sum game, most of the funding comes from the public purse or charities (and where it does not – from companies, etc. – the amount is small).

  3. Steven Bachrach

    I know that you do not argue for zero-cost. In the States, I imagine that most of the funding for information comes out of the library budget. For state institutions this is ultimately the public purse, but for private schools it is not.

    But my point is that we should focus our efforts not on the cost issues but on making scientific communication richer and a better servant to our needs (as opposed to the reverse, which is often what happens – we as scientists constrain our communication to fit within the constraints of the journal).

    1. Barbara Fister

      ... though if your library can't afford that wonderfully tricked-out servant, and the library is the provider, what then? And while (speaking from a US perspective) many libraries are not publicly funded, the majority of research libraries in the US are, and they are our information backbone – but these institutions are being defunded. We really need to develop funding models that don't depend on revenue from licensing the information to those who can pay, but rather pay up front for the thing we want to share – all of its costs, not just the creation of it but the sharing of it.

      Great discussion - thanks!

      1. pm286 Post author

        Thanks Barbara,
        I appreciate your argument. There are no easy answers, but I think the last 20 years have often reduced libraries to purchasing products in which they have had no involvement rather than helping to define what they want and how it should be funded. More later

  4. Pingback: Science in the Open » Blog Archive » How to waste public money in one easy step…
