What’s wrong with scholarly publishing? The size of the problem

In previous posts (/pmr/2011/07/11/what%E2%80%99s-wrong-with-scholarly-publishing-your-feedback-%E2%80%93-why-should-journals-exist/ and immediate backtracks) I have started to address the question of what is wrong with scholarly publishing. I haven’t actually established yet that there *is* anything wrong; I’ll do that in a day or two (symptoms and causes).

What is the size of the global scholarly research industry? What is the world GDP of academia? I have put this question to many people without getting an answer. I’ll explain what I mean…

Money is given publicly to institutions (mainly universities but also local, national and international research institutions (STFC, CSIRO, national labs…), including charities (e.g. Cancer Research UK)) to carry out research. I am restricting this to research work, not private contract work (e.g. work for hire that is unlikely to be published), and excluding teaching and other non-research activities. I also exclude work within for-profit companies (e.g. Glaxo Group Research (now GSK), where I used to do research). There is an expectation that this work will be “published” or “made public” – I don’t address here what this means; I shall later. The money is usually publicly accountable and may even be published. It includes funding to academia from for-profits where the contract is for “research” – this often means that the results are expected to be “published” and there is often a reduced overhead (fee) from the institution. (For example we have had funding from Microsoft and Unilever, some pharma and some for-profit publishers.) The ethics of this are not in question here – I am simply establishing the scale. The point is that this “academic industry” – and such it is – is coupled to scholarly publishing in a bizarre manner, and one which I shall argue is deeply unhealthy.

So I am going to conflate terms and use “academia” to mean the institutions above. Companies (such as Glaxo and Microsoft) ultimately rely on sales and stock price for their measure of worth. Scholarly research increasingly relies on publication metrics.

So how large is academia? I find it very hard to get figures (and that is the value of a blog – I hope that some readers can help). I am happy if the figures are within half an order of magnitude – a factor of 3 either way.

I come from these directions:

  • When the Wellcome Trust funds research it allows about 1-2% of the cost for publishing. Scholarly publishing is about 10 billion/year (GBP, USD, EUR … the units are lost in the noise). So the associated research is 50-100 times higher => 500-1000 billion
  • The top universities (Cambridge, Stanford, Harvard) get about 500 million/year. There are probably about 10,000 academic institutions (with a long tail). Truncate the tail at 1000 and we might get 500 * 1000 => 500 billion

(There are limits – research is much greater than scholarly publishing and is less than the GDP of the planet). So let’s assume 500 billion.
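
For anyone who wants to play with the assumptions, here is that arithmetic as a trivial Python sketch – the inputs are the rough figures above, not authoritative data:

```python
# Back-of-envelope estimates of the size of the "academic industry".
# All inputs are the rough assumptions from the text, good to perhaps
# half an order of magnitude - not authoritative data.

publishing_spend = 10e9        # global scholarly publishing, ~10 billion/year

# Route 1: Wellcome allows ~1-2% of a grant for publishing, so research
# spend is roughly 50-100x the publishing spend.
low = publishing_spend * 50    # 500 billion
high = publishing_spend * 100  # 1000 billion

# Route 2: top institutions get ~500 million/year of research income;
# truncate the long tail of ~10,000 institutions at an effective 1,000.
institutional = 500e6 * 1000   # 500 billion

print(f"route 1 (via publishing spend): {low/1e9:.0f}-{high/1e9:.0f} billion")
print(f"route 2 (via institution income): {institutional/1e9:.0f} billion")
```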

That’s a large industry. Most industries of that size have developed an information infrastructure (e.g. for suppliers, for metrics, for government). Academia has not: it has let others produce the information products, which it then buys back. Unlike industries which control their own information infrastructure (think supermarkets), academia leaves this to others.

This has a cost – a serious cost. There is a direct cost in the information products. If we (i.e. academia) wish to get information on scholarly output (mainly scholarly publishing) we have to pay others for their information products. We have not designed these information products, nor – as far as I know – have we challenged their design and content; we take them as givens. But this (perhaps 10 billion) is not the major problem.

It gives rise to the much more serious cost – we make decisions based on information over which we have no control. The irony is that much of this basic information – the scholarly publications – is initially produced by us, in electronic form. Any competent industry would immediately use this information itself – in the overall picture it’s a tiny fraction of 500 billion (a concerted world-wide effort in academia could create at-source metrics for a few billion at most).

A feature of academia is that it is a Holy Roman Empire of thousands of players, each trying to solve these problems by itself. In the UK every university has to create its own system for the upcoming REF (assessment exercise). Whether you think the REF is a good thing or not, it seems certain that the implementation does not compare with the competence that would be found in an industry. Yes, industry can foul up on IT and frequently does, but academia usually doesn’t even get started. Given that the UK wishes to measure 100 institutions in the REF, it seems extraordinarily inefficient to expect each to create its own information system.

The vacuum of a proper information infrastructure for the world-wide academic industry is exacerbated by the apparent need for every institution to compete aggressively against every other. In most industries this is tackled by mergers and acquisitions. When I worked at Glaxo, Richard Sykes (CEO, and later Rector of Imperial) argued that in most businesses the market leader held about 30% of the market, and that in pharma the largest share was 5% (Glaxo). (So he went out and bought Wellcome.) In universities I suspect the leader holds about 0.1%. I am not saying universities should merge – I am arguing that because there is a plethora of competing institutions, the information infrastructure is archaic and ripe for exploitation.

The malaise in scholarly publishing is directly of academia’s own making. We have failed to notice, let alone adjust, our own business processes, with the result that others are doing it for us – and not in response to our needs, but in response to what benefits their markets.

And in the holy market economy this is regarded as a good thing. The fault, dear Brutus, is that we have been sleeping for about 30 years and have not woken to the fact that we are tied down and restricted, Gulliver-like. But if we work together on this we are vastly the largest player in the marketplace. In principle we can collectively shape our information infrastructure, especially scholarly publishing, into whatever we want.

It is not too late, but it is getting that way. I am always grateful for feedback. My next sortie, unless feedback takes me elsewhere, will be to examine the symptoms of the dystopia.

Posted in Uncategorized | 4 Comments

What’s wrong with scholarly publishing? Your feedback – Why should journals exist?

One of the features of blogging is that you get immediate feedback – some positive, some not. ALL feedback is welcomed and will be treated professionally. In conventional scholarly publication we are expected to assemble other relevant work, prior art, conflicts, etc. The blog makes this easy: if I have omitted a significant opinion or piece of work then I am likely to be informed of it. Here’s an example – I’ll reproduce the first part, as it is essentially a scholarly publication in the form of a blog post… (Daniel Mietchen is active in the OKF and ran the Open Theses effort earlier this year.) Thanks to Adrian Pohl (of Open Bibliographic Principles fame)…

Mietchen, Pampel & Heller: Criteria for the Journal of the Future

 

http://beyondthejournal.net/2011/06/20/criteria-for-the-journal-of-the-future/

The internet changes the communication spaces in which scientific discourse takes place. In this context, the format and role of the scientific journal are changing. This commentary introduces eight criteria that we deem relevant for the future of the scientific journal in the digital age.

Background

The debate on the future of scholarly communication takes place between researchers, librarians, publishers and other interested parties worldwide. Perhaps appropriate to the topic, the debate has seen relatively few contributions via traditional scholarly communication channels, whereas blog posts like “Is scientific publishing about to be disrupted?” by Michael Nielsen (2009) received a lot of attention.

In light of this debate, a discussion emerged during the Open Access Days 2009 between Lambert Heller and Heinz Pampel about the changing landscape of scholarly communication in the field of library and information science (LIS). In the following months, both discussed their views with different stakeholders, including the LIBREAS editors.

In autumn 2010, Heller and Pampel started beyondthejournal.net – a blog, in which they document their thoughts on the current system and on the future of scientific discourse in LIS. [1] As a result, they summarised their analysis in a paper (Heller & Pampel 2010) presented at the annual conference of the German Society for Information Science and Information Practice (DGI). The core of the work is a collection of eight criteria for the future of the scientific journal in LIS.

In connection with a conference talk by Daniel Mietchen on large-scale collaboration via web-based platforms at the conference “Digitale Wissenschaft 2010” in Cologne (Mietchen 2010a), Mietchen and Pampel discussed the possibility of recasting the criteria in a general, interdisciplinary form.

Subsequently, Mietchen translated the criteria into English and started an editable copy at Wikiversity, a wiki for the creation and use of free learning materials and activities (Mietchen 2010b).

After a further joint discussion, the following version [2] of the criteria was formulated, with contributions from other discussants at Friendfeed [3] and Wikiversity.

Criteria

Dynamics: Research is a process. The scientific journal of the future provides a platform for continuous and rapid publishing of workflows and other information pertaining to a research project, and for updating any such content by its original authors or collaboratively by relevant communities.

Scope: Data come in many different formats. The scientific journal of the future interoperates with databases and ontologies by way of open standards and concentrates itself on the contextualization of knowledge newly acquired through research, without limiting its scope in terms of topic or methodology.

Access: Free access to scientific knowledge, and permissions to re-use and re-purpose it, are an invaluable source for research, innovation and education. The scientific journal of the future provides legally and technically barrier-free access to its contents, along with clearly stated options for re-use and re-purposing.

Replicability: The open access to all relevant core elements of a publication facilitates the verification and subsequent re-use of published content. The scientific journal of the future requires the publication of detailed methodologies, including all data and code, that form the basis of any research project.

Review: The critical, transparent and impartial examination of information submitted by the professional community enhances the quality of publications. The scientific journal of the future supports post-publication peer review, and qualified reviews of submitted content shall always be made public.

Presentation: Digitization opens up new opportunities to provide content, such as through semantic and multimedia enrichment. The scientific journal of the future adheres to open Web standards and creates a framework in which the technological possibilities of the digital media can be exploited by authors, readers and machines alike, and content remains continuously linkable.

Transparency: Disclosure of conflicts of interest creates transparency. The scientific journal of the future promotes transparency by requiring its editorial board, the editors and the authors to disclose both existing and potential conflicts of interest with respect to a publication and to make explicit their contributions to any publication.

If you read the criteria they are fairly similar to mine of two days ago (I think theirs are more concerned with the how than the why). Bjoern Brembs (whose OKF talk informed me about Impact Factors) has just posted:

July 11, 2011 at 12:28 pm

I’m with you on journals needing to go extinct. The only reason they’re still around is history. So back to history they ought to go.

So I take MPH’s points on board but think they should be part of the “publication of the future”, not the journal.

So, champions of journals (I assume there are some) please let us have your arguments PRO journals. If the reasons are branding and competition, please say so. They will be given equal space on this blog.

 

Posted in Uncategorized | 1 Comment

What is wrong with Scientific Publishing: an illustrative “true” story

Yesterday I abandoned my coding to write about scientific publishing:

/pmr/2011/07/09/what-is-wrong-with-scientific-publishing-and-can-we-put-it-right-before-it-is-too-late/

and I now have to continue in a hopefully logical, somewhat exploratory vein. I don’t have all the answers – I don’t even have all the questions – and writing these posts is taking me into new areas where I shall put forward half-formed ideas and await feedback (“peer-review”) from the community. The act of trying to express my ideas formally, for a critical audience, is helping to refine them. And I am hoping that where I am struggling for facts or prior scholarship you will help. That’s not an excuse for laziness; it’s a realization that one person cannot address this problem alone.

This blog post *is* a scholarly publication. It addresses all the points that I feel are important – priority (not that this is critical), peer review, communication, re-use (if you want to), and archival (perhaps not formal, but this blog is sufficiently prominent that it gets cached; this may horrify librarians, but it’s good enough for me).

The only thing it doesn’t have is an ISI impact factor, and I’ll return to that. It does have measures of impact (Technorati, Feedburner, etc.) which measure readership and crawlership. (These are inaccurate – they recently dropped by a factor of 5 when the blog was renamed – I’d be interested to hear from anyone who cannot receive this blog for technical reasons (timeout, etc.)). Feedburner suggests that a few hundred people “read” this blog. There’s also Friendfeed (http://friendfeed.com/petermr ) where people (mainly well-aligned) comment and “like” posts; and Twitter where I have 650 followers (Glyn Moody has 10 times that) – a tweet linking to yesterday’s post has just appeared.

So the blog post fulfils the role of communication – two way communication – and has mechanisms for detecting and measuring this. As I write this I imagine the community for whom I am preparing these ideas and from whom I am hoping for feedback. Ambitiously I am hoping that this could become a communal activity – where there are several authors. (We do this all the time in the OKF – Etherpads, Wikis, etc.) And who knows, this document might end up as part of a Panton Paper. As you can tell I am somewhat enjoying this, though writing is often painful in itself.

I am going to describe new ideas (at least for me) about scholarly publishing. I am going to use “scholarly” as inclusive of “STM” and extending to other fields – because in many cases the translation is direct; where there are differences I will explicitly use STM. I like the word “scholarly” because it highlights the importance of the author (which is one of the current symptoms of the malaise – the commoditization of authorship). It also maps onto our ideas of ScholarlyHTML as one of the examples of how publication should be done.

Before my analysis I’ll give an example of the symptoms of the dystopia. This has reinforced my determination never to publish my ideas in a traditional “paper” for a conventional journal. Details are slightly hazy. I was invited – I think in 2007 – to write an article as part of an issue on the progress of Open Access. Here it is:

http://www.sciencedirect.com/science/article/pii/S009879130800004X

Serials Review
Volume 34, Issue 1, March 2008, Pages 52-64

Open Data in Science

Peter Murray-Rust

Murray-Rust is Reader in Molecular Informatics, Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK

 

It will cost you 40 USD to rent it for ONE DAY. You are allowed to print it for personal use during this period.

*I* cannot read my own article and I do not have a copy.

The whole submission process was Gormenghastian and I have ended up embittered by it. I asked for the article to be Open Access (Green) and believed that it would be available indefinitely, so that I would not have to take a “PDF copy” (which is why I don’t have one). When I discovered that I could not read my own article I contacted the publishers and was told that I had agreed to it being Open for a year, after which it would be closed. Maybe – I don’t remember this, but there were 100+ emails and it may have slipped my unconscious mind. If I had been conscious of it, I would never have acquiesced. It’s a bizarre condition – let people read something and then cut them off for ever. It has no place in “scholarly communication”; it belongs more with the burning of the libraries.

I took the invitation as an exciting opportunity to develop new ideas and to get feedback, so I wrote to the editor (whom I only know through email) and explained my ideas. (If I appear critical of her anywhere it is because I am critical of the whole system.) I explained that “Open Data” was an exciting topic where text alone was inadequate, and to show this I would create an interactive paper (a datument) with intelligent objects. It would give readers an opportunity to see the potential and challenges of data. This was agreed, and I would deliver my manuscript as HTML. I also started the conversation on the Openness of the resulting article. The only positive thing was that I established that I could post my pre-submission manuscript independently of Elsevier. (I cannot do this with publishers such as the American Chemical Society – they would immediately refuse the article.) I decided to deposit it in “Nature Precedings” – an imaginative gratis service from NPG: http://precedings.nature.com/documents/1526/version/1 . This manuscript still exists and you can copy it under CC-BY and do whatever you want with it. (No, there is no interactive HTML, for reasons we’ll come on to.)

I put a LOT of work into the manuscript. The images that you see are mainly interactive (applets, SVG, etc.). Making sure they all work is hard. And, I’ll admit, I was late on the deadline. But I finally got it all together and mailed it off.

Disallowed. It wasn’t *.doc. Of course it wasn’t *.DOC, it was interactive HTML. The Elsevier publication process refused to allow anything except DOC. In a rush, therefore,

I destroyed my work so it could be “published”

I deleted all the applets, SVG, etc. and put an emasculated version into the system and turned to my “day” job – chemical informatics – where I am at least partially in control of my own output.

I have never heard anything more. I got no reviews (I think the editor accepted it as-is). I have no idea whether I got proofs. The paper was published along with 7 others some months later. I have never read the other papers, and it would now cost me 320 USD to read them (including mine). There is an editorial (1-2 pages, which also costs 40 USD). I have never read it, so I have no idea whether the editor had any comments.

Why have I never read any of these papers? Because this is a non-communication process. If I have to wait months for something to happen, I forget. *I* am not going to click on the Serials Review masthead every day watching to see whether my paper has got “printed”. So the process guarantees a large degree of obscurity.

Have I had any informal feedback? Someone reading the article and mailing me?

No.

Has anyone read the article? (I include the editor). I have no idea. There are no figures for readership.

Has anyone cited the article?

YES – four people have cited the article! And I don’t have to pay to see the citation metadata:

http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=2-s2.0-43149086423&src=s&imp=t&sid=Mt3luOQ49JTT7H5OHiBim3F%3a140&sot=cite&sdt=a&sl=0&origin=inward&txGid=Mt3luOQ49JTT7H5OHiBim3F%3a13

The dereferenced metadata (I am probably breaking copyright) is

1. Akmon, D. (2011). Moving beyond sharing vs. withholding to understand how scientists share data through large-scale, open access databases. ACM International Conference Proceeding Series, pp. 634-635. [Cited by: 0]

2. Kind, T. & Fiehn, O. (2010). Advances in structure elucidation of small molecules using mass spectrometry. Bioanalytical Reviews 2 (1), pp. 23-60. [Cited by: 2]

3. Apostolakis, J. (2010). An introduction to data mining. Structure and Bonding 134. [Cited by: 0]

4. Hofmann, D.W.M. (2010). Data mining in organic crystallography. Structure and Bonding 134. [Cited by: 0]

I cannot read 3 of these (well it would cost ca 70 USD just to see what the authors said), but #2 is Open. Thank you Thomas (I imagine you had to pay to allow me to read it) [Thomas and I know each other well in cyberspace]. It is clear that you have read my article – or enough for your purposes. Thomas writes

that once data and exchange standards are established, no human interaction is needed anymore to collect spectral data [525]. The CrystalEye project (http://wwmm.ch.cam.ac.uk/crystaleye/) shows that the aggregation of crystal structures can be totally robotized using modern web technologies. The only requirement is that the spectral data must be available under open-data licenses (http://www.opendefinition.org/) [544].

The other three may have read it (two are crystallography publications) or they may simply have copied the reference. It’s interesting (though not unusual) that the citations come two years post-publication.

So in summary, the conventional publication system consists of:

  • Author expends a great deal of effort to create manuscript
  • Publisher “publishes” it through an inhuman mechanistic process; no useful feedback is given
  • Publisher ensures that no-one can read the work unless…
  • University libraries pay a large sum (probably thousands of dollars/annum each) to allow “free” access to an extremely small number of people (those in rich universities – perhaps 0.0001% of the literate world; how many of you can read these articles sitting where you are?)
  • No one actually reads it


In any terms this is dysfunctional – a hacked-off author, who has probably upset an academic editor, and between them they have ensured that the work is read by almost no-one. Can anyone give me a reason why “Serials Review” should not be closed down and something better put in its place? And this goes for zillions of other journals.

Hang on, I’ve forgotten the holy impact factor… (http://www.elsevier.com/wps/find/journaldescription.cws_home/620213/description )

Impact Factor: 0.707

Yup, roughly the square root of a half.

What will my colleagues say?

My academic colleagues will (unfortunately) say that I should not publish in journals with an IF of less than (??) 5.0 (J Cheminfo is about 3). That in itself is an appalling indictment – they should be saying “Open Data is an important scholarly topic – you make some good points about A, B and C and I have built on them; you get X and Y wrong and you have completely failed to pay attention to Z.”

My Open Knowledge and Blue Obelisk colleagues will say – “this is a great start to understanding and defining Open Data”.

And I can point to feedback from the gratis Nature Precedings: (http://precedings.nature.com/documents/1526/version/1 )

This has:

  • 11 votes (whatever that means, but it probably means at least 11 people have glanced at the paper)
  • A useful and insightful comment
  • Cited by 13 (Google; Scopus found only 4). These are not self-citations.

So from N=1 I conclude:

  • Closed access kills scholarly communication
  • Conventional publication is dysfunctional

If I had relied on journals like Serials Review to develop the ideas of Open Data we would have got nowhere.

In fact the discussion, the creativity and the formalism have come through creating a Wikipedia page on “Open data” and inviting comment. Google “Open Data” and you’ll find http://en.wikipedia.org/wiki/Open_data at the top. Google “Open data in science” (http://www.google.co.uk/search?q=open+data+in+science&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a ) and the gratis manuscript comes top (the Elsevier article is nowhere to be seen).

As a result of all this open activity I and others have helped to create the Panton Principles (http://pantonprinciples.org/ ). As you will have guessed by now I get no academic credit for this – and my colleagues will regard it as a waste of time for a chemist to be involved in. For me it’s true scholarship; for them it has zero “impact”.

In closing I should make it clear that Open Access in its formal sense is only a small advance. More people can read “it”, but “it” is an outdated, twentieth century object. It’s outlived its time. The value of Wikipedia and Nature Precedings for me is that this has enabled a communal journey. It’s an n<->n communication process rooted in the current century.

Unless “journals” change their nature (I shall explore this; I think the most valuable thing would be for them to disappear completely), the tectonic plates in scholarly publishing will create an earthquake.

So this *is* a scholarly publication – it hasn’t ended up where I intended to go, but it’s a useful first draft. Not quite sure of what – perhaps a Panton Paper? And if my academic colleagues think it’s a waste, that is their problem and, unfortunately, our problem.

[And yes – I know publishers read this blog. The Marketing director of the RSC rang me up on Friday as a result of my earlier post. So please comment.]

[OH, AND IN CASE THERE IS ANYONE ON THE PLANET WHO DOESN’T KNOW – I DON’T GET PAID A CENT FOR THIS ARTICLE. I DON’T GET REIMBURSEMENT FOR MATERIALS. I DON’T KNOW WHETHER THE EDITOR GETS PAID. THE JOURNAL TAKES ALL THE MONEY FOR “PUBLISHING MY WORK”].

 

 

Posted in Uncategorized | 14 Comments

What is wrong with Scientific Publishing and can we put it right before it is too late?

I sat down today to write code and found that I couldn’t – I had to write about science publishing, so here goes. I intend this to be the first of several posts. I often blog in a forceful style (rant?) but here I will try to be as objective as possible. I’d like to start a discussion and engage responsible STM publishers. I’d like to see if we can define what the basis of publishing is. Why? And how?

But I am going to start with a strong assertion. STM publishing is seriously broken and getting worse. It is being driven by forces largely outwith the directing influence of the scientific community (although not necessarily outwith their ultimate control). This is manifested by activities which have nothing (in my view) to do with science, and I will explain that.

A brief topical aside. Non-UK readers may not realize the enormity of what has happened in the UK and what the lesson is for scientific publishers. The News of the World – a popular UK newspaper – broke the law repeatedly by hacking the phones of victims of crime. Public outrage exploded, and within 24 hours a 150-year-old newspaper had ceased to be. That is the power of the masses – it is too rarely exercised – but when it happens it can be unstoppable. The “public” had existed in a cosy, if unpleasant, symbiosis with the publisher, eagerly demanding new salacious material and paying for it. But when the newspaper overstepped … a bang, not a whimper! There were no discussions, no slow decline. A week ago there were the usual rumblings, but no one predicted this – at least in public. The power of the crowd in a media-literate society is frighteningly rapid. The same fate can await complacency in STM.

That is the potential power that the scientific and academic community has over scholarly publishers. (In this post I am going to restrict discussion to serials publishers in STM). I’ll state the simple premise:

  • Unless the process of scientific publication is rapidly and effectively revised there will be a catastrophic crash. It will be unpredictable in its timing, speed and nature. It will destroy some of the current participants. It will change parts of the scientific process and will change academia.

I have no special knowledge so that’s a Cassandra-like statement (although I have no wish to play that role). I am surprised how few of my general colleagues (e.g. not the OKF) share my concerns about the state of STM publishing. They do not realise the dystopia we are already in and its apparently inexorable progress.

Before you switch off from this analysis: I intend to offer constructive dialogue to all parties. I know publishers read this blog (I was rung up yesterday by the Marketing Director of the RSC in response to yesterday’s blog). I wish, honestly and constructively, to analyse the benefits that STM publishers can provide. Some of them do provide good services to science, but I find it difficult to see value from many others. They have the chance, if they wish, to answer some (I hope) objective questions.

Similarly I have been critical of academic libraries, but I do not see them as the cause. They should have alerted us earlier to the problems instead of acquiescing to so much of the dystopia. They are part, but only part, of the solution.

I have therefore come, perhaps belatedly, to the conclusion that the crisis is of our (academia’s) making. I used to blame the publishers and I still can and will when appropriate. (The manufacture and sale of fake journals is inexcusable – as bad as Murdoch’s phone hacking). But the publishers are a symptom of our disease, not the cause. Cassius says:

Men at some time are masters of their fates:
The fault, dear Brutus, is not in our stars,
But in ourselves, that we are underlings.

The academic system (in which I include public funders) has, by default, given away a significant part of its decision-making to the publishing industry. (I use “industry” to include non-profits such as learned societies; like all industries it contains extremes of good and bad practice.) This gifting has been done gradually, over about 2 decades, without any conscious decisions by academia, and without – in the beginning – any conscious strategy from the publishers. The gifts have all been one-way – from academia to industry, which has grown in both wealth and power at academia’s expense. In effect academia has unconsciously stood by, dreaming, during the creation of a 10 billion USD industry, almost all of whose revenues come from academia, frequently to its detriment. Like Morbius in Forbidden Planet we have created our own monsters.

So I will start with some axioms, on which future posts may build. If we can all agree on them, they can serve as a basis for future decision-making:

  • Science and scientists have a need and a duty to publish their work.
  • Funders rightly and increasingly require this in a formal manner.
  • This work should be available to everyone on the planet. Ideally the costs incurred in doing so should be invisible to the reader.
  • The purpose of publication in whatever degree of formality is:
  1. To establish priority of the work
  2. To communicate the work to anyone who wishes to consume it
  3. To offer the work for formal and informal peer-review and to respond to discourse
  4. To allow the work to be repeated, especially for falsifiability
  5. To allow the work to be built on by others
  6. To preserve the work

I’d like to formalize this list – it’s a first draft and I want to make sure we haven’t omitted anything. I’d also like to know from any party, especially a publisher, if they disagree. There are publishers, for example, who believe that part of the process of publication is to restrict access.

I will say again; let us be careful because this rather enticing statement that everybody should be able to see everything could lead to chaos. Speak to people in the medical profession, and they will say the last thing they want are people who may have illnesses reading this information, marching into surgeries and asking things. We need to be careful with this very, very high-level information. (Dr John Jarvis, Senior Vice President, Europe, Managing Director, Wiley Europe Limited, examined by Ian Gibson’s Select Committee in the House of Commons, Westminster, UK, 2004-03-01: http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/4030102.htm)

I hope that 7 years have removed this attitude.

The historical purposes of publication did not include bibliometric evaluation of the publication as a means of assessing scientists or institutions. This is the monster we have allowed to be born and which we must now control. I do not believe it should be part of the formal reasons for publication. And if it retreats to informality we should take formal steps to control it.

So I’d be grateful for reactions in the comments section. I will not edit comments and will attempt to keep the discussion objective.

Posted in Uncategorized | 20 Comments

PLoS One, Text-mining, Metrics and Bats

Just heard that PLoS One was awarded Innovator of the Year by SPARC:

http://blogs.plos.org/everyone/2011/06/30/plos-one-wins-recognition-as-a-sparc-innovator/

I applaud them personally as the 4 Pantonistas were given the same award last year for the Panton Principles.

So Lezan, collaborators at NaCTEM and I have published our first article in PLoS:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020181

Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry


BalaKrishna Kolluru, Lezan Hawizy, Peter Murray-Rust, Junichi Tsujii and Sophia Ananiadou

For those who don’t know, PLoS One publishes competent science. Not trendy science, not stuff that the marketeers think will sell the journal. Just plain competent science. Simply:

  • We have said we have done X,Y,Z
  • It is in scope for the journal (a lot of science is out of scope for PLoS One)
  • The referees agreed that we had done X, Y and Z competently. No “this isn’t interesting” or “insufficient impact”.

To be honest it took an AWFULLY long time to get it reviewed: SIX MONTHS (look at the dates) to get two referees’ opinions. I doubt this is specific to PLoS; it’s the fundamental problem of refereeing, to which there is no good answer.

Anyway, it has been out for about six weeks now – what does the world think of it? It has had 316 downloads. That’s exciting to young scientists: 300 people have clicked on their article. (Maybe they haven’t READ it, but at least it’s an indication.) And another of Lezan’s papers has got a “Highly accessed” in the Journal of Cheminformatics (http://www.jcheminf.com/content/3/1/17):

Accesses: 475 (Research article)
ChemicalTagger: A tool for semantic text-mining in chemistry
Lezan Hawizy, David M Jessop, Nico Adams, Peter Murray-Rust
Journal of Cheminformatics 2011, 3:17 (16 May 2011)

Well I am not a great fan of metrics of any sort. We have had 300,000 downloads of our software (official Microsoft figures) and we get zero credit. But at least we have a few hundred downloaders. So is 300 good? Impossible to say, but I’ll have a little fun with metrics:

Let’s go to PLoS on May 27 and see the other article downloads. They’re 512, 322, 511, 295, 458, 493, 398 … So Lezan and Bala are within the range. Good, competent science. (Text-mining science cannot be trendy, because if we actually try to do it we’ll be sued by the publishers for mining “their” content – it is deeply depressing to be prevented from doing science by lawyers.)
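
For the pedantic, here is that comparison as a trivial Python check, using only the figures sampled above:

```python
# Is 316 downloads within the range of comparable PLoS ONE articles?
# (The peer figures are the ones sampled above on May 27.)
peers = [512, 322, 511, 295, 458, 493, 398]
ours = 316

print(f"peer range: {min(peers)}-{max(peers)}, mean {sum(peers)/len(peers):.0f}")
print(f"ours ({ours}) within range: {min(peers) <= ours <= max(peers)}")  # True
```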

So what’s the sort of access for a highly accessed article? Go to http://www.plosone.org/home.action and “most viewed” and there are articles 1 week old with several thousand views. What’s the record? This one about bats:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007595

It’s just under two years old and has had nearly 200,000 accesses. If it were a physical journal it would have fallen apart.

It’s had about 2 citations, which shows how stupid these metrics are.

“Download” it and see why it’s popular. You might even read it (I did, briefly).

But of course that will distort the metrics. Open access encourages people to READ articles. Whereas articles are actually only meant to be cited, not read.

 

Posted in Uncategorized | Leave a comment

Impact Factor Spam

I received the following unsolicited email (slightly curtailed) from the Royal Society of Chemistry:

Dear Dr Murray Rust

Quality is the focus at RSC Publishing: the recently published 2010 Journal Citation Reports ® prove that our quality is better than ever. And that is thanks to our authors and referees.

Our average impact factor (IF) now stands at 5.5. It’s an impressive figure, especially when compared with the average for a chemistry journal* of 2.54.

But if you’re thinking that there’s nothing special about this, as most chemistry publishers are celebrating an overall rise in their impact factors, think again. RSC Publishing figures have risen by 63% since 2003 – almost double the average rise.

Of the top 20 journals in the multidisciplinary chemistry category, six are from RSC Publishing. No other publisher has more.

83% of our journals listed in this year’s report have an IF above 3. No other publisher can boast such a large proportion of titles at this level, demonstrating just how well-cited our entire portfolio truly is.

(Data based on 2010 Journal Citation Reports ®, (Thomson Reuters, 2011).

I have two concerns – one with the impact factor (see below) and one with the RSC’s use of bulk unsolicited email (SPAM). Dealing with the second first:

A European Directive (http://en.wikipedia.org/wiki/Privacy_and_Electronic_Communications_%28EC_Directive%29_Regulations_2003) makes it clear that the RSC’s activity is illegal:

One of the key points of this legislation is that it is unlawful to send someone direct marketing who has not specifically granted permission (via an opt-in agreement). Organisations cannot merely add people’s details to their marketing database and offer an opt-out after they have started sending direct marketing. For this reason the regulations offer more consumer protection from direct marketing.

I will be interested to hear from them why they have broken this directive and why I should not report them. I am not on any of their mailing lists, and this type of mail wastes my time and fills up my mailbox. Even if it turns out that there is a legal loophole, it is unethical to waste scientists’ time in this manner. But it was the RSC itself which opined that Open Access publishing was “ethically flawed” – have they ever formally retracted that opinion?

The main issue however is general – the growing and mindless use of Impact Factors as some measure of “quality”. There are many reasons why IFs are frequently meaningless (http://en.wikipedia.org/wiki/Impact_factor#Editorial_policies_which_alter_the_impact_factor ). Bjorn Brembs at #okcon2011 gave us a presentation showing how IFs are fundamentally flawed and how publishers can negotiate to get them adjusted favourably (see http://www.slideshare.net/brembs/whats-wrong-with-scholarly-publishing-today-ii – this is an old slideshow and if Bjorn reads this maybe he can update anything). Objectively I see the following:

  • There is no objective definition of what a citation is. As far as I can see it’s a mixture of what the closed commercial indexing organization thinks it is and what the negotiating publisher wants. If we are going down the mindless citation route then at least we need Open Citations. But if we extract lists of bibliographic references (citations) from publications then we will be sued by the publishers. So citations are whatever the powerful forces in the publishing industry want them to be (see the sketch after this list for how the arithmetic can be gamed).
  • IFs are per journal. This is about as meaningful a measure of worth as deciding that a person is well-dressed because they shop at a given store. You can be badly dressed in expensive clothes and vice versa. And the worth of academic publications is about as hard to measure as style. It’s what we collectively think, not what we write in citation lists. The journal is an outdated concept in the current world – it exists only to brand publications (its use as a collection for disseminating a subject is disappearing). It’s like saying “X is a good blogger because their blog is hosted by Y and lots of people read Y”. No – people say “X writes good blog posts”. There’s enough technology in the world that we can have per-author metrics, but that won’t suit the publishers, because then we would evaluate science by the worth of individuals rather than by the strength of the marketing department of a money-making institution. And that’s anathema to the publishers.
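
To make that negotiability concrete, here is a sketch of the nominal two-year IF calculation. The numbers are invented, and the reclassification trick is the kind of thing Bjorn describes, not any specific journal’s actual negotiation:

```python
# A sketch of the nominal two-year impact factor, and of why the
# classification of "citable items" makes it negotiable rather than
# objective. All numbers are invented for illustration.

def impact_factor(citations_received, citable_items):
    """Citations received this year to items published in the previous
    two years, divided by the number of 'citable' items from those years."""
    return citations_received / citable_items

# Suppose a journal published 200 articles and 50 editorials in 2008-2009,
# and everything together drew 500 citations in 2010.
citations = 500
articles, editorials = 200, 50

print(impact_factor(citations, articles + editorials))  # 2.0 - everything counts
# Reclassify the editorials as 'front matter': citations to them still land
# in the numerator, but they vanish from the denominator.
print(impact_factor(citations, articles))               # 2.5 - a 25% rise, same science
```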

The sad thing is that young people have now been terrified by the Impact and H factors, and I can’t give them much hope. When I published my first paper in 1967 (J. Chem. Soc. (now the RSC), Chemical Communications) I did it because I had a piece of science I was excited about and wanted to tell the world about. That ethos has gone. It’s now “I have to publish X first-author papers in Y journals with impact factors greater than Z”.

I can’t see how to change that other than by disruptive action in the publishing world. When I have fully worked out what that is I will start doing it and persuading other people to do it. Hopefully it will be legal. If not I shall be prepared to take the consequences.


 

Posted in Uncategorized | 15 Comments

Open Scholarship means Better Science

Four years ago [1] Open Access publishing was described by some members of the publishing community as “junk science”, the implication being that Open Access led inexorably to lower standards of (or even no) peer-review. I now assert that, from my perspective, Open Scholarship means better science, and invite readers to confirm or challenge this view. I use Open Scholarship to mean at least OABCD (Open Access, Bibliography, Citations and Data).

I now generally only publish in Open Access journals – by this I mean “gold” Open Access, where the author or funder pays for each submission and all articles in the journal are open. IOW I can go to any BMC or PLoS journal and know that all the articles can be downloaded, read and re-used without further permission. Moreover I can build robots that systematically read every article in a journal. For example our robots have downloaded and understood 10,000 (sic) articles from the International Union of Crystallography’s Acta Crystallographica Section E.

The systematic retrieval and analysis of articles is critical to modern science. It is only possible with “gold”. Many major publishers forbid by contract the use of machines to read subscription content; dog-in-the-manger-like, they stop us innovating but do no innovating themselves – a huge disservice to science for the benefit of their shareholders and CEOs’ incomes. The hybrid journals (where some articles are Open Access) are useless for systematic study, as it is impossible to know which articles can be used for which purpose. “Green” publishing (where authors self-archive in repositories or on web pages) is irrelevant here, as it is impossible to discover these publications systematically. [For example, it is impossible to answer the question “find me all green-published articles in synthetic chemistry”.]
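
To give a flavour of what such a robot looks like, here is a deliberately minimal sketch – the URLs and the parsing step are hypothetical placeholders, not our actual CrystalEye code; the point is that only a fully “gold” journal lets you run this at all:

```python
import time
import urllib.error
import urllib.request

# Hypothetical landing pages, e.g. harvested from the journal's
# table-of-contents feed. These are placeholders, not real addresses.
ARTICLE_URLS = [
    "https://journals.example.org/e/2008/article-0001",
    "https://journals.example.org/e/2008/article-0002",
]

def harvest(urls):
    """Fetch each article page and hand it to the analysis pipeline."""
    for url in urls:
        try:
            with urllib.request.urlopen(url) as response:
                page = response.read()
        except urllib.error.URLError as err:
            print(f"skipped {url}: {err}")
            continue
        # ...parse the page, locate the supplementary data (e.g. CIF files),
        # and index it - legal at this scale only because the licence is open...
        print(f"fetched {len(page)} bytes from {url}")
        time.sleep(1)  # be polite to the server

harvest(ARTICLE_URLS)
```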

I shall return to this in future posts – in this one I outline the virtues of Open Access publishing of single articles. I shall first put the counter-arguments (that Closed Access publishing is superior to Open Access). They are the *only* arguments in favour of Closed Access publishing:

  • Closed access publications have a higher standard of peer-review and editing. It is the only argument that could lead to the denigration of Open Access – that more bad science was allowed through the review system. I know of no objective evidence for this, other than proof by assertion.
  • Closed access publications have higher impact factors. This is difficult to measure, and there may be some historical hysteresis in the system. As I understand it, there is no objective measure of the impact factor (Bjorn Brembs outlined at OKCon2011 how the calculation of IFs is a matter for private negotiation for each journal). Certainly the decimal points on IFs are ludicrous. IFs are, in any case, one of the worst measures of value. But the assertion may, for the moment, be true. Does this lead to better science? It is difficult to see how. It could conceivably lead to better-targeted funding (“let’s only fund science reported in high-prestige journals”), but I know of no funders that take this approach – and rightly so.
  • Only closed access publishers can sustain the economy of science publishing. There seems little likelihood of Open Access publishers suddenly crashing from the marketplace. The new Wellcome/MaxPlanck/HHMI publication(s) will ensure very high-quality Open Access publishing.

So there is no clear benefit to *science* in choosing closed access. (There may be a benefit to individual scientists.) I imagine that closed access publishing has higher costs than Open Access, as it has to employ police to detect and chase people “stealing content” – I have no idea what percentage this is. Otherwise costs should be independent of closed/open and depend only on efficiency and on whether the organization is for-profit. These are difficult to argue.

So now the advantages of Open Access. I have submitted 11 papers of mine and 4 of others to BMC, a “gold” publisher. Here are my immediate benefits:

  1. The manuscript gains earlier priority. I can claim 2011-07-05 as the deposition date for Open Bibliography in STM (http://www.dspace.cam.ac.uk/handle/1810/238406 ). If no-one else has published an article called “Open Bibliography” then we can claim priority.
  2. The manuscripts get feedback. We have already got feedback on one, where the author of one of the programs we used pointed out that we had used an early version of it. As a result we have re-run the analysis with the latest version and improved the result. This would not normally come out of most closed reviewing.
  3. We advertise the work we have done, even before the papers are published.

There is nothing absolutely novel in this – it’s what happens in arXiv – but there are publishers in chemistry which would immediately reject the manuscript unread if there was an already exposed version on the web. There is no *scientific* reason for this, only that it protects the closed access publishers’ business.

  • When the articles are published many more people *can* read them than if they were in closed access journals. This is incontrovertible – “all” is greater than “some”. The closed access argument is, I assume, that because closed access journals have (in their eyes) greater prestige, then more people will read them. I remain to be convinced, and would need firm evidence.
  • People can re-use my material without permission. The most valuable re-use is probably analysis and indexing by robots.
  • People can detect whether my work is valid. The more people read an article, the more likely errors are to be detected. Closed access has many fewer people who can detect errors.

 

If the academic world, in its inward-looking and self-congratulatory manner, continues to build self-perpetuating reward systems by promoting “high impact” brands, there will be an increasing clash between the need to develop science Openly and closed publication. But the branding system by itself does nothing to promote better science.

So the equation is simple. IF there is no difference in the quality of service provided by closed and open access science, then there is no intrinsic differential benefit from the publishing process. Open access is then better because more people can read the science and there can be much more re-use.

It is up to the closed access publishers to make an objective case why they provide a better service. They are welcome to post that case here.

 

 

[1] Four years ago a group of publishers, through their association the AAP, launched PRISM – an effort to position closed access publishing as high quality and open access as leading to a number of evils, collectively referred to as “junk science”. They hired a consultant who was well known for discrediting people and organizations (http://www.nature.com/nature/journal/v445/n7126/full/445347a.html ).

The consultant advised them to focus on simple messages, such as “Public access equals government censorship”. He hinted that the publishers should attempt to equate traditional publishing models with peer review, and “paint a picture of what the world would look like without peer-reviewed articles”.

And

Dezenhall [the consultant] noted that if the other side is on the defensive, it doesn’t matter if they can discredit your statements. [Susan Spilka, Wiley’s director of corporate communications] added: “Media messaging is not the same as intellectual debate.”

PRISM no longer seems to be an issue (its last news item was in 2007: http://www.prismcoalition.org/ ).

 

Posted in Uncategorized | 3 Comments

Why Openness Matters to me and to you: The Architecture of Access to Scientific Knowledge

Last week Michael Gurstein attended OKCon2011 in Berlin and wrote a blogpost

http://gurstein.wordpress.com/2011/07/03/are-the-open-data-warriors-fighting-for-robin-hood-or-the-sheriff-some-reflections-on-okcon-2011-and-the-emerging-data-divide/

which was critical of OKCon and/or OKF (not sure which). It upset some of my colleagues but frankly bewildered me – despite reading the debate on his blog. He seems to have picked up (and probably amplified) a thread of subculture which I don’t recognize even if I look for it. Here’s an example:

“these World of Warcraft warriors off on a joust with various governmental dragons.” …

“I see a huge disconnect between the idealism and the passionate belief in the rightness of their cause and the profound failure to have any clear idea of what precisely that cause is and where it is likely to take them (and us) in the very near future.”

I do not recognize me and my collaborators in this description.

Strangely (to me), all the initial comment was highly favourable. Jordan Hatcher replied, making it clear EXACTLY what openness is and pointing Michael to the Open Knowledge Definition – a definition which has emerged from the mainstream of Open Source thinking and practice.

The Open Definition, linked to from the OKFN homepage and discussed during at least some of the sessions of OKCon, defines _exactly_ what “openness” means in the context of the Open Knowledge Foundation: http://www.opendefinition.org/

Of course you shouldn’t confuse one organisation with an entire movement. Other organisations and individuals, including those that presented their views at the conference, may feel differently about openness and what it means.

Inside your post you mention two good examples of goals that someone (a government, an academic, or an NGO, etc) may want to achieve with making data more accessible:

1) More political participation by currently underrepresented groups
2) Participation by those without technical skills or other access to technology

I’ll generally sum these up as saying using data to help bridge the digital divide.(PMR’s emphasis)

This is a great end goal; however, for anyone looking to help solve digital divide issues by building technical tools — or even non-technical tools — if those tools involve data, they will need:

1) access to the data
AND
2) legal rights to use and reuse the data

 

Michael, were you at Glyn Moody’s opening keynote (he is on the OKF advisory board)? Where he highlighted the threat we face from all forms of monopoly and closed practice? Or Richard Stallman (whom I missed because I was running a session on Open Science)? Or Brewster Kahle (who built the Internet Archive), whom I also missed because I was running a session on Open access to bibliographic data – a struggle which matters critically. These represent the thinkers that the OKF wishes to learn from.

Jordan has replied, and I’m on the advisory board, so you can now have more examples of the sort of things that the OKF cares about.

Tim Hubbard, head of Bioinformatics at the Sanger Centre UK (“where part of the human genome was sequenced”). There was a titanic struggle for Openness over the genome. It could have become commercial – a world where only the rich and powerful could access genomic information. Tim and many others have battled for over a decade to keep genomic information free. OUR information. And we need the OKF as a centre to exchange practices and ideas, to meet people, etc. Genomic information matters. Without it we have impoverished science and medicine.

Jo Walsh (EDINA, a publicly supported informatics resource at the University of Edinburgh): “I helped to run a Public Geodata campaign with OKF support back in 2005-6. This focused deliberately on ‘state-collected’ data in response to a bit of European law.” Jo, and others like her, work out all the aspects of making geodata serve the world community – semantics, coordinate systems, licences, practices, etc. I recently submitted a grant application with Jo. This is mainstream, publicly funded, research infrastructural work.

And me. You ask me…

I’ld like to have some clarity from you/the OKF as to whether they see “Open” data/knowledge as a “public” or a “private good” in the terms pointed to by Parminder Jeet Singh in an earlier comment to this blogpost? By this I mean is “openness” as you folks interpret it a characteristic to be enjoyed (or “consumed”) by an individual in his private capacity (based on his individual means for accessing and making use of the “open” data/knowledge etc.) OR is “openness” something that is to be enjoyed (or “consumed”) by the “public” in which case in addition to ensuring the “openness” of the data/knowledge etc. there is an obligation to ensure that the conditions and pre-conditions for such broad based public enjoyment and use are also associated with the open data/knowledge etc.


I’m an academic at Cambridge, UK, and I am funded by the public purse (JISC) to work with the OKF. We have two projects. JISCOpenbib, which in a year has created the technology, the protocols, the practice and the licences to make 30 million bibliographic records Open. We’ve developed a new, lightweight, universal approach to managing bibliographic references. We’ve published this Openly in an Open Access journal so that everyone, including you, can read it. It is awaiting peer-review, but because it’s Open we can post the manuscript, and I’d ask you to read it – or at least parts of it. If you don’t understand it, or think it’s badly written or simply wrong, let us know. See http://www.dspace.cam.ac.uk/handle/1810/238394 . The work would not have happened without the OKF, the paper would not have been written, and the very cool visualizations are an example of the sort of thing that is mainstream for the OKF. (The other project, OSID, is funded by UK government (BIS) and is looking at the outcomes of mainstream research funding – what effect does grant XY12345 have on understanding climate change?)
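
To show how lightweight such a bibliographic record can be, here is a sketch – the field names and the PDDL licence are my illustration here, not the project’s published schema:

```python
import json

# A sketch of the kind of lightweight, openly-licensed bibliographic record
# the project deals in. Field names, values and the licence choice are
# illustrative assumptions, not the project's actual schema.
record = {
    "type": "article",
    "title": "An example article title",
    "author": [{"name": "A. N. Author"}],
    "journal": {"name": "An Example Journal", "volume": "3"},
    "year": "2011",
    "identifier": [{"type": "doi", "id": "10.xxxx/example"}],
    "license": [{"type": "PDDL",
                 "url": "http://opendatacommons.org/licenses/pddl/"}],
}

print(json.dumps(record, indent=2))  # plain JSON: trivial to share, merge, index
```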

Science is currently for the privileged few in rich universities. If a citizen wants access to research information they have to pay the publishers. See the first 10 minutes of Larry Lessig’s talk at CERN http://vimeo.com/22633948 where he shows that the top 10 papers about his child’s illness would cost 500 USD to read (rented for 2 days only). He calls it:

The Architecture of Access to Scientific Knowledge

I am one of many designing and building that architecture. That’s an example of what the OKF is for me.

Posted in Uncategorized | 8 Comments

Publications from the “Visions of a Semantic (Molecular) Future” Symposium

As I blogged, we’ve just submitted 15 papers to the Journal of Cheminformatics – and we got them off last Tuesday. Here’s the evidence

The ones at the bottom right are the invited talks – Cameron Neylon, Henry Rzepa, Dan Zaharevitz and John Wilbanks. (The ones at centre bottom are to-be-written elsewhere/elsewhen).

We (or rather Charlotte) have been putting them into a collection in our DSpace (http://www.dspace.cam.ac.uk/handle/1810/238340 ). For the record it’s slightly less than 10 minutes per paper (though it would be more when you are fresh to the system), BUT:

  • We also plan to add the raw text (*.doc, *.html or *.tex)
  • And material from the VSMF event
  • And a linking splash page explaining everything

Why are we doing this?

  • It’s a record of the work
  • YOU can read the manuscripts, even if they aren’t accepted (this is not a shoo-in – they are peer-reviewed – we’ve not had reports back yet)
  • You can make comments.
  • You can re-use them (the material is CC-BY though it is difficult to show this in DSpace which is covered with the ritual “by default you can’t do anything”)
  • It gives our work extra prominence. Our informal study of theses in the group (N=3) is that they get extra-special Google-karma. (Haven’t checked Bing!)
  • It gives the Department and University extra karma, if they want it
  • It advertises the journal (on whose editorial board I sit – no, it’s not a shoo-in)
  • It gives something to show our funders before formal publication (yes, we’ve been working). And if we’ve missed you from acknowledgements let us know
  • It gives us a warm-fuzzy altruistic feel.

I’ll be posting about these articles in the coming days. But if you want to know what makes me and my colleagues tick here’s “Semantic science and its communication – a personal view” http://www.dspace.cam.ac.uk/handle/1810/238391

 

 

Posted in Uncategorized | 7 Comments

The Open Knowledge Foundation builds its Organizational DNA #okcon2011 #jiscopenbib

 

I’ve just come back from 4 wonderful days in Berlin at OKCon 2011. About 400 people in the historic Kalkscheune just off Friedrichstrasse. People of all ages, many cultures and countries – some “old hands”, many having their first contact with OKF people. An incredible feeling of warmth and belonging.

 

 

With Adrian Pohl (met IRL for the first time) and Mark McGillivray we ran a 90-minute session on Open Bibliography, working out where we were and where we might go – as a world community; looking to see who could bring contacts, content and technology. Because that is much of what fuels the OKFN.

And of course ideas. But ideas without implementation often don’t spread. So a key aspect of OKFN is hacking – in the beneficial sense. If you want Open Bibliography you don’t just tell people it would be a good thing, you write software, you transform content, you wrangle licences, you build implementable protocols, you reach out to the wider community. Open Bibliography has all those things and more. And it would not have happened but for the OKF.

I don’t have any photos of the OB workshop (does anyone?).

The mind-expanding session for me was Nina Paley (http://blog.ninapaley.com/ ). PLAY THE ANIMATION – it’s a minute. I was sitting at the back of the room hacking and idly listening to talks, and then was riveted by Nina. She’s a creative artist in the area of cartoons, books, quilts, etc. And EVERYTHING is CC-BY or CC0. She’s strongly against CC-NC anywhere, and now so am I. Here’s Nina showing how the four freedoms SHOULD translate to culture (they don’t normally):

The take-away from this is that the OKF is a real melting pot of ideas, rooted in serious practical action. Governments take OKF seriously. JISC takes OKF seriously. Science Publishers take OKF seriously. Foundations take OKF seriously. It’s a dynamic, responsible yet flexible and highly innovative organization.

When Rufus asked me to be on the OKF Advisory Board in (ca) 2005 I thought his plan to accrete Knowledge in Knowledge Forge was unrealistic. But I thought (somewhat arrogantly) that I might be able to give some guidance, and agreed to be on the Board. It’s one of the best decisions I have made.

Because the Advisory Board is part of the “Organizational DNA” of the OKF.

About 20 of us met on Saturday to build the Organizational DNA. Here’s Lucy Chambers, Rufus Pollock and Jenny Molloy

The organizational DNA is the thing that keeps an organization going, and in the right direction. It’s the touchstone that we come back to when it’s not clear what we should be doing. Should we expand? How? Do we partner with X? What are the limits? Etc. Religions and nations have organizational DNA. We heard how Amnesty International has clear organizational DNA and transcription/translation.

It’s harder for evolving organizations and even harder in cyberspace. Here are my examples of cyber Organizations with clear DNA.

  • Wikipedia. It exists to create an Open Encyclopedia rooted in democratic and meritocratic action. Millions of people contribute to WP because the organizational DNA is so clear.
  • Blue Obelisk http://www.blueobelisk.org . Very simple – perhaps little more than a virus. No membership, no money, no meetings. Simply Open Data, Open Source, Open Standards in chemistry. That is enough to keep us going, and we’ll have more on this blog.
  • Advocacy groups. Open Rights Group. Quadrature du Net. Etc.

For me the genetic code of the OKF is the Open Definition (http://www.opendefinition.org/ ). Everything has to fit onto that.

The Open Knowledge Definition (OKD) sets out principles to define ‘openness’ in knowledge – that’s any kind of content or data, ‘from sonnets to statistics, genes to geodata’. The definition can be summed up in the statement that “A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”

Everything is based on that. Break it and the system falls apart. Just like ACGT is central to DNA.
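
If you like your definitions executable, here is a toy Python rendering of that core test – the licence table is my own illustration, not OKF canon:

```python
# The Open Definition's core test as a toy function: a work is open if
# everyone gets use, reuse and redistribution, with at most attribution
# and share-alike as conditions. The licence table below is illustrative.

REQUIRED = {"use", "reuse", "redistribute"}
ALLOWED_CONDITIONS = {"attribution", "share-alike"}

LICENSES = {
    "CC-BY":    ({"use", "reuse", "redistribute"}, {"attribution"}),
    "CC0":      ({"use", "reuse", "redistribute"}, set()),
    "CC-BY-NC": ({"use", "reuse"}, {"attribution", "non-commercial"}),
}

def is_open(license_name):
    freedoms, conditions = LICENSES[license_name]
    return REQUIRED <= freedoms and conditions <= ALLOWED_CONDITIONS

for name in LICENSES:
    print(name, is_open(name))  # CC-BY True, CC0 True, CC-BY-NC False
```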

But DNA is more than the syntax. And that’s the hard bit. There are no maps for guaranteeing success in designing organizations. There are success stories and failures. Rapid growth can be good and bad. Diversity can be good and bad. Pluralism and decentralization have merits and problems.

The OKF has been wrestling with these for some years. It will continue to do so. But we are now a positive force in the world, with many clear missions. We trust each other enough to spread activities. Open Bibliography is pushing ahead without requiring tight central control. Science, through open-science, Panton, etc., has its own aggregation of people and activities. The great thing about the OKF is how we pick up really keen, focussed volunteers. As long as it stays like that we have few worries. I’ll just highlight the enormous contributions that Jenny has made in cohering science in the OKF, that Adrian Pohl has made in Open Bibliography, and that others like Daniel Mietchen, who is organising events in this area, continue to make.

And Daniel Dietrich and his team who created a wonderful OKCON2011. As a program committee member I marvel at what they achieved. Thank you.

Posted in Uncategorized | Leave a comment