NC or not-NC

NC or not-NC – the future of this blog awaits your verdict.
There have been recent comments suggesting that as I am advocating CC-BY (and not CC-NC) for Open Access papers, then this blog should also use CC-BY. CC-BY is the Creative Commons license that allows any re-use of the material as long as it is acknowledged; CC-NC additionally forbids re-use for commercial purposes.
My current position is logical – a blog is not an Open Access paper and there is no reason why it should carry an Open Access license. However, we anticipated this when we set up the blogs and wrote:

Welcome! – September 1st, 2006
We welcome anyone as a poster but require them to register (to
prevent spam). We honour copyright, but ask that posters make [their]
contributions available under Creative Commons. This allows the posters
to retain their moral rights, but allows us to re-use the blog
(including their contributions) for other purposes if required (e.g. it
might be revised for supporting information, tutorials, etc.) We will
always attribute posters.

We debated whether we should require all contributors to use CC-BY but initially decided that it was a big enough step to ask them to make their contributions available under CC of any sort. There has been no comment on this – I do not know of any potential contributor who has failed to post because of the license and the possibility that their contribution could be re-used and resold without their permission.
Remember that re-use may be in ways we haven’t thought of. As a trivial example suppose someone develops a haiku-sniffing robot and runs it over this post. They will discover:
NC or not-NC
the future of this blog
awaits your verdict.
They can now write a book on “hidden haikus” using this material. It may not make any money but at least they can try.
Personally I have not (and never have had) a problem with making the blog CC-BY. So I appeal to readers – do any of you object to your posts being licensed as CC-BY? If I don’t hear objections we’ll change the license in 2 weeks.
Of course the previous contributions are still CC-NC. You can use all of mine as CC-BY, but I am not going to stick a license on all 400.

Posted in Uncategorized | 4 Comments

Open Access – Reply to Springer

In a recent post (Springer – I resign from your Journal – July 8th, 2007) I criticized Springer for their failure to implement their Open Choice according to their promise to authors. I thank Jan Velterop for his speedy and long reply (in the comments section). Since the subject is a very serious and very immediate one I’m taking his reply and commenting on it.
As background I should point out that the motivation is to review the value of the label “Open Access” for data-driven science – the ability for a machine to read the literature and extract facts from it. My colleagues and I are compiling a list of all Open Access chemistry journals and the rights that humans and robots have to re-use the information in them. (We intend to submit our findings to an Open Access journal). As a side product we may investigate “Open Access” publications from Closed Access publishers (variously “Open Choice”, “Online Open”, “Open Access”) which require substantial payment by the authors. (At this stage I am not concerned about whether this is good or bad, simply as to whether the authors and readers get what they are entitled to).
Springer was among the first of the major Closed publishers to announce such an offering and to suggest that this would be an important milestone in Open Access. As such they have a special commitment to do whatever they do properly, including clarity to all concerned.
Bill Hooker puts it well:

1. Free Is Not Open! and Jan, as a signatory to Budapest, knows or should know this.
2. The article you linked does not appear in PubMed Central and the PubMed entry does not link to the freely readable version. In addition, neither the pdf nor html version available from the journal website is labeled in any way that indicates Open, or even free, access. This is a tremendous oversight, since anyone finding the article through PubMed will not know that it is available to read, and anyone finding the article at the journal site (e.g. via Google Scholar) will not know what permissions obtain. As Peter Suber has argued, the absence of a clear and conspicuous label cripples OA. [PMR’s emphasis] This is especially so when not even PubMed knows there is a free version!
3. That “request permissions” button has no place on an Open Access article! OA by definition means that you know what permissions obtain: all of them!
4. The article contains the following claim:

Electronic Supplementary Material Supplementary material is available for this article at http://dx.doi.org/10.1007/s00894-005-0041-7 and is accessible for authorized users.

but that DOI does not resolve. Argh!

My additional concerns were:

  1. That the authors were not getting what they paid for – the ability to retain copyright and offer full Open Access to everyone
  2. That Springer made no effort to indicate, let alone promote, that the paper was an Open Access one.

For example if you visit the pages by putting the DOI 10.1007/s00894-005-0041-7 into Google you get:

[PDF]

Valence isomerization of 2-phospha-4-silabicyclo[1.1.0]butane: a

File Format: PDF/Adobe Acrobat
1081 HV Amsterdam, The Netherlands. E-mail: lammert@chem.vu.nl. Fax: +31-20-5987488. J Mol Model (2006) 12: 531–536. DOI 10.1007/s00894-005-0041-7
www.springerlink.com/index/KP82270286744820.pdf

which, if followed, leads to a page with only a minute bar, hardly readable, indicating that the paper is Open Choice (and no explanation or hyperlink). There is no ALT text so an unsighted person would have no idea that the paper is Open in any way. Indeed there is a larger option via the shopping cart encouraging the reader to purchase it for 32 USD. This is a disservice to the authors. I think that it is not unreasonable to use “false pretences” for offering to sell a free item. (I do not care to do the experiment but I am sure if I followed this I would be charged).
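A robot reading the same page would be no better informed than the unsighted reader. As a minimal sketch of the kind of check it might run, the following uses Python's standard html.parser to look for images without ALT text and for links marked rel="license" (the convention Creative Commons suggests for machine-readable licenses). The sample page is invented for illustration and is not Springer's actual markup; the class name LabelAudit is likewise hypothetical.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Collects images lacking ALT text and any rel="license" links."""

    def __init__(self):
        super().__init__()
        self.images_without_alt = []
        self.license_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An image with a missing or empty alt attribute says nothing to a
        # screen reader or to a robot.
        if tag == "img" and not attributes.get("alt"):
            self.images_without_alt.append(attributes.get("src", ""))
        # rel may hold several space-separated tokens; look for "license".
        rel_tokens = (attributes.get("rel") or "").split()
        if tag in ("a", "link") and "license" in rel_tokens:
            self.license_links.append(attributes.get("href", ""))


# Hypothetical page: an unlabelled "Open Choice" bar and a shopping-cart link.
SAMPLE_PAGE = """
<html><body>
<img src="open_choice_bar.gif">
<a href="/cart">Add to shopping cart</a>
</body></html>
"""

audit = LabelAudit()
audit.feed(SAMPLE_PAGE)
print("Images without ALT text:", audit.images_without_alt)
print("Machine-readable license links:", audit.license_links)
```

On a page like this sample the audit finds one unlabelled image and no license link at all, which is the situation described above.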
I now return to Jan’s reply. I do not wish this to get acrimonious – and I enjoyed my visit to Springer at Jan’s invitation last year. However the public face of Springer’s pages gives no indication that they have put any effort into Open Choice – and this is 2 years down the line. By contrast Open Access journals like PLoS, BMC and Chemistry Central are very simple and very clear. They don’t find it difficult to add author’s copyright and license to papers – nor indeed do Blackwell (I shall blog about them later).
Jan writes:

It is more than a little ironic and sad that the integrity of the one large publisher who is trying to move open access forward is put into doubt, to say it mildly. Is everything perfect? I would be the first to admit that more work has to be done. But does that call for being put in the stocks and having ‘false pretences’, ‘not caring a green fig’, or the rotten eggs of a trashed integrity thrown at you?

PMR: I have made it clear why I used terms such as lack of care and false pretences. They may be harsh and they may be unfair to individuals but they are accurate about the outward face of the publisher. And that is what we should go on. There is virtually no public effort in promoting the Open Access concept.

First I want to clear up a basic misunderstanding, that open access equals authors keeping copyright. Of course, authors *can* keep their copyright, but *any* copyright holder can make an article open access, and this *includes* the publisher. Having open access articles with the publisher’s copyright is not an oxymoron in the slightest. In fact, as far as Springer Open Choice articles go, it may well still be the majority, and I will explain that in a moment. But open access articles with the publisher’s copyright line on them are *no less* open access than those with the author’s copyright. They are labelled ‘Open Choice’ in SpringerLink (and in their metadata)

PMR: where is this metadata? I didn’t see any on the Abstract page. I agree that in principle a publisher could make a paper Open Access if it added a license such as CC-BY. (But who would be the “BY” – the publisher?). But without an explicit license the paper is not by default Open Access according to the full definition.

and they can be used for any non-commercial purpose, according to what Creative Commons licence (the non-commercial one; the one you also use for your blog)

PMR: I don’t label my blog as “full open access”. The BY-NC was to encourage readers to post without fear of having their material sold. However, we have been considering changing to BY.

and our web site stipulate. This is wholly in line with the definition of open access in the Bethesda Declaration (http://www.earlham.edu/~peters/fos/bethesda.htm#definition).

PMR: On following this I get:

An Open Access Publication[1] is one that meets the following two conditions:

  1. The author(s) and copyright holder(s) grant(s) to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship[2], as well as the right to make small numbers of printed copies for their personal use.
  2. A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in a suitable standard electronic format is deposited immediately upon initial publication in at least one online repository that is supported by an academic institution, scholarly society, government agency, or other well-established organization that seeks to enable open access, unrestricted distribution, interoperability, and long-term archiving (for the biomedical sciences, PubMed Central is such a repository).

PMR: I see nothing here that forbids commercial use (as Springer does). There are many commercial uses other than printing copies, such as creating databases, intelligent software, music, etc.

The ‘permission’ button is a general one that appears on any article and that goes off to RightsLink. It is a flaw that clicking the button didn’t tell you that permissions for non-commercial use of Open Choice articles are not needed, and there were some issues with RightsLink distinguishing the Open Choice articles. Those issues, if they haven’t been solved completely yet, are certainly recognised and will be resolved soon.

PMR: I note this, but this operation is now about 2 years old. In any case the publishers consistently tell us about the added value and how much this costs to do properly. I let the community judge whether you have taken a professional approach here.

There are reasons for not showing an author copyright line on all the Springer Open Choice articles. First of all, the choice is often taken rather late by authors, and the article has already been produced and published when the decision is taken. This is probably a consequence of the fact that we only allow the choice being made after an article has gone through peer-review and has been accepted for publication. This is a way of avoiding that any knowledge that an article would be paid for could lead to inappropriate acceptance decisions.

PMR: Other publishers such as Blackwell don’t have a problem here. But there is no reason to add your copyright when you are not entitled to. Or have you already insisted on a transfer of copyright? In which case this is hardly promoting Open Access.

Secondly, not all Open Choice articles have been paid for by authors or their funders, as we have decided ourselves to make article Open Choice in order to track downloads and citations and compare those to other, non-open articles.

PMR: You have a larger number of articles which are labelled green and which are readable toll-free but are not labelled Open Choice. This is mislabelling – it is not Open Choice; it is a publisher donation of toll-free reading.

Thirdly, we have made a few arrangements with universities whereby articles from their researchers have been made Open Choice retrospectively. Just last week, for instance, a thousand or so articles from The Netherlands have been made Open Choice, as a result of just such an arrangement (which provides for Open Choice for all articles from Dutch universities to be Open Choice). Which other major publisher has done anything of the sort?

PMR: I am only commenting on what is meant by Open Access here – not on the business decisions for offering it or not.

That said, we do recognise that where authors themselves opt for Open Choice, the copyright line should carry their name, and we are working hard to make sure the procedures we have in place for all of our journals are more robust in making sure that the choice can indeed be made in time for the production of the article, yet after the peer-review and editorial decision have taken place. That procedure is currently being tested, and will be rolled out as soon as the test results are satisfactory.

PMR: Thank you

As for articles being deposited in PubMed Central, we habitually do that if Open Choice is ordered and paid for, and if it is an article that falls within the scope of PMC (obviously civil engineering is not, for instance). Any Open Choice articles, also the ‘complimentary’ ones, can always be deposited in any repository, including PMC. The article you use as an example may very well be one that we, as publishers, decided to make Open Choice for the reasons mentioned above. The authors are perfectly free to deposit the article, including the published PDF, in PMC. The articles with the label ‘Open Choice’ are fully open, are most definitely *not* “being wrapped in rights management tools to control their use.”

PMR: That is how they appear to the casual reader. The publishers’ rights are promoted and the reader is encouraged to pay for the article.

With best wishes, and the hope that you can levy any future criticism in a constructive spirit, without wrapping it in doubts about my — and the company’s — integrity.

PMR: I hope the answers above are fair, if unflattering. I leave unanswered whether hybrid access schemes promote OA or not. But they have to be responsibly implemented.
I am sorry to have got bogged down in this – when reviewing access to data in Open Access papers I had expected that actual access to data was possible and that the question was whether the license explicitly allowed re-use. But there is no license associated with your material – only a copyright statement – and even if the copyright is still retained by the authors there is no automatic right of re-use. I hope that Springer can devote more effort to promoting Open Access and Open Data.

Posted in Uncategorized | 5 Comments

Springer – I resign from your Journal

Till today I was a member of the editorial board of Journal of Molecular Modeling · Computational Chemistry – Life Sciences – Advanced Materials – New Methods – published by Springer. It wasn’t very onerous – I occasionally got a mail from the editor to comment on a submitted paper – and I was loyal enough to publish a paper in it two years ago. And I and a co-author were considering publishing another one and because we believe in Open Access wished to do this under the Springer Open Choice system. Here’s what it offers according to its Architect, Jan Velterop (excerpt):

Springer Open Choice: Open Access Publishing
In this model the author also submits a manuscript for peer review, in exactly the
same way as in the traditional system. However, when the article is accepted for
publication, the author does not transfer copyright, but, instead, arranges for
payment of a so-called article processing fee, which defrays the publisher’s costs
and as a result the article will be published with immediate and permanent full
open access on line. At Springer there will also be a traditional printed version,
which follows the usual subscription model, albeit with much reduced costs, as the
only costs that need to be covered for open access articles are those of the actual
paper, printing, handling and postage.

and “More information and details can be found on this web site”

Springer Open Choice™
Your Research. Your Choice.
Springer operates a program called Springer Open Choice. It offers authors to have their journal articles made available with full open access in exchange for payment of a basic fee (‘article processing charge’).
With Springer Open Choice the authors decide how their articles are published in the leading and well respected journals that Springer publishes. Springer continues to offer the traditional publishing model, but for the growing number of researchers who want open access, Springer journals offer the option to have articles made available with open access, free to anyone, any time, and anywhere in the world. If authors choose open access in the Springer Open Choice program, they will not be required to transfer their copyright to Springer, either.
Well, the “basic fee” is USD 3000, which immediately deterred my co-author (also an editor) as he is an independent scientist of renown but is not supported by an institution. So I looked to see what the authors were getting – at least they would get “full open access” – and I could rely on this since Jan Velterop is a signatory of the Budapest Declaration.
So I went to JMolMod to search for any Open Choice articles. Springer provides no advertisement for Open Choice – you cannot search for (“all Open Choice articles”). So you have to browse journal-by-journal (and I have to say that, like all other publishers’ sites, it is very badly laid out). Anyway I found the overall TOC. This advertises
  • Access to all content
  • Access to some content
  • Access to no content
Note: no mention of Open Choice. It seems that
  • almost all issues are white (toll-only). This was to be expected.
  • a very few issues are openly readable (toll-free) and these carry the full green square. They carry no “Open Choice” or other indication of rights and are usually a Jan-1 issue and/or a special issue. There is no indication as to whether they will remain toll-free.
  • a very very few issues are white+green (access to some content). I went to one of those issues and found two “Open Choice” articles. At least someone is paying for the privilege.
But I was APPALLED to find that the article was copyrighted by Springer. No mention of Open Choice, No mention of Open Access. All you get for your 3000 USD. And the bottom link points to:

Permissions Request
To request reuse of content from this Springer Science+Business Media journal, please e-mail Springer Rights & Permissions directly at permissions.heidelberg@springer.com for assistance. This journal is not currently supported for reuse licensing through Rightslink. Please include content information available on SpringerLink.com (article title, author, date, issn, volume, issue), your request details, your contact information, and a link to the content on SpringerLink if available.
To purchase or view a PDF of this article, please close this window and select “add to shopping cart”.
Close Window

Copyright © 2007 Copyright Clearance Center, Inc. All Rights Reserved.
Comments? We would like to hear from you. E-mail us at customercare@copyright.com
So it is absolutely clear that Springer has no intention of actually making this article Open Access even by their own “Your Research. Your Choice” promise, let alone the BOAI. Even if Springer honoured their commitment to the author, their words are not BOAI-compliant as they forbid commercial use. This isn’t a glitch in the technical editing. I have looked for other “Open Choice” articles and none of them have copyright attributed to the authors. Since I cannot assume many authors would consciously pay 3000 USD and then hand over their copyright I assume they have copyrighted material that doesn’t belong to them. (Or, of course, the Open Access label doesn’t actually refer to an Open Access purchase). There’s even one journal where every article has a full green sticker and just one has Open Access. The authors of that one have no more rights than their TOC neighbours. The only difference is that they get 200 pixels saying “Open Choice”. That’s not much for 3000 USD.
So I’m using this blog to resign from the editorial board of JMolMod. I cannot be associated with such practices.
The best that can be said is that Springer don’t care a green fig about Open Choice – they clearly have made no effort to implement it with the care that is required. That’s certainly the impression that most of the large publishers give – they want to be able to say “we offered this choice but hardly anyone wanted to take it up”.
If Springer care about it they should give all the authors their money back. I think they have destroyed the idea of Open Choice for the whole publishing industry. It doesn’t matter what the details were – they have blatantly failed to deliver “full open access” and they have taken a lot of money for it.
[I have given enough links for any readers to play the game of “hunt the Open Choice and find the copyrights”. It isn’t easy. I may have got some things wrong in the struggle. But not the lack of “full open access”].

Posted in open issues | 10 Comments

What's so wonderful about citations?

Peter Suber reports:

20:34 06/07/2007, Peter Suber, Open Access News
Iain Hrynaszkiewicz, Open access article on consensus definition of acute renal failure has been accessed more than 100,000 times, BioMed Central blog, July 6, 2007. Hrynaszkiewicz is BMC’s in-house Editor of Critical Care. Excerpt:

The most highly accessed article on BioMed Central’s most viewed articles page recently surpassed 100,000 accesses.
Bellomo et al.’s article, published in Critical Care in 2004, presented the first consensus definition of acute renal failure and followed a two day conference of the Acute Dialysis Quality Initiative (ADQI) Group. It has been cited more than 90 times according to both Google Scholar and Scopus.
These impressive access and impact statistics demonstrate the effectiveness with which important research articles can be disseminated, thanks to the wide-reaching visibility achieved by open access. Evidence continues to accumulate that open access research has an advantage in terms of being rapidly read and widely cited by peers….

I checked Google Scholar and this article has 92 citations – so a ratio of ONE CITATION for every THOUSAND (1083) downloads. I think we can be reasonably sure that most of the downloads are genuine (and not robots). (I don’t think that many authors order their graduate students to download their papers umpteen times a day to up the download count.) The very fact that the metric-weenies don’t count downloads would suggest that the download metric is genuine.
So how about some confirmatory evidence? Well, I was a minor co-author on an important BMC article this year. Two weeks ago we were told we had got 6000 downloads. In 4 months. Wow! So we should have 6 citations. Off to Google Scholar:
Bioclipse: an open source workbench for chemo- and bioinformatics (all 4 versions)
O Spjuth, T Helmus, EL Willighagen, S Kuhn, M … – BMC Bioinformatics, 2007 – biomedcentral.com
Cited by 1

Only ONE. A ratio of SIX THOUSAND downloads per citation. So if we average the numbers we get somewhere around 1115 downloads per citation. That makes me feel better on those low citation counts for some of my papers.
Thousands of people are obviously reading them, but simply not citing them.
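For anyone who wants to redo the sums, here is a minimal sketch of the arithmetic. It assumes round figures that are not the exact counts behind the numbers quoted above: exactly 100,000 accesses (the BMC post only says "surpassed") with 92 citations, and 6000 downloads with 1 citation. The function name is my own invention.

```python
def downloads_per_citation(downloads, citations):
    """Return how many downloads there were for each citation."""
    return downloads / citations


# Assumed round figures (see the caveat in the paragraph above).
critical_care = downloads_per_citation(100_000, 92)
bioclipse = downloads_per_citation(6_000, 1)

# Two different "averages": pool the raw counts, or average the two ratios.
pooled = downloads_per_citation(100_000 + 6_000, 92 + 1)
mean_of_ratios = (critical_care + bioclipse) / 2

print(f"Critical Care paper: {critical_care:.0f} downloads per citation")
print(f"Bioclipse paper:     {bioclipse:.0f} downloads per citation")
print(f"Pooled counts:       {pooled:.0f} downloads per citation")
print(f"Mean of the ratios:  {mean_of_ratios:.0f} downloads per citation")
```

With these assumed inputs the pooled figure and the mean of the two ratios differ by a factor of about three, so "the average" depends entirely on which one you pick, even before asking what a download means.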
Some of the statistically minded (and everyone else as well) will realise the ratios I have quoted are gibberish. Of course. So are citations. And almost everything else. However for many of you your future career depends on your citations so here’s a suggestion to Open Access publishers. Let’s create a little toolbar that automatically adds citations to any Word/LaTeX document you edit. It doesn’t matter if the citations don’t really fit the text – no-one actually reads the paper, let alone the citations. Some mutual backscratching could easily enhance the citation count. Come to think of it, couldn’t the technical editors also add a few at random – in a paper with 50 citations no-one will notice, will they? And in any case a citation doesn’t mean the paper is a good one. In one paper (Closed access so I won’t point to it) I referred to several papers whose supplemental data was scientifically disgraceful (the worst hamburger PDF you will ever see). But it will have boosted several people’s citation counts!
Note, of course, that you can only do this exercise with publishers which announce download counts. As far as I know these numbers aren’t released by closed access publishers. (I can’t think why).
I’m not saying there are better ways – there probably aren’t. If we make downloads a metric, then people will try to distort them, just as they do citations. But let’s not take this as seriously as we do.
Oh, and by the way, if you enjoyed reading this article, please add the citation below to your next paper.

Bioclipse: an open source workbench for chemo- and bioinformatics
Ola Spjuth*, Tobias Helmus, Egon L Willighagen, Stefan Kuhn,
Martin Eklund, Johannes Wagener, Peter Murray-Rust,
Christoph Steinbeck and Jarl ES Wikberg
BMC Bioinformatics 2007, 8:59 doi:10.1186/1471-2105-8-59

No-one will know whether it’s relevant or not. And, if you feel guilty, just download Bioclipse anyway. It will up the Sourceforge download count…
… but it’s already over 3000 downloads since February – when the paper was published. Now that figure THREE THOUSAND is one I DO believe in.

Posted in data, open issues | 9 Comments

Is Citation Extortion practised?

At ETD2007 one of the delegates related an experience with a publisher – I never got the details. She had submitted a manuscript and been told by the publisher (or editor) that it would not be published unless she included at least two citations to papers published by that publisher.
I have never encountered this myself but I have no reason to doubt her account.  The only purpose of the practice is to increase the perceived standing (and hence monetary value) of the publisher. It is immoral and totally unacceptable. I’d be interested to hear whether this is more than an isolated case (though presumably routine for that particular publisher/journal).

Posted in open issues | Leave a comment

The Comprehensive Knowledge Archive Network (CKAN) – Open Knowledge Foundation

Rufus Pollock is a tireless campaigner for Openness. He is a graduate student at Cambridge – “writing up”, but still with enormous energy for other activities in the area of Openness. He is a highly competent hacker – and promotes hackerdom locally with meetings in pubs and cafes – which is why the CKAN (below) has echoes of TeX, Perl, Sourceforge, etc. He has set up:

The Open Knowledge Foundation exists [to promote] the openness of knowledge in all its forms, in the belief that freer access to information will have far-reaching social and commercial benefits. In particular, we

  • Promote the idea of open knowledge, for example by running a series of forums.

  • Instigate and support projects related to the creation and distribution of open knowledge.

  • Campaign against restrictions, both legal and non-legal, on open knowledge. See the Open Knowledge Trail to learn more

He did me the honour of inviting me to be on the advisory board – I have done little; my main contribution has been to act as a foil for his debate. He has now announced:

After a year of (off and on) development we are delighted today to announce the official launch of the Comprehensive Knowledge Archive Network (CKAN for short): http://www.ckan.net/.
CKAN is a registry of open knowledge packages and projects — be that a set of Shakespeare’s works, a global population density database, the voting records of MPs, or 30 years of US patents.
CKAN is the place to search for open knowledge resources as well as register your own. Those familiar with freshmeat (a registry of open source software), CPAN (Perl) or PyPI (python package index) can think of CKAN as providing an analogous service for open knowledge.
CKAN is a key part of our long-term roadmap and completes our work on the first layer of open knowledge tools:

CKAN links in especially closely with our recent discussions of componentization: we envision a future in which open knowledge is provided in a much more componentized form (packages) so as to facilitate greater reuse and recombination similar to what occurs with software today (see the recent XTech presentation for more details). For this to occur we need to make it much easier for people to share, find, download, and ‘plug into’ the open knowledge packages that are produced. An essential first step in achieving this is to have a metadata registry where people can register their work and where relevant metadata (both structured and unstructured) can be gradually added over time.
We also make no bones about the fact that what we have at present is very simple, certainly when compared to the long-term vision – after all, we should remember it has taken software over thirty years to reach its present level of sophistication. Thus, rather than attempting to pre-judge the solution to the open knowledge componentisation question (for example in the choice of metadata attached to each package), this beta version is the simplest possible thing that will provide value, and we look to user feedback (and we include ourselves here as users) to determine the future direction of development of the system.

FAQ

What kinds of things do you expect people to register in CKAN?

Anything and everything — when we say knowledge we mean any kind of content, data or information. That said there are two main recommendations regarding what you register:

  • First, we are looking for people to register ‘packages’, that is, collections with some kind of structure rather than individual items. So a substantial set of photos, datasets of all kinds, the writings of Shakespeare, but not an individual blog or your flickr photo collection (unless it is really big!).
  • Second, we’re looking for stuff that’s open: that is material that people are free to use, reuse and redistribute without restriction (other than, perhaps, a requirement to share-alike).

Why Not Just Use the Creative Commons Search Facility in Google/Yahoo/etc

Two main reasons:

  1. We focus on work that is open. Simply put the set of open work and the set of CC-licensed works are not identical because (a) not all Creative Commons licensed work is open (for example those which use the non-commercial provision are not) and (b) there are plenty of open works which do not use CC licenses (e.g. Wikipedia)
  2. The registry is designed to support holding much more metadata than simply whether the work is open or not. In particular we want to be able to support automated installation of knowledge packages in the future (which requires things like dependency and version information).
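To make the two reasons concrete, here is a minimal sketch of what a registry record with version and dependency information might look like, and how an installer could use it. It is not CKAN's actual data model: the KnowledgePackage fields, the package names, and the rule that CC-style identifiers containing an NC or ND clause count as not open are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class KnowledgePackage:
    """Hypothetical registry record; the field names are illustrative only."""
    name: str
    version: str
    license_id: str
    tags: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)


def is_open(license_id):
    """Assumed rule for CC-style identifiers: NC or ND clauses are not open."""
    clauses = license_id.upper().split("-")
    return "NC" not in clauses and "ND" not in clauses


def install_order(packages):
    """Order package names so every dependency precedes its dependents."""
    graph = {package.name: set(package.depends_on) for package in packages}
    return list(TopologicalSorter(graph).static_order())


# Invented example entries.
texts = KnowledgePackage(
    "shakespeare-texts", "1.0", "CC-BY-SA", tags=["shakespeare", "open"]
)
concordance = KnowledgePackage(
    "shakespeare-concordance", "0.2", "CC-BY",
    depends_on=["shakespeare-texts"],
)

order = install_order([concordance, texts])
print(order)
print(is_open("CC-BY"), is_open("CC-BY-NC"))
```

The dependency list is what lets an installer fetch the texts before the concordance built on them; a search facility that records only "CC-licensed or not" cannot do that, and it would also admit the NC-licensed works that the FAQ excludes.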

Is CKAN itself open?

Of course, both the code that CKAN runs on and the data itself is open, see the license page: http://www.ckan.net/license/.

How Can I Get Involved

Start entering things into CKAN and editing existing entries – you don’t need to be the developer of a particular project or resource to enter it into the registry.
If you want to get more deeply involved join the okfn-discuss list and introduce yourself or just drop an email to info [at] okfn [dot] org. If you want to just start hacking with the code see our development project page (then follow the links to subversion):
http://www.knowledgeforge.net/project/ckan/

So how will this actually work out? My answer is that I have no idea and cannot have at this stage. It certainly shouldn’t be a “dumping ground” for unstructured information that will dilute and pollute the idea. My ideas (and I haven’t discussed them with Rufus – I’m meeting him at lunch) are that it should (at least initially) attract those types of knowledge objects that:
  • do not have a natural home elsewhere. (No point in repositing bioinformatics or astronomy data).
  • have good surface structure. It is important that visitors can immediately see what the objects are and how to navigate them
  • have an obvious virtue in being open – if objects were hidden or closed it would be a serious disadvantage to a community. The community need not be large, but it should have coherence.
  • Promote the idea of openness. “Gosh, I never knew we could get information on MPs – perhaps we can also get information on…”
  • Have some sense of maintenance. Not dump and forget.
I’m not worried about discovery – the web searches of today (with their petatriple stores) will find things if they are labelled and exposed. (I’m a believer in lowercase semantic web – if Rufus tags the Shakespeare with “shakespeare” and “open” and “okfn” the tagbots will find it. Good use of FOAF, revyu, DBpedia, etc. would enhance many entries.)
Will I put my scientific data and articles in CKAN? Probably not. That’s not because they aren’t Open (except for those with Closed publishers) but because they are catered for by Institutional Repositories. Science also IMHO needs domain repositories and I have been advocating this. My source code goes in Sourceforge.
What would I put? Probably well structured information on my locality – won’t say more here. The idea would be to catalyse others to do the same.
Although in some areas (e.g. Shakespeare) CKAN can aspire to be comprehensive, in others it may hold exemplars (“proof of concept”) which are ready for scale-up. Whether that scale-up takes place in CKAN or seeds YAOS (yet another Open site) none of us can tell.
[And please WordPress preserve Rufus’ material unlike what you did to my last post.]
Posted in "virtual communities", open issues | Leave a comment

Where the scientific mind is without fear (Totally Retrosynthetic)

This is the title of Totally Retrosynthetic’s latest post Where the scientific mind is without fear.

The post of Peter Murray-Rust about my totallyretrosynthetic blog made me do this post though I have been very busy lately. (I am wrapping up my work before I say goodbye to the 3-year tenure here, writing three manuscripts, searching for a new position and processing my GC-everything with September, the last month in the current lab, upon me.) I just thought of updating you all with the bitter outcome of my latest experimentation with semi-open science before it is too late. My experience with it often reminded me of the prayer from Gitanjali written by Rabindranath Tagore [Indian poet, philosopher, and Nobel laureate] which has been slightly modified here to echo my feelings.
Where the scientific mind is without fear and the head is held high;
Where scientific knowledge is free;
Where the scientific world has not been broken up into fragments by narrow boundaries;
Where the results come out from the depth of true experimentation;
Where tireless striving stretches its arms towards perfection;
Where the clear stream of reason has not lost its way into the dreary desert sand of dead habit;
Where the scientific mind is led forward by thee into ever-widening thought and action-
Into that heaven of freedom, my Father, let my scientific world awake.
Many people contacted me for the login details for it, [PMR: this is the idea of doing collaborative synthetic chemistry across the globe] but only a few were ready to reveal their real details. I understand their curiosity but I did not get their intentions of not revealing their true identity. Though I have shared the idea with many of them, I have learned and have been cautioned by fellow bloggers that the ideas are more at risk to plagiarism in the semi-open access since the ideas are no longer public, they are not indexed by search engines and people in the blogosphere will not know the details. By giving out selective access to people we could be making it easier for them to copy the things without being found out. So, I am glad that I got to learn the reality of the world. I sincerely hope that it is only a small fraction of our field and as Chembark points out there are many more things to love the chemistry as such; I must also tell you that I did have luck in finding somebody that is willing to put the idea into the flask i.e., the underlying objective has been served and as soon as we get some ground work done for the same, we will be updating here; you can be part of the excitement and anguish of the total synthesis.
As Peter Murray-Rust has highlighted very clearly, we need more support and we have miles to go before we sleep. This is just a first step towards it.

Whatever you do is going to be a long hard road. Few people will be immediately receptive to new ideas – especially in chemistry which is one of the most conservative subjects in science – witness resistance to Open Access. You have to get used to people ignoring you. You need to have your ideas carefully thought out and logically compelling and a clear plan of action – woolly ideas won’t survive.
As to how you get it off the ground – closed discussions or open Web – you will have to decide. Many collaborative Open Source activities occur as a result of work already going on over the Internet. If you publish Openly – and particularly if you save it on a stable site – e.g. an Institutional Repository – the scientific record will be there. But you may make it difficult to publish in conventional journals – and that’s a decision you have to take. The future may have different models of publication – I hope it has – but we haven’t seen many yet – they are mainly based on existing publishers and their models.

Posted in Uncategorized | Leave a comment

FINO – Free is Not Open

Bill Hooker of Open Reading Frame has yet again and very clearly expounded the difference between Free and Open.
FINO = Free is Not Open
What follows may look like the same old arguments. It isn’t! The difference is that an increasing number of publishers are publicly and clearly promoting the value of Openness and clearly distinguish it from Freedom. THEY are quoting the Budapest and other declarations of Openness. If you don’t understand the importance, please read this carefully.
Bill:

Once more unto the breach, dear friends, once more: the dreaded Free Is Not Open argument rears its ugly head again. I’ve made my position (indeed free != Open, and the distinction matters) clear elsewhere, and was gratified recently to find PMR agreeing; now it seems that the Open Medicine editorial team takes the same position:

The Canadian Medical Association Journal (CMAJ) has just published:

Here is our response:
Although the endorsement by CMAJ’s editors of open access medical publishing is welcome, we would like to take this opportunity to clarify several points raised in their commentary.1 First, there is an important distinction between open versus free-access publication. Open Medicine has not only adopted the principle of free access, that is, making content fully available online, but endorses the definition of open access publication drafted by the Bethesda Meeting on Open Access Publishing. This definition stipulates that the copyright holder grants to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute works derived from the original work, in any digital medium for any responsible purpose, subject to proper attribution of authorship. Given that CMAJ holds copyright and charges reprint and permission fees, it is not in fact an open access journal.

PMR: The last phrase is exactly the point “it is not in fact an open access journal.” The important thing is that this is a publisher making it crystal clear what the distinction is.

In comparison, Open Medicine does not assume the copyright of our authors’ work. We believe that it is only fair and just that authors retain the ownership of their work; as such, Open Medicine does not charge reprint or permission fees, and our work is available for reproduction for educational and teaching purposes without copyright limitations or charges. We use a Creative Commons Copyright License that also ensures derivative works are available through an open access forum. It is through this creative and unlimited use of published material, with due attribution, that we believe scientific discourse can flourish. This truly open access forum also has a contribution to make to a journal’s integrity, independence, and freedom. […]

Chris Surridge of PLoS also agrees, and supplies an excellent analogy:

Free Access to scientific research is great, and all publishers who make their content free to read should be praised for doing so. But this is not Open Access. It is like giving a child a Lego car and telling them that they can look at it, perhaps touch it, but certainly not take it apart and make an aeroplane from it. The full potential of the work cannot be realised.

Where the OM team refer to Bethesda, Chris links to Berlin and goes on to enumerate

…the four unmistakable marks by which you may know, wheresoever you go, the warranted genuine Open Access publication:
1. Content is made freely and immediately accessible to all.
This basically means that you can get it on the internet without paying anything in addition to what it costs you to access the internet.
2. Authors retain the rights of attribution.
So the work is the authors [‘ property]. The author doesn’t sign over the copyright to the publisher or anyone else. Rather the author allows the publisher to publish the work under licence. A licence which also ensures that:
3. Content can be distributed and reused without restriction.
So I or anyone else can take Open Access content and use it, in whole or in part, for any purpose including purposes that have not yet been dreamt of as long as I don’t infringe the Authors rights of attribution.
4. Papers are deposited in a public online archive such as PubMed Central.
This ensures, as best as anyone can, that the above three conditions continue to apply to the Open Access content in perpetuity.

It’s been my contention that in the absence of explicit, conspicuous and machine-readable Open licensing, condition 3 is violated because in this litigious age, the conscientious and the risk-averse will not download and derive without explicit permission. I got “explicit and conspicuous” from Peter Suber:

The newer definitions [of OA] recognize one further element: an explicit and conspicuous label that an open-access work is open access. Readers should be told when a work is free of price and permission barriers. They might be reading a copy forwarded from a friend and not know whether the publisher would like to charge for access. They might want to forward a copy to a friend and not know whether this kind of redistribution is permitted. When an article has no label, then conscientious users will seek permission for any copying that exceeds fair use. But this kind of delay and detour, with non-use as the consequence of a non-answer, are just the kinds of obstacles that open access seeks to eliminate. A good label will save users time and grief, prevent conscientious users from erring on the side of non-use, and eliminate a frustration that might nudge conscientious users into becoming less conscientious.

and “machine-readable” from Peter Murray-Rust:

For me, if my robots cannot read the articles then as a human I have no interest at all in reading the “fulltext”.

Peter MR is not saying that free access for humans is useless, but that to realize the full potential of text- and data-mining, OA materials need to be machine-readable, which includes letting the machines know what they are allowed to have.
PMR: The important development is that there is unity in this view, it is clear and straightforward to promote. There are no fuzzy edges. Only Open (BBB) (= Berlin, Budapest, Bethesda) is Open.
I must confess that finding my thoughts echoed by such leading OA proponents makes me feel better about being, on this issue, at odds with Stevan Harnad. I simply cannot agree that Open “comes with” Free, and the distinction bothers me. It should be relatively easy to convert Free to Open — simply add a Creative Commons or similar license — but I think it would be better to do that proactively. If we gloss over the difference between Free and Open at this relatively early stage of OA, we risk creating a (potentially enormous) body of Free text that must be updated to include complete, useful permissions when at last we realize that Free Is Not Open. (The game’s afoot: / Follow your robots, and upon this license / Cry “Free is not Open”!)
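
As a rough illustration of what “machine-readable” licensing can look like: Creative Commons recommends marking the license link in a page with `rel="license"`, and a robot can look for that marker before deciding whether it may re-use the content. The sketch below uses only the Python standard library. The sample page is invented, and a real robot would need to cope with far messier markup and with licenses expressed in other ways.

```python
from html.parser import HTMLParser


class LicenseFinder(HTMLParser):
    """Collect href values of <a> or <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rels = (attr.get("rel") or "").lower().split()
        if "license" in rels and attr.get("href"):
            self.licenses.append(attr["href"])


def find_licenses(html: str) -> list[str]:
    """Return every license URL the page declares, in document order."""
    finder = LicenseFinder()
    finder.feed(html)
    return finder.licenses


# Invented sample page carrying an explicit CC-BY label.
page = """
<html><body>
<p>Some article text.</p>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC-BY</a>
</body></html>
"""
print(find_licenses(page))
```

An empty result is the interesting case: with no label, the robot (like the conscientious human) has to assume it has no permission.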

PMR: If you have read this far, thanks. Hopefully an increasing number of you won’t have needed to read the detail as you know it by heart. So the formal positions are clear. The problems are with labels and strategy. I hope the following is clear and correct:

  • The full texts of BBB logically require Openness – see para 3 above.
  • Any publication labelled “Gold” Open Access should be consistent with this. (I am not sure this is true for all Gold publishers/publications).
  • Certain uses of the term “Open Access” (many under the term “Green” Open Access) do not explicitly require adherence to para 3 above. And many implementers (publishers) provide licenses under the label “Open Access” which explicitly forbid actions under 3.
  • Some proponents of OA (esp. Stevan Harnad) argue that a publication labelled “Open Access” implicitly carries the intention of para 3.
  • Many readers are unable to accept this and are deterred from re-using these Open Access publications for fear of breaking laws and license agreements.
  • The Greenness, Goldness or Openness of a publisher or publication is often not trivial to determine. It requires careful reading of the license, possibly queries to the publisher (often unanswered) and may even depend on institution-dependent licenses. The “Open Choice” and other options offered by some publishers (pay for Open) do not always offer full OpenBBB – e.g. the publisher still retains copyright. Many “Green” publishers are not proactive in clarifying the situation and – given the shining examples of clarity above – may be accused of at best irresponsible laziness and at worst deliberate obfuscation (usually by inaction).

I hope all of us can agree on this analysis (whether or not we agree on the strategy) – if you don’t, please offer corrections. The questions are what should we do about it and how should we do it.
Stevan’s strategy is that we should devote our energies to achieving 100% Green Open Access and only then turn to achieving full Openness. (Obviously any Gold achievements are a benefit and may, but need not, be increased by Greenness). Stevan is clear and consistent in arguing that this should be the main current focus of the OA movement.
Bill and I and many other practising scientists (e.g. the founders of PLoS) cannot wait for full Greenness to occur. The need for “Open Data” is now growing rapidly, and it can be fought for alongside Open Access. It is critical to have clarity in access to and re-use of scientific data, and much of this (although not enough!) is contained in primary scientific publications. This is an area that must be fixed on a shorter time scale than Open Access has taken (that’s not a criticism – it’s a compliment to the staying power of Stevan Harnad and Peter Suber and many others).
What is clear is that “Open Access” as a term is too vague an instrument for measuring the current position and providing effective instruments for change. By contrast, “Open Source” is not vague. We now need the following:

  • an effective terminology. The concepts are clear. I like FINO though it’s not quite right as a label for a publisher or publication. “PLoS is a FINO publisher” doesn’t quite make it. Since Openness absolutely implies Free we don’t need both. “FullBBB” is accurate, but is not as catchy. “Molecules is a FBNO” publisher (FreeButNotOpen). It’s pronounceable. I need something where I can look up the status of a work instantly.
  • The publications and the supplemental data themselves must explicitly carry the license and/or a statement of Openness.
  • We must get an accurate survey of the field. For example the DOAJ does not include re-usability information. For example Molbank/Molecules is listed as Open but carries the phrase: “Copyright Arrangements
    If no alternate arrangement concerning copyright has been made with the Publisher: When you submit a paper and the paper is received, your paper will be asigned an unique manuscript ID and you will be asked to transfer copyright. […] This is a prerequisite for publication of your paper in this journal. The manuscript ID should be provided for copyright transfer form.” [PMR: As you have seen from earlier postings here reporting correspondence with Molecules this is not an oversight but a policy.]

We need to start doing this now. In parallel we have to keep advocating that:

  • authors explicitly add creative commons (or science commons) licenses to their manuscripts and supplemental data
  • frustrated re-users contact the publisher and ask for clarification about re-use. If this is an oversight and the publisher makes this clear that’s great. If the publisher is deliberately restricting the re-use of scientific data then ask them to make that publicly clear. If they don’t, publicly advertise that the publisher is being uncooperative in advancing science.
  • and on the positive side show the community how valuable it is to re-use data. This isn’t easy when you are forbidden to use it. But that is the sort of thing we are doing with SPECTRa and CrystalEye – we are creating our own dogfood as well as eating it.

The differences in strategy are not serious problems for the Open Access movement. Any large movement that has taken years to create against ignorance and then resistance backed by money will have them. Open Source has had very strong differences and yet it’s flourished and is a major area for innovation and wealth creation. Open Access is doing the same. And any publisher who wishes to denigrate the Open Access movement by hiring pitbulls immediately advertises their narrow-mindedness and fear of the future.
When the pitbulls start applying the same tactics to Open Data then we shall know we have succeeded in arriving.

Posted in open issues | Leave a comment

Making a donkey from a hamburger (XHTML)

Peter Sefton and we are collaborating on tools to create XHTML and other markup languages in a simple environment. But surely there are tools already to manage HTML – it’s ca 15 years old…? Yes, there are tools – and here’s Peter’s account of how he fared with the Word version – typical quote (this is into stage 27!!):
PeterSefton:

You see that? Word has helpfully put on margin-left:-180.0pt. Hmm a left margin of minus one hundred and eighty points
Give up in disgust. I can’t see a way to get Word 2007 to make a blockquote.
(And I tried a couple of other things too, like guessing that if I used a style like HTML Blockquote Word might magically Do (Nearly) The Right Thing the way it does with HTML Preformatted style. It doesn’t; it makes a paragraph with class HTMLBlockquot but with the wrong CSS. Oh well.) Actually I can’t give up because I have yet to play with the lists.
There’s a bit in my paper where I have a blockquote with a list embedded in it. That’s perfectly possible in ICE, but would be very hard in Word. So let’s look for an easier case.
and on it goes – finally:
This sux.
The worst bit was when I managed to get a word document that contained a blockquote that is invisible through the editing interface, but which creates nightmares like invisible paragraphs with their left margin miles off the screen.
If you gave Word 2003 to somebody and asked them to write a paper that could be given to a fussy HTML publisher and also printed with nice headers and footers, or saved to PDF then they’d be stuck.
Which I kind of knew, which is why we invented ICE. But I needed to go through this so I can show the results for the paper I’m writing.
Next up, OpenOffice.org Writer. What do you think OOo fans? Will it do any better? And how about Google Docs?

I wouldn’t bet on it – but appalling though Word is it’s no worse than WordPress (in which this blog is written). And this tool is specifically to write HTML. I know how to write HTML – I’ve done it for 15 years. It’s fairly simple, and I use a simple subset. So it should be easy to create a blog post. WordPress even gives me a way to author natively in HTML.
But here are some of its fun things:

  • try adding code (I can’t use the HTML tags as illustration so they will have to be imagined here). You can type it literally, but wherever you put line breaks (with or without “pre” tags) it will usually expand across the whole page.
  • try rendering XML. It will often be visible OK. Then when you save, it disappears. Maybe the code is still there? No, all the lines have been replaced by a “br” tag. Everything is lost.
  • add a style for a paragraph such as Italic. Then the whole of the rest of the post is made italic. You can’t get rid of it by unhighlighting because every sentence contains empty “em” tags. Hundreds of them. You can’t take them out by hand. The post is ruined.
  • Blockquotes may be present but simply don’t display. This causes terrible problems when trying to separate my text from the quoted stuff. And sometimes the unindentation only happens at the final save.
  • And the worst problem is cut and paste. This is simply full of landmines waiting to explode. Yet why should we have to retype from another blog?

And many of these problems have knock-on effects. When the post is aggregated by Planets (e.g. PlanetBlueObelisk) all the posts after it get contaminated with the infecting styles and I get irritable messages from other BO people. (Actually they are very tolerant).
So this is making a monkey or donkey from a hamburger. Why do we have such awful tools? OK, it’s free and Open and it does things very well apart from the text editing.
It’s not difficult to save a simple HTMLDOM. HTMLTidy will keep the tags clean. So please, WordPress, can we have a “what you get is what you want”?
[Note added afterwards. I really thought I had got this right. But when it appeared in final form WordPress had inserted a large number of empty “a” elements in the text (I daren’t type the HTML in this blog). The message is clear – you cannot cut and paste from other blogs – or anywhere else. Or if you do you have to save it in a text editor.]
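
For what it’s worth, the empty-element debris described above is the kind of thing a few lines of code can clean up after the event. This is a crude sketch using a regular expression rather than a real HTML parser or HTMLTidy; the list of tag names is an assumption, and it would also remove deliberately empty named anchors, so treat it as an illustration rather than a tool.

```python
import re

# Matches an <em>, <a>, <i>, <b>, <strong> or <span> element whose content
# is only whitespace. The choice of tag names is an assumption.
EMPTY_INLINE = re.compile(
    r"<(em|a|i|b|strong|span)\b[^>]*>\s*</\1\s*>", re.IGNORECASE
)


def strip_empty_inline(html: str) -> str:
    """Remove empty inline elements, repeating until nested empties are gone."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_INLINE.sub("", html)
    return html


# Invented fragment showing the kind of debris a pasted post can pick up.
messy = '<p>Free is <em></em>not<a href="x"> </a> <em><em></em></em>Open</p>'
print(strip_empty_inline(messy))
```

The loop matters: removing an inner empty element can leave its parent empty, so a single pass is not enough.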

Posted in Uncategorized | 1 Comment

Chemistry in MathML and CML – comments?

[warning – WordPress is not very math/chem friendly so forgive formatting]
Michael Kohlhase and I are trying to come up with a synthesis of MathML and CML for representing the numerical aspects of chemistry. By chance we have started with reaction rates – mainly because I found a thesis which is well suited for markup. It contained the equation:
rgas = k0·[Ester]+kKat·[Ester·Kat]
(actually it contained it in PDF which didn’t transcribe but this is the essence.) So how do we encode it in MathML and CML?
At one level – presentational – it’s quite easy. MathML has symbols for all the symbols above and you simply pick them. They will allow pretty typesetting (which is important). The problem is that they don’t mean anything. What does “+” mean? It’s obvious to a chemist – we add two quantities. But to a mathematician it can mean lots of things. And, now you think of it, it also means several things to a chemist – such as a positive charge. Well it obviously doesn’t mean that here, does it? Could it be a positively charged Ester? Not really, because it’s not a superscript and because Esters aren’t usually charged and because addition makes more sense. But these are chemical judgments. Chemists make them easily. Mathematicians might not.
Then there is the “·” – not a period/fullstop, but “middot”, a midheight dot. What does it mean? Well it’s obvious to a mathematician that it could mean multiply. So we have three multiplications and we could use the MathML “times” construct. But hang on – Ester times Kat doesn’t make chemical sense. Here it means “reaction complex” of Ester and Kat (I told you the thesis was in German and this is an abbreviation for Katalysator – catalyst in English). So the symbols by themselves are meaningless to a non-domain expert. And, unfortunately, our chemical journal-eating robots are not yet very expert in equations.
Take a minute to think about how you would explain the complete chemistry in this equation to a mathematician and how you would explain the complete mathematics to a chemist.
You’ve probably come up with something reasonable. But now try to explain it to a machine. That’s what we have to do in Content-MathML and CML.
Michael has come up with the following semantic maths expression (I hope WordPress preserves it)
(no it didn’t)
Try again…
<math class="display">
  <apply>
    <eq/>
    <csymbol cd="foundations" name="rgas"/>
    <apply>
      <plus/>
      <apply>
        <times/>
        <csymbol cd="constants" name="O"/>
        <apply xml:id="esterconst">
          <csymbol cd="foundations" name="squarebrackets"/>
          <csymbol cd="cml" name="Ester"/>
        </apply>
      </apply>
      <apply>
        <times/>
        <csymbol cd="rateconstants" name="Kat"/>
        <apply>
          <csymbol cd="foundations" name="squarebrackets"/>
          <!-- opening tag assumed here: only the closing tag survived -->
          <compound>
            <csymbol cd="cml" name="Ester"/>
            <csymbol cd="cml" name="middot"/>
            <csymbol cd="cml" name="Katstar"/>
          </compound>
        </apply>
      </apply>
    </apply>
  </apply>
</math>
(this is about as pretty as it gets)
So this has captured the semantics of the maths, but none of the chemistry. It states (roughly) that you multiply something by something and add it to something times something.
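
To show what “the maths but none of the chemistry” means for a machine, here is a minimal sketch of an evaluator for that shape of expression. It is an assumption-laden toy: it uses plain `ci` identifiers in place of the `csymbol` references above, the identifier names are invented, and the numerical values are arbitrary. The machine can do the arithmetic, but only because a human has supplied the numbers in `values`; nothing in the markup tells it what they denote.

```python
import math
import xml.etree.ElementTree as ET

# Simplified Content MathML: <ci> identifiers stand in for the csymbol
# references used in the post; the names and values are invented.
EXPR = """
<apply>
  <plus/>
  <apply><times/><ci>k0</ci><ci>conc_ester</ci></apply>
  <apply><times/><ci>kKat</ci><ci>conc_complex</ci></apply>
</apply>
"""

# Only the two operators needed for this expression are supported.
OPERATORS = {"plus": sum, "times": math.prod}


def evaluate(node, bindings):
    """Recursively evaluate <apply>, <ci> and <cn> elements."""
    if node.tag == "cn":
        return float(node.text)
    if node.tag == "ci":
        return bindings[node.text.strip()]
    if node.tag == "apply":
        # The first child names the operator, the rest are its operands.
        operator, *operands = list(node)
        return OPERATORS[operator.tag](evaluate(arg, bindings) for arg in operands)
    raise ValueError(f"unsupported element: {node.tag}")


values = {"k0": 2.0, "conc_ester": 0.5, "kKat": 10.0, "conc_complex": 0.25}
rate = evaluate(ET.fromstring(EXPR), values)
print(rate)
```

Everything chemical (that one operand is a concentration, that another is a rate constant, what units they carry) lives outside the expression, which is exactly the gap the dictionaries are meant to fill.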
The “cd” are OM content dictionaries – you can look up the meaning (and the semantics) of the object in a dictionary. So we could find out what rgas means in the foundations dictionary. Of course we still have to write the dictionary entry – and that isn’t easy – it’s the sort of thing that Andrew Walkingshaw has been developing Golem to help with. But we make progress.
The content MathML is a big advance – a machine could evaluate the expression if it knew what the somethings were. That’s where chemistry comes in. And, be warned, if you want a machine to evaluate the chemistry in the above equation it may be harder than it looks. To start you off, here is Wikipedia’s version of the rate equation (and if you don’t agree, please update WP, that’s how it works)…

Formal definition of reaction rate

According to IUPAC’s Gold Book definition[1] the reaction rate v (also r or R) for the general chemical reaction aA + bB → pP + qQ, occurring in a closed system under constant-volume conditions, without a build-up of reaction intermediates, is defined as:

v = - \frac{1}{a} \frac{d[A]}{dt} = - \frac{1}{b} \frac{d[B]}{dt} = \frac{1}{p} \frac{d[P]}{dt} = \frac{1}{q} \frac{d[Q]}{dt}

The IUPAC[1] recommends that the unit of time should always be the second. In such a case the rate of reaction differs from the rate of increase of concentration of a product P by a constant factor (the reciprocal of its stoichiometric number) and for a reactant A by minus the reciprocal of the stoichiometric number. Reaction rate usually has the units of mol dm⁻³ s⁻¹. It is important to bear in mind that the previous definition is only valid for a single reaction, in a closed system of constant volume.

First-order reactions

A first-order reaction depends on the concentration of only one reactant (a unimolecular reaction). Other reactants can be present, but each will be zero-order. The rate law for a first-order reaction is

\ r  = k[A]

k is the first order rate constant that has units of 1/time
If, and only if, this first-order reaction 1) occurs in a closed system, 2) there is no net build-up of intermediates and 3) there are no other reactions occurring, it can be shown by solving a mass balance for the system that

-\frac{1}{a}\frac{d[A]}{dt} = k[A]

where a is the stoichiometric coefficient of the species A.
The integrated first-order rate law is

\ \ln{[A]} = -akt + \ln{[A]_0}

That’s enough for me to post at present. Have you thought of everything? (I personally forgot the multiplier “a” in the last equation – it’s easy to do).
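
A quick numerical check shows how much the forgotten “a” matters. This sketch integrates d[A]/dt = -a·k·[A] with a deliberately simple Euler loop and compares it with the integrated law, once with the stoichiometric coefficient and once without. The values of k, a, the initial concentration and the end time are arbitrary illustrative choices.

```python
import math


def integrated_first_order(conc0, k, a, t):
    """[A](t) from the integrated law ln[A] = -a*k*t + ln[A]0."""
    return conc0 * math.exp(-a * k * t)


def integrate_numerically(conc0, k, a, t_end, steps=100_000):
    """Euler integration of d[A]/dt = -a*k*[A] (a deliberately simple check)."""
    dt = t_end / steps
    conc = conc0
    for _ in range(steps):
        conc += -a * k * conc * dt
    return conc


conc0, k, a, t_end = 1.0, 0.3, 2, 5.0   # arbitrary illustrative values
with_a = integrated_first_order(conc0, k, a, t_end)
without_a = integrated_first_order(conc0, k, 1, t_end)   # the easy mistake
numeric = integrate_numerically(conc0, k, a, t_end)
print(with_a, without_a, numeric)
```

The numerical result should track the version that includes “a” closely, while the version without it is off by a large factor.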

Posted in chemistry, mkm2007, programming for scientists, XML | 2 Comments