Monthly Archives: October 2011

My bike is (fairly) stable

An interlude. PNNL guest house provides its guests with low-cost rented bicycles. I had a "free" day today (though I did some thinking) and cycled about 5+5 miles through Richland to Bateman Island. But first, here is my bike:

Perceptive cyclists will notice that it has no brakes on the handlebars – in fact it has a backpedal brake. I haven't ridden one of these before and didn't find it very easy – if you backpedal then you stop, but if you put your foot down you keep going. I'm not sure what you do on steep hills. I think these bikes are popular in NL which doesn't have many hills. I couldn't go very fast, partly because the bike had only one gear, partly because it's quite heavy and partly because of the air intake.

Zooming in we see:

All the bikes are named after elements. I got Technetium. The trouble is that all technetium isotopes are unstable/radioactive. The one I know (and which is used in medicine) is technetium-99m, with a half-life of 6 hours. This means that after 6 hours half my bike would have disintegrated (actually it depends whether the bike represents a single atom – if so, then there was an evens chance that after 6 hours I would have no bike). I was even more worried about Tc-98 because this probably only lasts for milliseconds.

I needn't have worried. The thoughtful PNNL bike people had chosen the isotope with a half-life of 4.2 million years. I had more chance of a car crash than spontaneous disintegration.
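For anyone who wants to check the arithmetic, here is a minimal sketch of the half-life sums. The two half-lives are the figures quoted above; the two-hour ride time (`RIDE_HOURS`) and the function names are my own assumptions for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # Julian year, good enough for a sketch


def surviving_fraction(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of a large sample still undecayed after elapsed_hours."""
    return 0.5 ** (elapsed_hours / half_life_hours)


def decay_probability(elapsed_hours: float, half_life_hours: float) -> float:
    """Chance that one single atom has decayed after elapsed_hours.

    Equals 1 - surviving_fraction, but computed with expm1 so that very
    small probabilities are not lost to floating-point rounding.
    """
    return -math.expm1(-math.log(2) * elapsed_hours / half_life_hours)


# Technetium-99m: half-life of 6 hours (figure from the post).
print(surviving_fraction(6, 6))  # half the bike is left after 6 hours

# The long-lived isotope: 4.2 million years (figure from the post).
# RIDE_HOURS is an assumed duration for the 5+5 mile trip.
RIDE_HOURS = 2
long_half_life_hours = 4.2e6 * HOURS_PER_YEAR
print(decay_probability(RIDE_HOURS, long_half_life_hours))
```

Under those assumptions the single-atom bike has a chance of only a few parts in a hundred billion of vanishing during the ride, which is why the car crash is the bigger worry.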

A beautiful day, and Bateman Island was great (and isn't OpenStreetMap fantastic – it shows the cycleways, unlike most other maps) – lots of birds on the river (which is quite wide here). I saw white pelicans, various grebes, ducks etc. which I'll try to look up from memory:

And the causeway to the island

Update: Open Science conclusion; and PNNL NWChem/CML/Quixote update

#oss2011 @okfn


After my talk at OSS I published two posts on the value of Open Access – I used challenging language which has upset several people but seems to strike a chord with others. The discussion has taken place on the Open Knowledge Foundation discussion list, with about 25 more posts, culminating currently in a very long and well-researched post by Jenny Molloy, my co-presenter at OSS. Further discussion can take place on this list – it's open to everyone.

For the next few days I am now devoting my energies to helping create the first fully Open Computational chemistry system. This is based on:

  • NWChem, which last year became fully Open Source. It's the main Open program for atomistic calculations, and is complemented by Open plane-wave codes such as ABINIT, MPQC and Quantum Espresso (please comment if I have missed any – I am also not aware of a list of Open Source computational chemistry codes, which is not the same as cheminformatics).
  • Chemical Markup Language and specializations in conventions such as CMLComp and compchem.
  • The JumboConverters framework
  • The Blue Obelisk (includes cclib, openbabel, Avogadro, Jmol, etc.) and other Open Source chemistry tools.
  • Chempound (a repository for semantic chemistry) built by Sam Adams.
  • The Quixote community
  • The FoX library for XML and CML in Fortran

I have blogged about most of these before. At present what we are doing is:

  • Define a top level dictionary for compchem. Bert deJong at PNNL is optimistic that this is feasible in a reasonable time. It will be a community effort.
  • Define a revised convention for compchem (compchem1, say). Bert thinks there is a very clear infrastructure to almost all QC codes and that we can implement this.
  • Add CML output to NWChem. We are halfway there. I have compiled FoX on Windows and we are currently getting NWChem running on my machine.

This will be supported by dictionary validation and document validation.

Anyone interested should post a comment or mail me or the Quixote list – see main page above


Suboptimal/missing Open Licences by Wiley and Royal Society


Well, Wiley has just proudly announced its first Open Access journals. They're not cheap for author-side fees (Brain and Behaviour == 2500 USD – higher than the others – presumably it's easier to tap brain researchers for money).

What has upset me is that the licence is CC-NC. No commercial use.

Now I'll be very generous and assume that Wiley isn't aware of the real problems of CC-NC. If they aren't, they should read my blog post on the subject, which also points to definitive sources.

CC-NC is apparently attractive, but actually completely restrictive for anything I want to do.

  • The material cannot be used for teaching as that can be construed as commercial (especially in private universities)
  • It cannot be put on web-pages which carry adverts
  • It cannot be used for text- or data-mining which is openly published because a commercial company might read my paper or website and use it
  • All derivative works must carry CC-NC
  • And worst of all it violates the Budapest Open Access Declaration (and the Open Definition)

I doubt VERY much whether it is the intention of the AUTHORS to forbid commercial use of their material. Effectively they would be saying

"I don't want a manufacturer of medical equipment to use any pictures from Brain and Behaviour without paying WILEY money" (remember, dear reader, that the AUTHOR gets nothing).

So, Wiley, I am in a good mood and assume this was a mistake. It would be very nice if you were able to respond to this post (you WILL read it, I know).

There's a similar case at the Royal Society. Now they already publish Open Biology under CC-BY 3.0 so they know about licences. They've recently made all their historical content FREE, which is absolutely stunning, but there is no explicit licence. I have also heard that there are actually still paywalls in place for this material.

Please, Royal Society, tell us you simply forgot to add CC-BY on the splash pages and the articles. Because then we can use them for teaching, etc. with a clear legal conscience.

And we can then do some exciting things with the Bibliography!



Occupied Scholarly Territory: Which publishers do I trust?


For me the primary concern in scholarly publishing is who do I – and maybe you – trust? This blog will give some personal thoughts and probably upset some, but it shows my thoughts.

If I am getting windows renewed for the house I need to know which builders I can trust. That's as important as cost. Who has my interests at heart when I pay them for materials and labour? It's not a silly idea – and in a small city like Cambridge there are many ways to address it – friends and neighbours who have had work done – reports (good and bad) on the Cambridge blogosphere – visiting showrooms and premises, etc.

And almost always talking to the people involved.

And generally it works. When large commercial companies are involved the personal trust is lacking but it's still possible to read consumer magazines or the grumblepages of the newspapers. Generally you know what is available with some idea of who the cowboys are (a UK term which is not flattering!). And local tradespeople often have the interest of the community as well – they live there!

But in scholarly publishing it's different. Who can you trust to look after your interests? Either as author, or reader, or institution, or the wider society?

Answer: There are almost no scholarly publishers you can trust. Certainly not when measured by the volume of publications.

The only publishers I trust are those where I know the people involved, talk with them, and we know each other's desires and limitations. Here are some I do trust:

  • The International Union of Crystallography. They have a society-based ethic, are innovative, have been part of my life for 45 years. I know the editors and the IUCr boards and committees. They are my ideal, followed by:
  • The European Geosciences Union (publishing through Copernicus). They are aggressively Open Access because they are part of the community and have the community interests at heart.
  • Public Library of Science (PLoS). Because it was set up by passionate scientists, who wanted to change the world of scholarly publishing. My trust remains as long as the scientists such as Jonathan Eisen are in control.
  • ASBMB – a society publishing biology and molecular biology. I know the editor Ralph Bradshaw well and we have talked long about the aspirations of the journal for Open Data – the need to back the science with data. He insisted on that for Molecular and Cellular Proteomics (MCP) and the rest of the publishing community sneered. Now they have adopted the principles pushed forward by Ralph. MCP isn't OA, but I trust it. As long as Ralph is in charge.

I trust these because I trust the people. Other people I currently trust are the immediate editors in BioMed Central who have done a great job in promoting Open Access and Open Data.

But BMC are owned by Springer and I totally distrust Springer as an organization to look after my interests, my university's interests, and my readers' interests. I may be slightly romantic but I come from a background where companies were ethical and wished to provide a fair product or service for those who paid them. It used to be called pride.

But read Richard Poynder's interview with Springer's boss. Haank speaking:

"The Big Deal is the best invention since sliced bread. I agree that there was once a serial pricing problem; I have never denied there was a problem. But it was the Big Deal that solved it.

"The truth is that it is in the interests of everyone—publishers and librarians—to keep the Big Deal going."

I find no mention of "reader" (the enduser of a publisher is the purchasing officer of the university – often the Library)

I find no mention of "author" (other than "author charges", "author archiving")

I find no mention of "the scientific community"

The whole article is cold-hearted. About how Springer has designed a product not on its value to the community which is paying for it, but as something artificial that can be manufactured as cheaply as possible and sold at the highest price. It doesn't matter to Haank whether it helps science – it's just a commodity. And absolutely no indication of innovation based on what the community wants – oh, no – it's innovations that Springer thinks it can sell. Like the 35 USD per day rental of papers.

So, sadly, I do not trust BMC long term and it saddens me to say so.

The other commercial publishers (almost all closed access) are all the same. I don't trust any.

And what about Societies? I used to help run the Molecular Graphics Society as treasurer. We didn't use publications to subsidize the society – we used the society to subsidize subscription costs for members. (Shut up, PMR, you are a stupid romantic – we are in the C21 and sentimentality is a thing of the past).

Most of the societies have lost their soul and sold out in one way or another. The American Chemical Society's anticontributions are well-known. The Royal Society of Chemistry stated that "Open Access is ethically flawed". OK, 5 years ago – but how can a society say that at all? Many learned societies, especially large ones, are run for the benefit of their senior officers and the bottom line.

Which is a tragedy, because it is the learned societies and international unions who should be the guardians of scholarship. Not profit-oriented business people, whether commercial or not. I'd love to recover their role – I wish I knew how.

And until that happens we are left with a very few organisations we can trust. A few charities (e.g. Wellcome Trust) and a few (not all) funding bodies.

Oh, and if you think that all commercial OA publishers can be trusted, read Richard Poynder on InTech. Oh for the lost learned societies. Quis custodiet? No one except you and me… We'll have to do it through the blogosphere.

Because, yes, I can trust the bits of the blogosphere I have learned to trust.

Yes, today seems to be a gloomy start.



PNNL and eResearch: Semantic Physical Science

[the purpose of this mail is to work out my thoughts, test that I can blog from PNNL, let people know I am still alive, and tell the world what I am doing and will do.]

I'm spending 9 days here at PNNL (in Richland, WA, US) with little to distract me so I have a real chance to get my ideas in order about semantic physical science. There's a natural progression:

  • Create V0.9 of a high-quality computational chemistry dictionary (or ontology if you like the word). It's expressed as XML (Chemical Markup Language) but it's also isomorphic with simple RDF triples. We've done the first pass (have a V0.1) and I'm working with the group here to create the next versions
  • Then travel to eResearch at Melbourne where I'm collaborating with Nico Adams, one of my colleagues in Cambridge, who has moved to CSIRO, Clayton. Nico not only buys into the idea of semantic science, he's pushed it much further than I could have. With Alex Wade we are running a one-day workshop at eResearch, "Making the Semantic Web work for Physical Science". I'm getting my ideas together now, and there will be a concentration on things like chemistry, quantities and units of measurement. If you know what the boiling point of water is, then you will be qualified for the workshop.
  • Later in Feb I will be spending some months with Nico. CSIRO is a great place to really develop an infrastructure. National labs (like PNNL, CSIRO and STFC – and international ones like EBI, NCBI) understand the need for proper data management, infrastructure and information engineering. Academia generally doesn't, and when it does it doesn't value it.

More later as I get the order worked out.

[Immediate update. I can blog from PNNL Visitor LAN!]

We are living in Occupied Scholarly Territory


[This is a short post as I am testing whether I can post from my guest room (I probably can't blog from the main lab)]. I shall explore this theme, probably getting even angrier than I am.

We have ceded the homeland of Scholarly Publishing to the commercial closed access publishers. For me the only true goal is that we regain the ability to control our scholarship – authoring, publishing, reading, re-use. I don't see many people actively formulating this goal and doing something about it. I don't think many people, even in the OA community, actually care about this. I haven't formulated it well, but that's because there has been a 10-year vacuum of thought and action.

There are two intermediate positions: Green, which cedes the moral right of publication to the publishers and negotiates scrappy deals on the least profitable land. "You can grow hay on this plot as long as you continue to let us exploit the best land. You can only do this during these months (because we say so) and if you are too successful we'll find another way to stop you". Green OA is appeasement. It has no political force and is entirely dependent on the whim of the publisher. For me NO OA mandates should even think of green. (Hybrid is even worse, we pay the publishers twice to remain under their control).

Gold, which says nothing about the means of production. It gives the readers rights, and these are sufficient for readers if full CC-BY is applied. (It makes no concession to the innovation of the web.) It gives the authors no rights, other than to make their work available to the world. It does not allow them freedom of expression or freedom of innovation in the publishing process. That's not to say it isn't useful in the interim but the publishers are still occupying our homeland. Some publishers do understand this and are moving, but the OA offerings from major (closed) access publishers still treat authors as second class (or worse).

What we need for OA is a clear political manifesto (we don't have one) and clear courses of action.

Where is the Open Access Salt March?

Where are the Open Access busses?

Where are the Open Access Suffragettes?

Where are the people who have gone to court and possibly to jail for their beliefs? Mumbly platitudes (such as the lamentable Florida State university cop-out) don't change the world.

On odd days of the week (this seems to be one) I despair. On even days I think we are winning.

Open Science Summit Summary


I have a brief window in SFO – I love airports with free wifi

The OSS was mindblowing. The advances since last year are spectacular. Simply put: OSS has arrived and will – I am sure – be mainstream for years to come until everything is Open.

The primary message I took away is:

"Science needs democratization and we now have the tools and vision to make it happen"

The point of "garage PCR" is not that it's a fun hobby – it shows that science belongs to everyone. The advances in sequencing are so tremendous that we can see everyone determining their own genomic information in their own home.

And they must control it. The technology is coming to the high street, so let's make sure it's OURS, not theirs. As David Rowan (WIRED) said two weeks ago at the Serpentine:

"If something is free, then YOU are the product"

Stick that on your bathroom mirror. It's critical to remember that we must continuously fight for our democracy.

I was blown away by Biocurious – a warehouse lab that Joseph Jackson and others have set up. In it a FOUR-YEAR-old is able to work with the Green Fluorescent Protein. (Yes, they are very aware of safety regs – they have autoclaves, etc.). I'd love to have a Biocurious in Cambridge and I have suggested they should look into cloning it, though country and state regs are the main complication.

The Open Science movement is coming together, just like the Open Knowledge movement has and the Open Source and Open Access ones. There are so many economies of scale by pooling resources and meeting other people doing the same and different things. Last night we went to a hackerspace in SF where they had things like a sewing area, a mushroom growing area, a woodworking area.

And a Scanning Electron Microscope.

If you get the culture right then almost nothing is impossible.

Science is for all of us, not just academia. Academia has recently made a very bad job of doing science on behalf of the community. Open Science shows that the vision is much larger.

After all public libraries are not just for academics.

So why shouldn't we have public science labs?

No reason – and we will. If this catches anyone who'd like to help then just mail Joseph Jackson or contact the OSS page

Open Access saves lives


Yesterday I made the assertion:

"Closed access means people die"

I have no doubt this is absolutely true. I put it in the negative form because I want to drive home the inequity of walling information that, if released, could save people's lives. In #oss2011 we've now had clear proof of this.

We've had a wonderful survey of the involvement of patients in their own diseases. I'll exemplify this with Lorenzo Albanello, a scientist who suffered from a visual aura. His doctors all said it was migraine, but it wasn't – he had many brain scans / MRI and it was due to a vascular anomaly. He visited specialists who gave him contradictory and varied advice.

So he went to the web and created a site where he published his brain. You can see the anomaly:

And he asked the world for advice:

What's the best decision to make?

Hi there,
my name is Lorenzo, I'm a biotechnology researcher, I am 27 and I'm writing from Italy. Since I was a child I have experienced the occurrence of generalized seizures, however such phenomena were rather sporadic and resembling normal faintings, so I haven't investigated the problem for long time. When I was 22 I went to see a neurologist after a new epileptic fit and she prescribed an MRI exam. So did I, discovering an arteriovenous malformation in the left frontal area of my brain. My reaction was rather careless about it, so I didn't take care of the AVM nor did I want to take an anti-epileptic medication. I just continued my normal life without any trouble.

He also posted the six options he was given (some of which had substantial risks of death (15%)).

Happy to say he picked one that worked and he is fine. I salute his brave venture. But he also said

"because I am a scientist I was able to read the [closed access] medical literature"

If he had not been able to do this he might well be dead.

Access to the medical literature saves lives.

As patients said "we don't care about privacy, we want to be cured"

Let's try to make access to all scientific literature a human right.


Open Research Reports: What Jenny and I said (and why I am angry)


Jenny Molloy and I have been representing the Open Knowledge Foundation at the Open Science Summit and we presented the Open Research Reports (ORR) project. The slides we used are online. I expect that at some stage we'll be on the video record (last year's was very useful and there was also a transcript!), which matters because what we say affects the understanding of the slides.

The slides came from several sources:

  • The presentation by David Shotton and Tanya Gray at Science Online this September in London. ORR arose from ideas from David and others of us who met at "Beyond the PDF" where the idea of ORR emerged (ideas don't belong to people, they choose people). The SoLo presentation gave lots of detail on the Semantics, which wouldn't have fitted into a 13-minute slot, but the WHY slides and some of the WHAT and HOW were included.
  • Jenny's overview of the London discussion and our further groundwork with JISC, OKF and SWAT4LS.
  • My thoughts on the WHY of Open Research Reports (many expressed in animalophoto-comics)
  • Jenny's overview of what we're going to do in ORR and particularly the Hackathon in December
  • Results of the Open Bibliography/Citations projects

The slides themselves tell only part of the story – what follows is my thoughts alone and (probably) what I said. I was somewhat provocative and any flak should be directed at me, not Jenny, David or Tanya.


Open Knowledge saves lives

ORR is a community project to make disease data Open

We started with the (obvious) truth that information is a key component of health-care, and that it's critical for the poorest countries in the world. So isn't it already catered for by the HINARI program, which "was set up by the World Health Organization and major publishers to enable developing countries to access collections of biomedical and health literature"?

So the publishers make their electronic material freely available (presumably gratis not libre) …

The country lists are based on Gross National Income (GNI) per capita (World Bank figures). Institutions in countries with GNI per capita below $1600 are eligible for free access. Institutions in countries with GNI per capita between $1601 and $4700 pay a fee of $1000 per year per institution.
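The rule is simple enough to write down as a few lines of code. This is only a sketch of the thresholds as quoted here, not HINARI's actual procedure: the function and constant names are mine, and I have assumed that a GNI of exactly $1600 counts as free and that anything above $4700 gets no access at all.

```python
from typing import Optional

# Thresholds and fee in USD, taken from the rule quoted in the post.
FREE_ACCESS_MAX_GNI = 1600
PAID_ACCESS_MAX_GNI = 4700
PAID_ACCESS_FEE = 1000  # per year, per institution


def hinari_annual_fee(gni_per_capita_usd: float) -> Optional[int]:
    """Annual fee per institution, or None if the country is not eligible.

    Assumption: the boundary value of exactly 1600 is treated as free.
    """
    if gni_per_capita_usd <= FREE_ACCESS_MAX_GNI:
        return 0
    if gni_per_capita_usd <= PAID_ACCESS_MAX_GNI:
        return PAID_ACCESS_FEE
    return None


# Illustrative GNI values only, not real country data.
for gni in (1500, 3000, 5000):
    print(gni, hinari_annual_fee(gni))
```

The point to notice is the last branch: access depends on a number set by someone else, and it can flip from "free" to "nothing" as soon as that number moves.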

So isn't this very commendable of the publishers to give their material freely to those most deserving? And when the countries become rich, they can pay. Well, this year Bangladesh became a richer country and the HINARI journals were cut off. There was outrage, reported by the Lancet (itself an Elsevier journal and closed access so presumably the Bangladeshis couldn't even read the outrage). That particular piece, though, appears to be gratis. And the Lancet argued that HINARI should be re-extended to Bangladesh.

But I think that's completely wrong. The HINARI program only exists because the publications are CLOSED. It costs nothing to make the journals available. It costs more technically to prevent people reading the literature than to make it available. Libre material gets copied at zero cost. HINARI is nothing more than the crumbs of charity that the kings used to give out. HINARI perpetuates a morally unacceptable system. The publishers aren't giving their content free, they are giving OUR content free (or rather restricting access to our content).

Simply, closed access publishers make money by restricting access to information.

That's been a consistent theme through the discussion

Now we all agree, I think, that more and better information leads to better medicine, better health-care, better environment.


    The worse the medicine and healthcare, etc. the more people die.

Nothing controversial so far? But these are the premises of a syllogism, and when followed through you end up with the conclusion:

    Closed access means people die

I don't think anyone can deny the truth of that conclusion. If a doctor, a patient, a planner, an engineer, cannot read the appropriate literature then they make suboptimal decisions. And that means people die.

So the balance is:

If we want a closed access publishing system then we have to accept that the price is people's lives.

Well, isn't that how the world just is? Engineering has fatalities, Transport has fatalities, leisure sports have fatalities, so why not scholarly publishing?

Because it's completely avoidable. The more I write about Openness the more angry I get about the immorality of closed access and walled gardens. And even more angry about the lobbying, the politics that tries to close down open efforts. We heard today (not from me) about how the American Chemical Society had spent money and lobbied to have Pubchem (the repository of Open chemical structure information) shut down. So my language is now less nuanced:

    Closed access means people die

And that's not just me. In ORR we are having major contributions from Graham Steel and Gilles Frydman – patient champions for CJD and Cancer. Gilles told me of hundreds of people who die if their physicians don't know about the latest literature. Remember these physicians cannot read the literature (there's a blithe and stupid assumption that because they are professionals they don't have to pay for the medical literature – the papers WE scientists give the publishers in return for our h-indexes). So misdiagnosis is common and avoidable by access to the literature. (And don't dare try to tell Gilles he and his 65,000-strong community are not qualified to make this assessment).

So then we moved to WHAT can we do and how can we do it. The basic idea is to take the Open material – primarily libre material in (UK)Pubmedcentral – and collate it into annotated quality reports, one per disease. Collect the Open papers, and rank them by citation (yes I know that's imperfect, but we aren't trying to advance someone's career, we are trying to save lives). Then we get the community to annotate them.
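The collect-and-rank step can be sketched in a few lines. This is only an illustration of the idea, not ORR code: the `Paper` record, its fields and the sample entries are all invented for the example, and it assumes each paper already carries a disease label, a citation count and an Open flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; real metadata would come from a source such as
    # (UK)PubMed Central and would carry many more fields.
    title: str
    disease: str
    citations: int
    is_open: bool


def build_reports(papers: list[Paper]) -> dict[str, list[Paper]]:
    """Group the Open papers by disease, most cited first."""
    by_disease: dict[str, list[Paper]] = defaultdict(list)
    for paper in papers:
        if paper.is_open:  # closed-access papers are left out entirely
            by_disease[paper.disease].append(paper)
    return {
        disease: sorted(group, key=lambda p: p.citations, reverse=True)
        for disease, group in by_disease.items()
    }


# Invented sample data, purely for illustration.
sample = [
    Paper("Paper A", "malaria", 12, True),
    Paper("Paper B", "malaria", 40, True),
    Paper("Paper C", "malaria", 90, False),
    Paper("Paper D", "cholera", 7, True),
]
for disease, ranked in build_reports(sample).items():
    print(disease, [p.title for p in ranked])
```

Each ranked list is then the raw material for one report, ready for the community to annotate.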

What? Get unqualified people to extract information from the papers? That's junk.

No. Here's the sort of material that David has listed for an infective disease paper:

I am sure that no one reading this would be unable to extract SOME of this information reliably. You don't have to be a medic to understand lat/long or dates or species. I personally would be able to extract all the drugs (and even our software can). So each person does what they can do well.

And as a result the community edits the report.

And we have a high-quality tool.

Hang about! You've omitted all the closed access stuff. That's 90% of the literature.

So? For many purposes 10% is completely sufficient. For introductory material, for teaching, for a text-mining corpus, for diagrams, the list is endless.

And the quality of the annotation and extraction can be used for data-mining, mashups, all sorts of semantic stuff. Making it much more useful than the same amount of stuff in the closed literature.

And mightn't this just jerk the consciences of some people? And continue to tip the balance to Open.

So join us for the Hackathon in London, UK on 2011-12-06/07. Because it's a hackathon we don't know in detail what we'll do. But we'll make a major start to establishing Open Research Reports.

And of course YOU can take part. The literature must be all Open, so everyone can read it. All tools will be Open. It'll be great fun. Closed access publishers especially welcome as it will help them to adjust to the inevitable change taking place.

Open Science Summit


Jenny Molloy and I are representing the Open Knowledge Foundation at Open Science Summit in the Computer History Museum, Mountain View, Silicon Valley (haven't had time to look at ANY of the museum!). It's a fantastic meeting, run by Joseph Jackson with huge dynamism and belief. Open Science is a forum bringing together a wide range of those interested in doing things Openly. See the summit website for the programme.

But last night there was a party in Biocurious (the open science company that Joseph started) in a warehouse in Sunnyvale. Great atmosphere. Core Californian – young, enthusiastic people – each with their startup. Total faith in their success. (They contrast this with the east coast where everyone is cautious).

Victoria Stodden kicked off with Reproducible Research – it's clear that a lot of people are committed to the importance of this – funders, (some) editors, researchers.

Then a session on Patents. Most interestingly a project involving people playing a game with real money and several different models of patents (conventional, conventional+"patentleft", and completely open). The completely open approach (i.e. no patents) brought in the most money and was at least twice as efficient in time and cost.

Awesome presentation from Beijing Genomics Institute. Non-profit, with as much output as the whole of the US (I think I got this right).

Several things I already knew about, but no less interesting for that – Mendeley (which has fully opened its content – I need the URL for that). Mathoverflow. Digital Ocean I didn't know.

Jai Ranganathan promoting #crowdfunding. Great idea. (They raised 60K USD for a statue of Robocop – I think an average of 20 USD per donation). So they've moved to #scifund which is asking for science projects which can be exposed to the world for funding. You've clearly got to get your idea across rapidly – but, hey, that's what it's about. A great way of getting science into the community and getting the community driving science.

Alex Hodgson, who I met at the Biocurious party, runs a recommender company for antibodies. Many antibodies are crap (there's a sticker saying – "Say no to crap antibodies"). So it's like hotel reviews ("rude staff and bedbugs" translates to "complete crap" or "wasn't what was on the label"). In this way we get a system of trust created for different suppliers. Would that it happened for chemicals! Alex's company has been invested in by Digital Science.

There's a great buzz. No doubt that Open Science Summit is here to stay (several people elsewhere indicated to me that they would have loved to come or would be coming next year).

The next post will describe what Jenny and I presented.