petermr's blog

A Scientist and the Web


Archive for April, 2010

American army declares war on Microsoft PowerPoint

Thursday, April 29th, 2010

According to an article today the US Army has declared war on PowerPoint:

The problem is, apparently, this diagram. I reproduce it without permission but since it is presumably a work of the US government, albeit highly creative, it should be in the Public Domain.


Now I am opposed strongly to the current use of Powerpoint and include in my slides:

“Power corrupts; Powerpoint corrupts absolutely”.

I was proud when I had invented it and then found that Edward Tufte had already pre-empted me. His reason is quite different from mine, though I also agree with him. His concern is that Powerpoint reduces the creative input in communication to a set of bullet points – it corrupts human thought and dignity.

My concern is that it corrupts information – it reduces semantic graphics (assuming they were) to non-semantic binary. However now that Powerpoint exposes its content semantically (in XML) I have less quarrel with it from that point of view but have grown to adopt Tufte’s concern even more strongly – the linear flow of information. A slide show can only be shown in one direction – my own approach is to select whatever visual is needed at any stage. It’s not easy – and it doesn’t save easily – but it allows instant reaction to the audience and their needs.

I actually have no issue with the fact that the army diagram is in Powerpoint. The question is whether the information is useful. If it is, then it’s highly complex. A graphic is far more useful than reams of dense text (I suspect the textual equivalent of this picture would be at least 100 pages). I cannot tell whether it is actually a useful analysis of the concepts – I have little faith in any military analysis benefitting the world. (Like 2 million others I demonstrated against our involvement in Iraq and Afghanistan and events have justified our view.) But if it’s a useful set of concepts and if it’s useful to the generals then I suspect the graphic is useful.

Here is a biochemical example (taken from without permission but with thanks)

It’s complicated because it represents a living system and living systems are complicated. We desperately need a non-reductionist approach to this – or to delegate our thinking to computers.

Teaching my computer chemistry

Thursday, April 29th, 2010

This is a test of dictating a block in a very noisy coffee room. I have just bought a cost effective headset with a noise canceling microphone and so far every word that I have dictated has been faithfully rendered by the system.

This is impressive. It means that in our Amy project we can probably rely on reasonable fidelity for converting the language that chemists speak into partially semantic natural language. So far I have had to make two corrections: common homophones such as to and two or four and for caused problems. However the system can learn from corrections and I expect that it will make relatively few errors if I speak clearly. I am now very confident that it will be possible to give our fume hood simple instructions or queries that it will understand.

It is possible to introduce chemical names into the text; the system does a good job of recognizing them. Here is a list. Benzene, toluene, acetone, ethyl acetate, caffeine, testosterone, penicillin, malonic acid. It will also do functional groups. Ethyl, methyl, propyl, butyl, pentyl.

I had to correct some of those, but now they should be in the dictionary. Let’s try. Methyl, methyl, propyl, butyl, pentyl. I had to correct those. Let’s try again. Methyl, methyl, propyl, butyl, pentyl. I still had to make some corrections and I am worried that it confuses ethyl with methyl.

However with practice I expect it to learn. Dimethyl and he leaned her (should be dimethylaniline). Methyl benzyl eight (should be methyl benzoate). NA benzoate to (sodium benzoate) can it recognise sodium! Dichloromethane seen (should be dichlorobenzene). Chloro benzene. It’s got that one right. Dichloromethane. It’s got that one right. Chloro bromide on the same (Chlorobromomethane).

But it will be fun teaching it.

Test of publishing Chem4Word to blog

Thursday, April 29th, 2010

I have left my microphone so this is being typed.

Egon Willighagen asked whether it was possible to post documents created with Chem4Word to a blog. Having mastered the process of posting to a blog without chemistry (thanks Sam) I’m now trying chemistry.

This is (benzene)

And this is coronene

Let’s see if I can post this. The result will not be semantic, but Chem4Word allows for this as the chemistry is displayed as static images (PNG). So they won’t do anything but they are a good representation of the chemistry.




Examples of Scientific Semantic Web wanted

Thursday, April 29th, 2010

I am giving a talk on Friday where I want to show the power and wow! of the Semantic Web applied to Science through an online example. I’d be keen on a DBPedia example (along the lines of with “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants”) but with a scientific content. But even this no longer works.

I have a general audience so I don’t want to talk through raw SPARQL (although I’ll hack it if necessary as long as there is an answer). I need something that can be demo’ed in a minute at most. In the medium term we will be able to hack this with molecules as we will be contributing Open RDF for molecules and structures.

The advantage of DBPedia (and to a lesser extent the whole LOD cloud) is that it has not been planned and I’d be grateful for examples that reflect this.

Any help will of course be acknowledged.

More on Chem4Word and OpenOffice

Thursday, April 29th, 2010

I have left my microphone so this is being typed.

I had expected – and am glad – that there would be debate on the release of Chem4Word under an Open Source licence. The latest contribution is from Dr. Roy Schestowitz, which I quote in full (till the ruler):

Who can port Chem4Word to

Summary: Chem4Word is an example of Free software which is trapped deep inside Microsoft’s proprietary cage and needs rescuing

From an academic and scientific point of view, Chem4Word’s developer does the right thing by becoming a Free software proponent and choosing the Apache licence for the project (not GPL, which would have been better). The only problem is that Chem4Word helps sell Microsoft Office, which means that any user of Chem4Word (even as Free software) will be pressured to buy a standards-hostile and closed-source office suite. Those who are close to this project are aware of the issue.

This is yet another example where Microsoft is using (as in exploiting) Free software to sell its proprietary software.

Supporting Microsoft software is bad for a variety of reasons, not just because it’s proprietary and standards-hostile. Here for example is a new explanation from Omar, who exemplifies what Microsoft is doing to developing countries where cost matters a lot.

But then the grief doesn’t end here, because the problem will seem even worse if you ponder the fact that most people, around the world, who use computers can barely afford to pay their monthly bills, and that all these people are using pirated software because:

* A) That’s the only software they’ve ever known.


* B) They cannot afford to pay for the annual licensing fee of a genuine copy.

These people have been mass-hypnotized, they’ve been indoctrinated into believing that whatever MS gives them is right, and that MS software is the only software on Earth that actually works. Now, take under consideration that MS is a for-profit organization after all (Actually, MS is a for-nothing-but-profit organization, but ya know), and that sooner or later, MS will start collecting money in all ways possible.

Let us hope that Chem4Word gets extended (or forked) to support Free software further down the stack. It can support all major platforms if it gets ported to office suites such as

“I would love to see all open source innovation happen on top of Windows.”

–Steve Ballmer, Microsoft CEO

I am not out of sympathy with much of this. I have made some of my position clear and now add some more thoughts. For those who don’t know me and my group some background.

· I am a passionate and public supporter of Openness – I am on the advisory board of the Open Knowledge Foundation, a prime mover in the Panton Principles for Open Data and a founder of the Blue Obelisk Open software/data/standards group in chemistry. I have been outspoken in this area on many occasions and have criticised certain non-Open Access publishers and opponents or obstructers of the free redistribution of scholarly data.

· My group and employer receive support from Microsoft for Chem4Word (I personally do not). I have made it clear to Microsoft that I shall speak my mind during the project and do not feel shackled. I am doing so now.

· I have been critical of Microsoft in the past (e.g. at the time of the Halloween document). I have entered this sponsorship with my eyes open.

· We spent a great deal of time and care drawing up the contract with Microsoft and this is reflected in the Open Source offering that we have now – jointly – delivered.

I am not against commercial companies. I used to work for Glaxo (now GSK) and our institute is sponsored by Unilever. I have lived through the era where IBM dominated the software/hardware market, to be replaced by Microsoft. I have seen many empires rise and fall and I am optimistic that monopolies in this area have the seeds of their own decline. Monopolies are generally bad and I worry about Google as much as Microsoft. I believe that the rise of competition, together with checks on and exposure of Microsoft’s actions, means that there is less (apparent) monopoly. If Microsoft really were a monopoly I would probably be more concerned – it may still largely have the desktop but it doesn’t have a monopoly on the browser or the Net content.

Most software is closed – ICT is an exception – a shining and great exception, but unusual. Open software requires some form of incentive – a mixture of time and money in the first instance and largely money for sustainability. I wish it were otherwise and if this project can generate models in chemistry for sustainable F/OSS software I will be delighted. Bioinformatics is an exception (I may write elsewhere on this) but there is a great deal of public Open funding. In chemistry the normality is that software is closed, usually sub-standard (with regard to modern software engineering techniques), diminished by needless competitive duplication. There has been virtually no innovation over the last decade (integration and widget frosting, but no new science). We wish to change this – to create an infrastructure where the community can actually do new things rather than waiting for last-century companies to make minor modifications. We are getting there.

An important part of Chem4Word was to design a new approach to chemical information – one appropriate to this century using open standards (XML, RDF, REST, etc.). That’s happened and it’s all in the Open – code, data, specifications, etc. That’s available to the community whether or not people use Chem4Word within a Word environment. And to give Microsoft at least some credit they were early adopters and promoters of XML and Word uses XML rather than a proprietary language.

Porting to Open Office. I would be happy for this to go ahead. It should be regarded as an extension or port rather than a fork as forks are a last resort – this is not relevant here where the authors are supportive. It would help to reinforce the (Open) Chemical Markup Language (XML) we have developed and to develop the ideas of quality and conformance that are so badly lacking in commercial chemical software. But it needs support and it needs chemists. Open Source chemists are very rare – we struggle to overcome ideas such as “if it’s free it’s inferior”. Whereas in ICT lots of people are supported by their companies (implicitly or explicitly) to contribute to F/OSS, in chemistry no-one is. The F/OSS is largely ignored – though there are signs of this changing. The pharma companies are particularly culpable – we know of several who use F/OSS but give no acknowledgement or encouragement.

If there is to be a port to OO it has to be done by chemists and thus will be effectively within the Blue Obelisk community as we know of relatively few other F/OSS chemists. As I’ve said if someone can make this happen we’d be delighted to help. But the barriers are relatively high – it carries no research reward (most F/OSS chemists are in academia or public research) and so has to be done in marginal time, potentially to the detriment of a career. And I cannot imagine it’s technically straightforward. There has been much in the Word work that has been very intricate and could not have happened without expert knowledge. So my main concerns are that it requires formal support and some very unusual individuals.

How to write good documentation quickly

Monday, April 26th, 2010


As you know we launched the Panton Principles 2 months ago.  There has been a lot of interest and I have been talking with a number of open access and open data publishers to convince them of the value of making their data explicitly open.  I described the principles in words but realized that they needed something more substantial to read which is tailored to their particular needs.  So I decided to write an FAQ.

This can be a very tedious process but I enlisted the help of the Open Knowledge Foundation.  We decided that I would ask the questions and that we would communally answer them.  I wrote a series of questions.

And then we set up a pirate pad.  This is a communal open website where anyone can edit a document.  It’s rather like Google Docs but is better for simultaneous editing and involves fewer problems in inviting people.

About six or seven people actively participated and within a day we had the bulk of the FAQ written.  This is because different people were able to answer different questions because they brought different experience and ideas.  If I had had to do the whole thing myself I would probably still be scratching away at some of the questions and answers.

You can see the answers at .

Joe and I have decided to create an FAQ for Chem4Word.  I have suggested about 20 questions along the lines of the FAQ above – and together we will answer them.  In fact given the relative ease of speaking to the machine I will find it quite pleasant to dictate some of the answers on this blog.  This means that not only can they be incorporated in the FAQ but that there will be a public exposure of some of the issues.

OKCon 2010 thoughts

Monday, April 26th, 2010





A brief blog post with a few thoughts about the okcon conference in London last Saturday.

It was a wonderful meeting where I got the feeling that the Open Knowledge Foundation was now a real power in the world of making information and resources available to everyone.  It was well attended and all the sessions were exciting and varied. The plenary session started with a State of the Nation session led by Rufus which has already been recorded and posted on vimeo (thanks to Jo Walsh).  There were presentations (see previous blog post) which had a wide variety of topics.  In my own presentation I tried to show how science was critical to making major decisions in the current world such as in climate change.  Access to data is critical and it is frequently difficult to know what data exists or to get it even when that is known.  For example the IPCC copyright their publications and require formal permission to reproduce any material.  This is unacceptable in the modern world where we are increasingly requiring machines to discover information and bring it back to us.  These machines cannot and should not be required to understand legal niceties so the only reasonable way forward for the semantic web is for all public information to be categorically open.  I reported on the work done by the OKF and Science Commons on creating appropriate protocols and licences for ensuring that their data was dedicated to the public domain and appropriately licensed.  I believe that the OKF’s support of the Panton Principles is an important milestone in open science.

There were many exciting presentations but two that stood out were the work on clean climate code and on open street map.  The presentation on climate code dealt with the FORTRAN which was used to generate the hockey stick graph. Some critics claimed that this FORTRAN could not be compiled and was of such a low quality that it could not have been used.  The presenters showed that they could in fact compile the FORTRAN and reproduce the graph pretty well.  There were several other groups who had written similar programs and generated very similar curves.  This shows that the actual calculation is reproducible.  They are now campaigning that all code used in this endeavour should be clean and public.  Our group takes a similar view in chemoinformatics software.

The work by Open Street Map was also impressive.  When the earthquake struck in Haiti the current maps were extremely poor and it was difficult for the rescue services to know where the roads had been.  However various companies and organisations made satellite images available and in a remarkably short time volunteers created high quality maps of Haiti allowing the rescue services to know where buildings had been and how to get to them.  In fact the maps of Haiti are now superior to those available before the earthquake.  This is a major credit to OSM which has reached this position in only about five years from its early beginnings in mapping the streets of London by bicycle courier GPS traces.

We finished with a session in the bar where we discussed what needed to be done to help the effort to make climate change data available and computations more accessible.  We agreed to set up a working party and Jonathan Gray at the OKF is the contact for anyone who wishes to explore how we can help this process.

I am proud to be a member of the advisory board of OKF and congratulate all those who organised the meeting especially Rufus, Sara and Jo.

(In the dictation I am like don marquis’ archie the cockroach who cannot reach the shift keys of the typewriter. I shall learn).

Can machines understand science?

Monday, April 26th, 2010

I am taking a new approach to blogging by dictating my thoughts to the machine.

On Friday I am giving a talk called “Can machines understand science?”.  I will argue that the combination of technology and information now allows us to communicate with machines and for machines to communicate with us.  This does not cover the whole domain of human activity but in limited areas such as formal aspects of chemistry machines can perform as well as many human beings.

John Searle devised a thought experiment of the “Chinese room” where a human who did not understand Chinese was in a room.  The person received Chinese characters which they looked up in a large book of instructions which told them how to react to the characters and how to transmit an appropriate output.  If the book of instructions was sufficiently large the machine might output instructions that appeared to be an intelligent response to the input.  If this could be done such that a human observing the process from outside could not tell whether there was a human or a machine inside, then the machine+human behaves as though it “understands” Chinese.  (Searle himself used the experiment to argue that such rule-following is not real understanding; the view that the system as a whole understands is the reply of his critics.)

I believe that we are in the same position with some areas of chemistry.  Machines can carry out tasks in a way that cannot be distinguished from humans doing the same task.  For example I believe that a machine can answer some chemistry questions on exam papers as competently as a human.  These will fall into a number of categories such as

·         regurgitating rote learning

·         carrying out certain algorithms

·         and looking up rules or data in formal procedures.

For example we have now written a system OPSIN which can translate IUPAC nomenclature into chemical diagrams.  Daniel Lowe has now achieved a very high success rate for organic compounds with over 95% conversion and virtually no errors.  This is clearly considerably better than a first year undergraduate who only has a limited chemical vocabulary.
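To give a flavour of what “looking up rules” means here, the following is a toy sketch in Python. It is emphatically not OPSIN and shares no code with it: the function name `formula` and the tiny stem table are my own assumptions for illustration, and it only handles unbranched alkanes and alkyl groups (the methyl, ethyl, propyl family I dictated earlier).

```python
# Toy illustration only: NOT OPSIN. Covers unbranched alkanes ("-ane")
# and the corresponding alkyl groups ("-yl") for a handful of stems.
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5, "hex": 6}

# suffix -> number of hydrogens added to 2n (alkane CnH2n+2, alkyl CnH2n+1)
SUFFIXES = (("ane", 2), ("yl", 1))


def _format(carbons: int, hydrogens: int) -> str:
    """Write a formula such as CH4 or C2H5 (the count 1 is left implicit)."""
    c_part = "C" if carbons == 1 else f"C{carbons}"
    return f"{c_part}H{hydrogens}"


def formula(name: str) -> str:
    """Translate a simple alkane or alkyl name into a molecular formula."""
    cleaned = name.strip().lower()
    for suffix, extra_h in SUFFIXES:
        if cleaned.endswith(suffix):
            stem = cleaned[: -len(suffix)]
            if stem in STEMS:
                n = STEMS[stem]
                return _format(n, 2 * n + extra_h)
    raise ValueError(f"name not covered by this toy rule set: {name!r}")
```

Calling `formula("methane")` gives `CH4` and `formula("pentyl")` gives `C5H11`; anything outside the table raises an error rather than guessing. Real nomenclature needs a full grammar and a very large vocabulary, which is where the actual work lies.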

Of course the machine has to read the examination paper and if this is on printed paper this is a technical problem (although current scanning will probably allow most of the essential material to be captured).  There will be ambiguities especially for short lines and dots and the system has to be able to make a reasonable guess [1].  If the paper was available in ASCII or xml then it would be possible to read it without errors.  Having done that the machine has to understand the language in the question but here we are fortunate in that most exam questions are phrased in very formal language and it is fairly easy to apply language processing techniques to understand them.  As a result I assert that a machine could answer an organic nomenclature question on an exam paper.

This is of course only one small sub domain of chemistry and a fairly unusual one in that it consists of a very large number of fairly well explained rules.  There are other parts of chemistry which also have formal rules such as balancing equations or predicting the outcome of well described reactions such as in elementary organic chemistry. In these I believe that machines can do as well as student chemists.

You will argue that this is a very small part of what chemistry is about but we have been able to take it considerably further.  Our machines can now understand parts of research papers such as the experimental recipes.  This is a valuable process in that the machines can now read the literature much faster than humans.  For example we can read a 300 page patent in a minute or two and understand much of the chemistry. This has been automated to scale to the weekly output of patent offices.

This blog post was dictated.  It is a very different process from typing and I think the style is rather stilted.  However with practice I expect to be able to dictate my thoughts to the machine at least as fast as I can type them.  Stop to think what the machine has actually done in transcribing my audible noises into meaningful English language sentences.  There is a lot that has to be done by the machine but that emphasises how considerable the advances have been in the last 30 years.

[1] Rant: and of course PDF isn’t much better.

Chem4Word and Blue Obelisk software

Sunday, April 25th, 2010




Egon Willighagen has been a pillar of ODOSOS chemistry and commented yesterday in “Chem4Word goes Apache 2.0”:


Early March I reported about Konstantin‘s JChemPaint-based chemistry plugin for OpenOffice, but there is competition: Chem4Word. Being for Microsoft Word, the plugin only works on top of proprietary software, unfortunately; therefore, I cannot tell you if Chem4Word release is any good, but what Jim has showed me about a year ago, it is pretty cool. Another big difference is that Microsoft gave the Chem4Word a big grant, and Konstantin does not have such funding, AFAIK, and relies on community support.

Now, Chem4Word was released earlier this month, as announced by Joe, and I just heard from Jim about it now being opensourced (and Peter blogged it too). Congratulations to all involved in the development! The Chem4Word project page indicates the actual license: Apache 2.0. Good choice!

Now, I said that a limitation of the plugin is that it requires proprietary software to run. This is why you will not quickly see my use it. Well, this is even why you do not see any screenshot! However, this should not spoil the news. This is for two reasons:

1.    The plug-in is Open Source: this means that the community can learn from their project, and how the make molecular structures in Word documents semantic.

2.    The plug-in saves the chemistry in the Chemical Markup Language in the XML-based Word document: this means that anyone will be able to extract the molecular structures in a semantic meaningful way.

And that’s, to me, the biggest news: if the organic chemists start using this plug-in, this will be a big win for Open Data. I am sure this is the hidden agenda of an unorthodox move of our fellow Blue Obelisk community members.


Many thanks Egon – carefully argued. I wouldn’t describe C4W as competition for OpenOffice – even if the community wishes to port it. In the Blue Obelisk community we have a useful amount of “duplication” or different ways of doing things, but we don’t compete. We are, in fact, glad when others come up with solutions that mean we don’t have to write code ourselves. Typical examples are CDK and JUMBO and Joelib, JChempaint and Chem4Word, etc. These give people an environment in which to try out new ideas, check consistency of data, etc. But we avoid having 2 different versions of the Periodic table, bonding radii etc. We try to agree on the explicit and implicit semantics and interpretation of chemistry. We use CDK for substructure search and 2D diagram generation, for example; openbabel for substructure search and JUMBO for crystallography and geometry.

In the current case Konstantin is welcome to borrow material from C4W as long as credit is given. In practice .NUMBO is sufficiently different from existing approaches that there probably aren’t major bits that can be borrowed.  But maybe it offers a chance for a fork – I don’t know enough about the details of the code. But the more Open offerings that there are, the more we convince the chemistry community that it’s worth using and worth developing.



Is Chem4Word Open Source? Yes (it can be forked)

Sunday, April 25th, 2010




I have got a pingback from my announcement of Chem4Word (our chemistry Add-in for Word). It is clearly argued and I share many of the sentiments. This is a LONG Reply.

A while ago, I asked whether we are seeing a trend to promote shallow layers of “open source” on top of a deep proprietary software stack. Once is happenstance. Twice is coincidence. The third time it’s enemy action. [PMR: This refers to Microsoft’s funding of the British Library and the City of Edmonton; the phrase (from Goldfinger) argues that there is a concerted campaign by Microsoft to use Open Source to create lockin to its products].

As a lapsed chemist, it saddens me to criticise a project with such a worthy goal. But this software spreads proprietary lock-in, not freedom. Those wishing to use it can only do so by first buying a stack of proprietary software. Those receiving documents created using it may well not be able to open them unless they have the same software and the same plug-in. Those who distribute the software, or documents created using it, are making science less free.

I am left with some questions:

  1. Will those funding the project also be funding a port to a free software alternative, such as OpenOffice; if not, why not?
  2. Is the phrase “open source” being embraced, extended and de-commoditised?
  3. […]

This shows once again what happens when we focus on the software licensing, instead of on the user’s freedom. Or am I missing something? Are the chemists involved in this doing something that I have missed?

Before I tackle this I’ll recall what I wrote a year ago (quoted by Bill Hooker)

[BH] Peter MR takes the view, with which I concur, that it’s more important to get scientists using semantic markup than to take an ideological stand against Microsoft:

[PMR] Microsoft is “evil”. I can understand this view – especially during the Hallowee’n document era. There are many “evil” companies – they can be found in publishing (?PRISM), pharmaceuticals (where I used to work; Constant Gardener), petrotechnical, scientific software, etc. Large companies often/always? adopt questionable practices. [I differentiate complete commercial sectors - such as tobacco, defence and betting where I would have moral issues]. The difficulty here is that there is no clear line between an evil company and an acceptable one. [...]

 [PMR]The monopoly exists and nowhere more than in in/organic chemistry where nearly all chemists use Word. We have taken the view that we will work with what scientists actually use, not what we would like them to use. The only current alternative is to avoid working in this field – chemists will not use Open Office.

[BH] There’s a difference between the plugins being Open Source and the plugins being useful to the F/OSS community. If collaborators hold Microsoft to real interoperability, the “Evil Empire” concerns largely go away, because the project can simply fork to support any applications other than Word.

There has also been a brief appeal from Glyn Moody (with whom I shared a platform yesterday at the Open Knowledge Foundation) for the OO community to fork and port C4W. Glyn is a consistent critic of threats against freedom.  This includes NetNeutrality, extension of copyright and software piracy (ACTA). He has also been severely critical of the process used by Microsoft to get OOXML adopted as a standard by ISO.

Chem4Word is out as OpenSource – great, but only works with MS Word: could someone hack this for OpenOffice, please?

First, to answer the questions:

1.      There is currently no Microsoft funding to our group to translate Chem4Word into OpenOffice, but I will transmit the request to them and probably suggest they reply directly (although I can carry the message). Whether we are the best group to do it will depend on the scale of the project and what elements of research there are in it.

2.      I cannot answer this from my personal interactions with Microsoft (I deal primarily with MSResearch). Microsoft has only fairly recently become active in the Open Source world. It has now joined the Apache Foundation. That means the issues will be more public and will be debated more openly. I would expect that Apache would be very concerned if it were to be converted to supporting “embrace, extend, etc.”. I am an optimist and believe that the influence is just as likely to be in the opposite direction where OSS successes get fed back into the culture of Microsoft and change it.

The architecture of Chem4Word has always been designed to keep the chemistry component Open (in terms of Open Specifications, Open data and Open Source) and we have delivered that. It’s involved a great deal of non-Word work and product including:

1.      A port of the Open Source Java library JUMBO5 to C# (dotNUMBO). This has no dependence on Word and has resulted in some novel ideas in functional/stateless programming. The intention was and still is to try to keep JUMBO and .NUMBO in sync. This might be done by automatic translations, library wrappers or by blood and sweat. The .NUMBO enhancements will be fed directly into the next version of JUMBO6.

2.      Development of a sub-schema of Chemical Markup Language (CML) which is now used to validate input files. This is probably the strictest validation of chemical input anywhere (syntactic and semantic variation is a serious problem in chemistry leading to a great deal of loss and corruption). Again the Schema is Open.

3.      Development of a new approach to building molecules.

4.      Web services which link to Open resources such as Pubchem and our OPSIN system.

I’ll pause to reflect on portability. The C# is written against unit tests and to – I think – an acceptable porting standard. Although C# is primarily used in .NET environments it can be compiled and run under Mono (though we haven’t done so). The graphics is written in WPF and XAML. I doubt this is really portable to Mono. But graphics portability has been the curse of my life for 35 years and the horror never changes. It won’t be trivial to convert WPF/XAML to Java but then it wouldn’t be trivial to do the same with, say, OpenGL. I think we have to concede that user interfaces have to be written in a bespoke manner. (Even the hope of standardisation through browsers is a mess).

Next I’ll tackle OOXML. (Apparently the I4I patent case didn’t invalidate what we have written – Word2010 is free of problems and C4W runs under it). I accept Glyn’s concerns about Microsoft’s approach to the standards process. It looks ugly. I don’t think it’s as bad as the Halloween document. OOXML is therefore an Open standard and this is what we use. I’ve now looked in some depth at the bits of OOXML relevant to C4W and written code to process the chemistry in them. That code is open. We expect to do the reverse – create C4W documents from APIs. In passing I note that whatever is used for managing the document is necessarily complex. Documents are complex, and typography adds a serious extra axis. OOXML seems to be a reasonable approach (at least I can’t think of a much simpler one). The chemistry and the figures/media are also part of the Open Packaging Conventions. We use CustomXML for managing the chemistry (as CML).
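To make the "no proprietary tools" point concrete, here is a minimal sketch (not the C4W code itself) of pulling CML out of a .docx using nothing but the Python standard library. It relies on two things that are standard: an OOXML document is a ZIP package, and custom XML data parts normally live under customXml/. The function name extract_cml_parts is hypothetical, and I am assuming that the chemistry part has a root element in the CML namespace; the exact layout C4W writes may differ.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by CML documents.
CML_NS = "http://www.xml-cml.org/schema"


def extract_cml_parts(docx):
    """Return {part_name: root_element} for custom XML parts rooted in CML.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not part of Chem4Word.
    """
    found = {}
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            # Assumption: custom XML data parts are stored as customXml/*.xml
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                # Skip anything that is not well-formed XML
                continue
            # Keep only parts whose root element is in the CML namespace
            if root.tag.startswith("{" + CML_NS + "}"):
                found[name] = root
    return found
```

Calling extract_cml_parts("paper.docx") (the filename is a placeholder) would give back the parsed CML trees, which can then be searched with, for example, root.findall(".//{%s}molecule" % CML_NS).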

Up to this point it leaves us with open specifications (CML), data (in CML) and software which, though Microsoft-oriented, requires no proprietary tools if you are prepared to work hard enough. It may (though we haven’t tried it) be possible to create a graphic display in Mono.

That leaves interaction with Word itself. Personally I don’t like Word and until recently preferred to create my documents in ASCII or HTML. I don’t like WYSIWYG editors (and regret the passing of WordPerfect, etc.). But it is what people use. If we don’t like it there are only these possibilities:

·         Persuade the chemical community to use OpenOffice. I’ve used OO. I like it even less than Word. It feels to me like a slightly unfinished product. Peter Sefton integrated OO  into my blogging post and I wrote posts that way but didn’t enjoy it (admittedly native WordPress was worse).

·         Use LaTeX. But LaTeX is not semantic and never will be and it’s entirely unsuitable for chemistry. We need structured semantic documents.

·         Wait till there is a sufficiently credible Open alternative that chemists will adopt. This will take decades at the present rate of progress.

·         Give up trying to promote semantic documents in chemistry.

Glyn highlights that TimBL will not touch Word: “Yet Berners-Lee refuses, on principle, to use Word, which is a proprietary rather than an open source format. On one occasion, one official recalled, Berners-Lee received an urgent document in Word from one of the most senior civil servants—and refused to look at it until a junior official had rushed to translate it into an acceptable format.” I used to take the same view – but have changed to tolerate it. If I’m a class-traitor, sorry. My main concern is with unacceptable practices in chemistry:

·         Manufacturers who ensure lockin to equipment through binary formats

·         Bad binary chemical software using proprietary standards

·         Secrecy in data and algorithms

·         Companies which sue chemists for openly reporting bugs or publishing benchmarks

·         Publishers who actively lobby against freedom of information

I thought about it and believe that the value of opening up chemistry is more important than taking an absolute stand against the use of Word. (I might have done this ten years ago). We have always developed the code such that it is possible to fork. This is the primary contract of Open Source. Glyn has appealed to the OS/OpenOffice community to create an OO version. I don’t think this can be done by a single person and certainly not by someone who doesn’t understand OO. F/OSS gives the right to fork. It does not promise that it can be done with zero cost. It needs funding in cash or in kind (significant committed volunteer effort).

So if the F/OSS community wishes to see C4W in OO I am happy to give what few technical suggestions I can. (It will need a different name). I can’t speak for my colleagues but I would not be surprised if they agreed.