Is Chem4Word Open Source? Yes (it can be forked)




I have got a pingback from my announcement of Chem4Word (our chemistry Add-in for Word). It is clearly argued and I share many of the sentiments. This is a LONG Reply.

A while ago, I asked whether we are seeing a trend to promote shallow layers of “open source” on top of a deep proprietary software stack. Once is happenstance. Twice is coincidence. The third time it’s enemy action. [PMR: This refers to Microsoft’s funding of the British Library and the City of Edmonton; the phrase (from Goldfinger) argues that there is a concerted campaign by Microsoft to use Open Source to create lock-in to its products.]

As a lapsed chemist, it saddens me to criticise a project with such a worthy goal. But this software spreads proprietary lock-in, not freedom. Those wishing to use it can only do so by first buying a stack of proprietary software. Those receiving documents created using it may well not be able to open them unless they have the same software and the same plug-in. Those who distribute the software, or documents created using it, are making science less free.

I am left with some questions:

  1. Will those funding the project also be funding a port to a free software alternative, such as OpenOffice; if not, why not?
  2. Is the phrase “open source” being embraced, extended and de-commoditised?
  3. […]

This shows once again what happens when we focus on the software licensing, instead of on the user’s freedom. Or am I missing something? Are the chemists involved in this doing something that I have missed?

Before I tackle this I’ll recall what I wrote a year ago (quoted by Bill Hooker)

[BH] Peter MR takes the view, with which I concur, that it’s more important to get scientists using semantic markup than to take an ideological stand against Microsoft:

[PMR] Microsoft is “evil”. I can understand this view – especially during the Halloween documents era. There are many “evil” companies – they can be found in publishing (?PRISM), pharmaceuticals (where I used to work; Constant Gardener), petrotechnical, scientific software, etc. Large companies often (always?) adopt questionable practices. [I differentiate complete commercial sectors – such as tobacco, defence and betting – where I would have moral issues.] The difficulty here is that there is no clear line between an evil company and an acceptable one. […]

[PMR] The monopoly exists and nowhere more than in in/organic chemistry where nearly all chemists use Word. We have taken the view that we will work with what scientists actually use, not what we would like them to use. The only current alternative is to avoid working in this field – chemists will not use Open Office.

[BH] There’s a difference between the plugins being Open Source and the plugins being useful to the F/OSS community. If collaborators hold Microsoft to real interoperability, the “Evil Empire” concerns largely go away, because the project can simply fork to support any applications other than Word.

There has also been a brief appeal from Glyn Moody (with whom I shared a platform yesterday at the Open Knowledge Foundation) for the OO community to fork and port C4W. Glyn is a consistent critic of threats against freedom. This includes NetNeutrality, extension of copyright and software piracy (ACTA). He has also been severely critical of the process used by Microsoft to get OOXML adopted as a standard by ISO. His appeal:

“Chem4Word is out as OpenSource – great, but only works with MS Word: could someone hack this for OpenOffice, please?”

First, to answer the questions:

1.      There is currently no Microsoft funding to our group to translate Chem4Word into OpenOffice, but I will transmit the request to them and probably suggest they reply directly (although I can carry the message). Whether we are the best group to do it will depend on the scale of the project and what elements of research there are in it.

2.      I cannot answer this from my personal interactions with Microsoft (I deal primarily with MSResearch). Microsoft has only fairly recently become active in the Open Source world. It has now joined the Apache Foundation. That means the issues will be more public and will be debated more openly. I would expect that Apache would be very concerned if it were to be converted to supporting “embrace, extend, etc.”. I am an optimist and believe that the influence is just as likely to be in the opposite direction where OSS successes get fed back into the culture of Microsoft and change it.

The architecture of Chem4Word has always been designed to keep the chemistry component Open (in terms of Open Specifications, Open data and Open Source) and we have delivered that. It has involved a great deal of non-Word work and product including:

1.      A port of the Open Source Java library JUMBO5 to C# (dotNUMBO). This has no dependence on Word and has resulted in some novel ideas in functional/stateless programming. The intention was and still is to try to keep JUMBO and .NUMBO in sync. This might be done by automatic translations, library wrappers or by blood and sweat. The .NUMBO enhancements will be fed directly into the next version of JUMBO6.

2.      Development of a sub-schema of Chemical Markup Language (CML) which is now used to validate input files. This is probably the strictest validation of chemical input anywhere (syntactic and semantic variation is a serious problem in chemistry leading to a great deal of loss and corruption). Again the Schema is Open.

3.      Development of a new approach to building molecules.

4.      Web services which link to Open resources such as Pubchem and our OPSIN system.

I’ll pause to reflect on portability. The C# is written against unit tests and to – I think – an acceptable porting standard. Although C# is primarily used in .NET environments it can be compiled and run under Mono (though we haven’t done so). The graphics are written in WPF and XAML. I doubt this is really portable to Mono. But graphics portability has been the curse of my life for 35 years and the horror never changes. It won’t be trivial to convert WPF/XAML to Java but then it wouldn’t be trivial to do the same with, say, OpenGL. I think we have to concede that user interfaces have to be written in a bespoke manner. (Even the hope of standardisation through browsers is a mess.)

Next I’ll tackle OOXML. (Apparently the i4i patent case didn’t invalidate what we have written – Word2010 is free of problems and C4W runs under it.) I accept Glyn’s concerns about Microsoft’s approach to the standards process. It looks ugly. I don’t think it’s as bad as the Halloween document. OOXML is therefore an Open standard and this is what we use. I’ve now looked in some depth at the bits of OOXML relevant to C4W and written code to process the chemistry in them. That code is open. We expect to do the reverse – create C4W documents from APIs. In passing I note that whatever is used for managing the document is necessarily complex. Documents are complex, and typography adds a serious extra axis. OOXML seems to be a reasonable approach (at least I can’t think of a much simpler one). The chemistry and the figures/media are also part of the Open Packaging Conventions. We use CustomXML for managing the chemistry (as CML).
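To make the point about openness concrete: because an OOXML document is a zip package, the chemistry can in principle be pulled out with no Microsoft tools at all. What follows is only a minimal sketch, not the Chem4Word code. It assumes that the CML sits in custom XML parts named customXml/itemN.xml (Word's usual convention) and that the root element is in the CML namespace. The function names find_cml_parts and build_toy_package are made up for illustration, and the toy package is not a valid Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Chemical Markup Language documents.
CML_NS = "http://www.xml-cml.org/schema"


def find_cml_parts(docx_bytes):
    """Return {part_name: root_element} for custom XML parts rooted in the CML namespace.

    Assumption: custom XML parts live under customXml/ as item*.xml,
    which is Word's usual layout but is not checked against the
    package relationships here.
    """
    found = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        for name in pkg.namelist():
            if not (name.startswith("customXml/item") and name.endswith(".xml")):
                continue
            root = ET.fromstring(pkg.read(name))
            # ElementTree writes namespaced tags as "{namespace}localname".
            if root.tag.startswith("{" + CML_NS + "}"):
                found[name] = root
    return found


def build_toy_package():
    """Build an in-memory zip that mimics the layout (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", "<doc/>")
        pkg.writestr(
            "customXml/item1.xml",
            '<cml xmlns="%s"><molecule id="m1"/></cml>' % CML_NS,
        )
        pkg.writestr("customXml/item2.xml", "<other/>")  # not CML, skipped
    return buf.getvalue()


if __name__ == "__main__":
    for part_name, root in find_cml_parts(build_toy_package()).items():
        print(part_name, root.tag)
```

A more careful reader would follow the package relationship files rather than rely on part names, but the sketch shows that nothing beyond a zip library and an XML parser is needed to reach the CML.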

Up to this point it leaves us with open specifications (CML), data (in CML) and software which, though Microsoft-oriented, requires no proprietary tools if you are prepared to work hard enough. It may (though we haven’t tried it) be possible to create a graphic display in Mono.

That leaves interaction with Word itself. Personally I don’t like Word and until recently preferred to create my documents in ASCII or HTML. I don’t like WYSIWYG editors (and regret the passing of WordPerfect, etc.). But it is what people use. If we don’t like it there are only these possibilities:

·         Persuade the chemical community to use OpenOffice. I’ve used OO. I like it even less than Word. It feels to me like a slightly unfinished product. Peter Sefton integrated OO  into my blogging post and I wrote posts that way but didn’t enjoy it (admittedly native WordPress was worse).

·         Use LaTeX. But LaTeX is not semantic and never will be and it’s entirely unsuitable for chemistry. We need structured semantic documents.

·         Wait till there is a sufficiently credible Open alternative that chemists will adopt. This will take decades at the present rate of progress.

·         Give up trying to promote semantic documents in chemistry.

Glyn highlights that TimBL will not touch Word: “Yet Berners-Lee refuses, on principle, to use Word, which is a proprietary rather than an open source format. On one occasion, one official recalled, Berners-Lee received an urgent document in Word from one of the most senior civil servants—and refused to look at it until a junior official had rushed to translate it into an acceptable format.” I used to take the same view – but have changed to tolerate it. If I’m a class-traitor, sorry. My main concern is with unacceptable practices in chemistry:

·         Manufacturers who ensure lock-in to equipment through binary formats

·         Bad binary chemical software using proprietary standards

·         Secrecy in data and algorithms

·         Companies which sue chemists for openly reporting bugs or publishing benchmarks

·         Publishers who actively lobby against freedom of information

I thought about it and believe that the value of opening up chemistry is more important than taking an absolute stand against the use of Word. (I might have done this ten years ago). We have always developed the code such that it is possible to fork. This is the primary contract of Open Source. Glyn has appealed to the OS/OpenOffice community to create an OO version. I don’t think this can be done by a single person and certainly not someone who doesn’t understand OO. F/OSS gives the right to fork. It does not promise that it can be done with zero cost. It needs funding in cash or in kind (significant committed volunteer effort).

So if the F/OSS community wishes to see C4W in OO I am happy to give what few technical suggestions I can. (It will need a different name.) I can’t speak for my colleagues but I would not be surprised if they agreed.


