I have been “silent” for over two weeks – not because there was nothing to say but because we have been working very hard to get the first version of Chem4Word frozen. For Joe and me that means that when we get up in the early morning we think of nothing else and when we try to go to sleep it is whizzing round in our heads. This type of 100+ hour coding week can turn people into subhumans …
… But we’ve frozen the API and are technically in bug-fix mode. There are, of course, bugs to fix and we are tackling them. But we have our sights on releasing RSN (real soon now).
I should make it clear that Chem4Word will be Open Source. Everyone in the project is geared towards that. Microsoft is now starting to release considerable amounts of Open Source, and we are pushing hard to get the final legal clearance. I’m happy to discuss what Microsoft + Open Source means in a later post. I know there are readers who believe that Microsoft’s motto is “do only evil” – and I used to be close to that view. But Microsoft has changed, and so have I.
Our current strategy – and this may change – is to release as Open Source and to create a governance model that will allow managed Open development. There are lots of projects in software engineering such as Eclipse, Apache, etc. which have successful models. There are no such models in chemistry so we are in new territory. I’d welcome suggestions and offers.
I’ll be writing more about C4W, but for now here is just a statement of some of the major bits:
- C4W consists of several modules, some of which are formally independent of Word.
- The chemistry engine (based on CML and JUMBO, hence .NUMBO – “dotNUMBO”) is written in C#.
- The graphics and UI are based on WPF/XAML in C#.
- There is a stateless interface (CID) between the UI and .NUMBO which defines an abstraction of chemical commands.
- There is an import pipeline which enforces syntactically and semantically valid chemistry, thus avoiding the problem of not knowing what the chemical input actually represents.
- There is considerable functionality (e.g. gallery, navigator) to interact with the Word document.
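To give a feel for what a stateless command interface between a UI and a chemistry engine can look like, here is a minimal sketch. It is written in Python for brevity (C4W itself is C#), and every name in it (`Atom`, `Molecule`, `AddAtom`, `SetCharge`, `apply_command`, `KNOWN_ELEMENTS`) is hypothetical and not taken from CID or .NUMBO. The assumption is simply that each command is plain data and that the engine is a pure function from (molecule, command) to a new molecule, rejecting anything it cannot validate.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    element: str
    charge: int = 0


@dataclass(frozen=True)
class Molecule:
    atoms: Tuple[Atom, ...] = ()


# Commands are plain, immutable data: the UI only describes what it wants.
@dataclass(frozen=True)
class AddAtom:
    element: str


@dataclass(frozen=True)
class SetCharge:
    index: int
    charge: int


Command = Union[AddAtom, SetCharge]

# Deliberately tiny element list, for illustration only.
KNOWN_ELEMENTS = {"H", "C", "N", "O"}


def apply_command(mol: Molecule, cmd: Command) -> Molecule:
    """Return a new molecule; no state is kept between calls."""
    if isinstance(cmd, AddAtom):
        if cmd.element not in KNOWN_ELEMENTS:
            raise ValueError(f"unknown element: {cmd.element!r}")
        return Molecule(mol.atoms + (Atom(cmd.element),))
    if isinstance(cmd, SetCharge):
        if not 0 <= cmd.index < len(mol.atoms):
            raise ValueError(f"no atom at index {cmd.index}")
        atoms = list(mol.atoms)
        atoms[cmd.index] = replace(atoms[cmd.index], charge=cmd.charge)
        return Molecule(tuple(atoms))
    raise TypeError(f"unsupported command: {cmd!r}")
```

Because the input molecule is never mutated, the UI can keep earlier versions around (for undo, say) without the engine knowing anything about it.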
Chem4Word is a semantic editor – I suspect it’s the first for chemistry. Writing semantically correct code and documents is a hard discipline. Most current chemical tools require a sighted human to make judgements as to what something means, but this does not work in the era of the Semantic Web, where machines must make accurate deductions. For example, many tools allow the user to “add a + charge to an atom”, but what does this actually mean? Does it change the implicit hydrogen count? Or the spinMultiplicity? The answer is that it depends on the chemistry and there is no universal algorithm to do this. So C4W is built with a framework that allows semantics to be imposed by the chemistry.
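The “add a + charge” ambiguity can be illustrated with a small sketch. This is Python rather than C#, and it is not how C4W does it: the names (`Atom`, `protonate`, `ionise`, `RULES`, `add_positive_charge`) and the idea of selecting a rule by a context string are assumptions made purely for illustration. The point is only that the same user gesture maps to different changes depending on the chemistry, and that a tool can refuse to guess when no rule applies.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Atom:
    element: str
    formal_charge: int = 0
    hydrogen_count: int = 0
    spin_multiplicity: int = 1  # 1 = singlet, 2 = doublet


ChargeRule = Callable[[Atom], Atom]


def protonate(atom: Atom) -> Atom:
    # Interpretation A: the + arises from gaining H+ (e.g. NH3 -> NH4+),
    # so the hydrogen count goes up and the spin state is unchanged.
    return replace(
        atom,
        formal_charge=atom.formal_charge + 1,
        hydrogen_count=atom.hydrogen_count + 1,
    )


def ionise(atom: Atom) -> Atom:
    # Interpretation B: the + arises from losing one electron, so the
    # hydrogen count is unchanged and a singlet becomes a doublet.
    # This sketch only handles a closed-shell (singlet) starting point.
    if atom.spin_multiplicity != 1:
        raise ValueError("sketch only handles singlet starting states")
    return replace(
        atom,
        formal_charge=atom.formal_charge + 1,
        spin_multiplicity=2,
    )


# Hypothetical registry: the chemistry context picks the rule.
RULES: Dict[str, ChargeRule] = {
    "protonation": protonate,
    "ionisation": ionise,
}


def add_positive_charge(atom: Atom, context: str) -> Atom:
    """Apply the rule for the given context; never guess a default."""
    try:
        rule = RULES[context]
    except KeyError:
        raise ValueError(f"no charge rule for context {context!r}") from None
    return rule(atom)
```

Starting from a nitrogen with three hydrogens, the "protonation" rule yields four hydrogens and a singlet, while the "ionisation" rule keeps three hydrogens and yields a doublet: two different answers to the same “add a + charge” request.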
In summary, we have got a toolset with significant novel functionality, including, in places, some limited “chemical intelligence”. When it’s released I will write blog posts explaining some of this.
Many thanks to the team – Joe, Tola, Tim, Alex, Lee, Jim+Jim.
Microsoft is indeed starting to release GPL-ed software, which I, like some others, think is positive. Regarding the choice of which piece to start with, I think they, for once, forgot to consult the marketing team, and chose a project where they were actually in violation of the GPL, making the release not their choice but obligatory, and worsening people’s view of Microsoft’s own interest in Open Source. Another issue is that the first thing they made open source only benefits their own proprietary software (at least at this moment). But it is a start, and anything released under the GPL will never go away.
I think the cheminformatics community is seeing the value of semantics in chemical editing, and understands that even closed-source products have shown serious evolution in this area. JChemPaint has also followed the semantic path for a while, but does not have the advantage of tight integration with a production-phase editing tool that Chem4Word has. With the current market share of Word, this editor will see quick uptake and bring semantic chemical editing to a new audience, that of organic chemists. This is positive, and anything drawn in this tool will be semantic and interoperate with other tools. That is positive too, even if many of us, like me, will not use the editor at all.