I took several months off blogging – completely. Why? Because I was concentrating on Chem4Word – a semantic chemical authoring tool sponsored by and jointly built with Microsoft External Research in Redmond. In this and subsequent posts I’ll now tell you about it and what hopes we have for it. Yesterday – and I’ll explain why in later posts – was a breakthrough day for us and I felt I could start to tell the world about it.
When we started I had no idea of the amount of mental and emotional effort I was going to have to put into this. We started about 2 years ago when Tony Hey and Lee Dirks invited some academic chemical informatics folk to Redmond to plan what started as eChemistry and has now turned out to be OREChem. I’ll be blogging about this regularly elsewhere. But Tony and Lee also asked me if I would like our group and theirs to jointly develop a chemical drawing tool (I think that’s the phrase used then) for Word.
Why our group? Why not a commercial software company with lots of experience, with customers, with proven algorithms, etc.? I can’t put words into his mouth but remember that Tony ran the UK eScience program and got a view of many disciplines – medicine, earth, environment, transport, oil, and particularly bioscience. These disciplines were applying the new tools of the distributed web, looking to share semantic information and prepare for the multi-party clouds that we are now seeing. He visited the ACS-CINF session to see what the state of the art was, and I’ll have to leave him to say what he concluded.
Microsoft – through the strenuous efforts of Jean Paoli – was a very early adopter of XML and so it was natural that Tony should look to CML (Chemical Markup Language) as the basis of our project. We’ve started from day zero with an XML data model, which has an amazing number of benefits. It supports side-effect-free programming (ably steered by Jim Downing) and can take advantage of the incredibly powerful LINQ system in dotNet (it’s really “.NET”, but it’s very easy to miss the dot).
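To give a flavour of what an XML data model with side-effect-free queries looks like, here is a minimal sketch. It is in Python rather than C#/LINQ, and it is not Chem4Word code: the methanol fragment, the function names `element_counts` and `bonded_pairs`, and the choice of attributes (`elementType`, `hydrogenCount`, `atomRefs2`) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# CML namespace; the molecule below is an illustrative fragment only.
CML_NS = "http://www.xml-cml.org/schema"
NS = {"cml": CML_NS}

METHANOL = f"""
<molecule xmlns="{CML_NS}" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="3"/>
    <atom id="a2" elementType="O" hydrogenCount="1"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
  </bondArray>
</molecule>
"""


def element_counts(molecule):
    """Count atoms per element, including implicit hydrogens.

    Pure query: the XML tree is read, never modified.
    """
    atoms = molecule.findall("cml:atomArray/cml:atom", NS)
    heavy = Counter(atom.get("elementType") for atom in atoms)
    hydrogens = sum(int(atom.get("hydrogenCount", "0")) for atom in atoms)
    return heavy + Counter({"H": hydrogens})


def bonded_pairs(molecule):
    """Return the atom-id pairs named by each bond's atomRefs2 attribute."""
    bonds = molecule.findall("cml:bondArray/cml:bond", NS)
    return [tuple(bond.get("atomRefs2").split()) for bond in bonds]


if __name__ == "__main__":
    mol = ET.fromstring(METHANOL)
    print(dict(element_counts(mol)))
    print(bonded_pairs(mol))
```

Both functions are queries over the document that return new values and leave the tree as it was, which is roughly the style that LINQ to XML encourages on the .NET side.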
So we started with a very fluid high-level plan and at the same time started the contractual negotiations. For those of you who don’t know, *all* large companies take a long time to finalise contracts on the first occasion. (And so do many small ones…). And Universities aren’t always quick. So the contract is measured in hundreds of days of negotiation – many of these anticipating the problems we might run into. (In retrospect – no surprise – a lot of this seems very peripheral or even obsolete.)
The basis of the chemistry was a CML engine. I’d already written JUMBO (Java Universal Molecular Browser for Objects), which is neither Universal nor any longer a browser – but this has been converted into dotNUMBO (in C#). I’d hoped that we could autoconvert the software and I’m very glad we didn’t try. Although all lines of code have been typed in blood, the design is much cleaner and it’s much smaller. And we are starting with a subset of chemistry.
The effort has been incredible. I think Jim and I are nominally on 10% of our time, but that’s >10% of 168 hours/week at times. Over the last 2-3 months we’ve taken to having daily telcons – sometimes short and sometimes cancelled – but often over an hour. And this effort has been made by the MS people as well – we’ve had many visits from Lee, Alex, Savas, and also a week’s stay from JimM. I think it’s taken over some of their lives as well…
It’s been a bigger project than any of us thought (I think). It’s been run on a project management system (TFS) where we all invent scenarios and all get (multiple) tickets for tasks. Many of these are past their sell-by date. At times they hung over me (at least) as a cloud of doom, showing how slowly the project was going. Parts of milestones were missed. Scenarios moved to later milestones, etc.
But we set ourselves two deadlines which could not be moved. This is often a good way of helping to ensure communal vision and prioritisation. The deadlines were
- yesterday (the Microsoft External Research symposium, where we would present to a wider range of academics and MS staff)
- end of April – BioIT in Boston – a large event with many commercial vendors and purchasers.
We made it yesterday with 10 minutes to spare. It was great. More – much more – later…
Cheers! Looking forward to the much, much more! Please do include tons of screenshots as I will not be able to test the code myself.