BioIT – Chem4Word

I’m in Boston for Bio-IT World Conference & Expo 2009 for two main reasons: an invited talk, “the Chemical Semantic Web” (Computational Chemistry track), and our first public demonstration of the Chem4Word software (research.microsoft.com/en-us/projects/chem4word/). For those who are at the meeting, the talk is on Wednesday morning and the demo is at Tuesday lunchtime.

We have worked very hard on the C4W demo for the last month. There was a dress rehearsal in Redmond at the Microsoft External Research meeting, and the demo was ready about 5 minutes before the presentation. We took the decision to freeze that functionality and to show it in Boston after the bugs had been ironed out. The discipline of having a fixed deadline (an international meeting) is an excellent way of concentrating minds within a project. Rudy Potenzone is demo-ing the software but I’ve got the demo on my machine as well.

What does Chem4Word do? It’s more important to say what it is.

At one level it’s an add-in that chemists can use to author documents. At the other end it’s a toolkit which can be used to develop the next generation of bench-top chemical software. I owe Rudy some introductory material, so I might as well use this blog to do it.

Chem4Word is an Open Platform for collaborative chemical software development in a .NET environment.

C4W will be transferred to CodePlex (the MS Open software site) and will be available for anyone to help develop, much in the spirit of the Blue Obelisk. Learning from other Open Source chemistry projects, we have thought carefully about sustainability of management.

Chem4Word is an Add-In to Word2007 that creates a semantic authoring tool for chemistry.

Word2007 is a platform that supports semantic authoring. Its use of smartTags allows words and phrases to be linked to a range of document components, including a Gallery and a Navigator.

Chem4Word uses (chemical) Ontologies.

With the new Microsoft Research Ontology Add-In and external ontologies (we use Nico Adams’ ChemAxiom), document components can be managed by a formal ontology. At one level this is a chemical spell-checker, at another a thesaurus, at another a converter between scientific units and at yet another a transformation tool between scientific concepts.

Chem4Word emphasizes semantics by using CML as its exposed data model

Current chemical toolkits require a fixed data model for objects. C4W communicates with CML (and other XML) as its data model. This gives a declarative programming model where there are no side effects. Effectively this is a new programming language for chemistry, both formal and flexible.
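To give a flavour of what "CML as the data model" means, here is a minimal sketch in Python (not C4W code, which lives in .NET). It assumes a tiny hand-written CML fragment for a two-atom C=O unit with hydrogens omitted, and a hypothetical helper `element_counts`. The point is only that the chemistry is read straight from the XML, with no toolkit-specific object model in between.

```python
import xml.etree.ElementTree as ET

# The CML namespace; the fragment below is an illustrative example,
# not a file produced by Chem4Word.
CML_NS = "http://www.xml-cml.org/schema"

CML_TEXT = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" x2="0.0" y2="0.0"/>
    <atom id="a2" elementType="O" x2="1.2" y2="0.0"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""


def element_counts(cml_text):
    """Count atoms per element by querying the CML directly (hypothetical helper)."""
    root = ET.fromstring(cml_text)
    counts = {}
    for atom in root.iter("{%s}atom" % CML_NS):
        symbol = atom.get("elementType")
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


print(element_counts(CML_TEXT))
```

The function only reads the document and returns a new value, which is the "no side effects" style in miniature.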

Chem4Word is modular

The graphics and UI are decoupled from the chemical engine. This means that commands can be issued to the engine from sources other than the UI. The document is also modular – it’s possible to examine the chemistry, the links, the tags all as XML and to build document processors independent of Word.

Chem4Word supports validation

All CML has to conform to a schema (CML-Lite) and can be validated at every stage. The import pipeline takes 4-5 stages with validation and normalization. It is impossible to import or author an invalid file. This is intended as an important contribution to bringing needed quality into chemistry.
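The staged idea can be sketched as follows. This is an illustration in Python under loud assumptions: the two checks below are toy rules I have made up for the example, not the CML-Lite schema, and the names `check_atoms`, `check_bonds` and `import_molecule` are hypothetical. The real pipeline has more stages and also normalizes.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def q(tag):
    """Qualify a tag name with the CML namespace."""
    return "{%s}%s" % (CML_NS, tag)


def check_atoms(root):
    """Toy rule: every atom needs a unique id and an elementType."""
    errors = []
    seen = set()
    for atom in root.iter(q("atom")):
        atom_id = atom.get("id")
        if not atom_id:
            errors.append("atom without id")
        elif atom_id in seen:
            errors.append("duplicate atom id: " + atom_id)
        else:
            seen.add(atom_id)
        if not atom.get("elementType"):
            errors.append("atom %s has no elementType" % atom_id)
    return errors


def check_bonds(root):
    """Toy rule: every bond must reference exactly two existing atoms."""
    ids = {atom.get("id") for atom in root.iter(q("atom"))}
    errors = []
    for bond in root.iter(q("bond")):
        refs = (bond.get("atomRefs2") or "").split()
        if len(refs) != 2:
            errors.append("bond must reference exactly two atoms")
        for ref in refs:
            if ref not in ids:
                errors.append("bond references unknown atom: " + ref)
    return errors


# Stages run in order; the first failing stage rejects the import.
STAGES = [check_atoms, check_bonds]


def import_molecule(cml_text):
    """Return the parsed molecule, or raise if any stage reports errors."""
    root = ET.fromstring(cml_text)  # raises ParseError if not well-formed XML
    for stage in STAGES:
        errors = stage(root)
        if errors:
            raise ValueError("%s failed: %s" % (stage.__name__, "; ".join(errors)))
    return root
```

Because a failed stage raises rather than returning a half-checked object, nothing invalid gets past the import step in this sketch.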

Chem4Word integrates Text and chemistry and styles

The Word document introduces ChemistryZones, which are chunks of the document representing chemistry. These are all backed by a CML object which itself can have many components, currently:

  • single molecule

  • compound molecule (salts, hydrates, complexes)

  • formula

  • name

Each of these can be displayed in a chemistry zone, making it possible to change the representation of an object, while preserving the semantics. The Navigator allows the user to select a given zone or to navigate from it.
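A rough way to picture a zone is as a view onto a shared CML object. The sketch below is Python, not the Word add-in, and the `ChemistryZone` class, its `view` field and the ethanol fragment are all hypothetical names and data chosen for illustration. Two zones point at the same molecule and differ only in which component they show.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

CML_NS = "http://www.xml-cml.org/schema"

# Illustrative CML object carrying two components: a name and a formula.
CML_TEXT = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <name>ethanol</name>
  <formula concise="C 2 H 6 O 1"/>
</molecule>"""


@dataclass
class ChemistryZone:
    """Hypothetical stand-in for a zone: a CML object plus a chosen representation."""
    molecule: ET.Element
    view: str = "name"

    def display(self):
        # Changing the view changes what is shown, never the underlying CML.
        if self.view == "name":
            return self.molecule.findtext("{%s}name" % CML_NS)
        if self.view == "formula":
            formula = self.molecule.find("{%s}formula" % CML_NS)
            return formula.get("concise")
        raise ValueError("unknown view: " + self.view)


molecule = ET.fromstring(CML_TEXT)
zone_a = ChemistryZone(molecule, view="name")
zone_b = ChemistryZone(molecule, view="formula")
print(zone_a.display())
print(zone_b.display())
```

Both zones hold the same `molecule` element, so switching a zone from name to formula is a change of presentation only.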

Current functionality

The current project had to balance functionality, semantics and aesthetics and has put most emphasis on semantics. The primary functionality is currently:

  • manage gallery, navigator and other Word concepts

  • create chemistry zones

  • import CML molecules

  • validate them

  • render them, with different styles in different zones

  • tweak them (move atoms to prettify the molecule)

  • change atoms

We have deliberately not (yet) introduced chemical editing tools as we wish to get the UI framework correct and validate the semantics. With the large number of molecules now available (e.g. in PubChem) we can convert these to valid CML outside C4W and import them. This means that unless chemists are working with new molecules, C4W will already support many of their authoring needs.

The future

The current project runs for another few months, at the end of which we’ll have a release version. (We shall make the current version available to a few pre-alpha collaborators.) A major emphasis is to create a distribution which is well designed for development, even if that means limiting the initial functionality. We’ll work hard on developing use cases where C4W is useful, especially in the creation of compound documents.

We’ll tell you then where this is going next.

This blog authored with ICE + Open Office; thanks to Peter Sefton and USQ

(Note: Just when I thought I had the ICE plugin working, it now fails to post. I think this may be due to firewalls or something else, but I can’t grab the error message as it disappears. So I have to cut and paste. I think that’s why the fonts go wonky)

