Chem4Word: Semantics is a hard challenge

This is a brief update… Although I have lots to communicate, we have been spending most of our time working on Chem4Word and I don’t have time for blogging. We’ve (== Joe and bits of me, with Tim) frozen the API, and now we are fixing bugs. Although programming is over 50 years old (and I wrote my first program 45 years ago), bugs are universal. There are better bugkillers now, but there are bigger and more bugs. The greats of compSci have all recognised this, see:

which should be required reading for anyone before they touch a keyboard. I’ll take just one:

Even the best planning is not so omniscient as to get it right the first time.
— Fred Brooks

Every experienced software developer knows this, and almost every software developer represses it. We are driven by optimism: in principle we should improve as we do projects, and therefore we should do better and faster work. But, of course, we also increase our expectations. And the world expects more of us.

We didn’t get it right first time. We couldn’t. Because we are embarking on something new: this is not YACE (yet another chemical editor). This is a semantic chemical environment. And semantics are hard. Not impossible, but hard. And there is no way round.

Here’s a brief example. Many chemical editors have a button with a + sign (and another with a - sign). Its meaning is “add a positive charge to the atom”. Sounds simple enough: CMLAtom has an integer formalCharge attribute, and all we have to do is increment or decrement it. But what does it mean? This is where semantics (and CML attempts to be a semantic language) bites us. A semantically valid molecule in CML must know exactly what atoms and how many electrons it contains. What does + do to the electron count? Presumably it decreases it by one? Well, sometimes it does, and sometimes it doesn’t. That is because many editors are oriented towards organic chemistry, where + can mean “add a proton to an atom” (a proton is H+) rather than “remove an electron” (which might be signified by “.”, i.e. add radical). The + convention is so implicit that it’s universally understood, but never stated.
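To make the ambiguity concrete, here is a minimal sketch in Python. It is not Chem4Word’s code and not the CML API: the `Atom` record and the two functions `plus_as_remove_electron` and `plus_as_add_proton` are names invented purely for illustration, and the electron bookkeeping is deliberately simplified (one atom plus its attached hydrogens). It shows that both readings of + leave the same formal charge but different electron counts.

```python
from dataclasses import dataclass, replace


# Hypothetical, simplified atom record; NOT the real CMLAtom class.
@dataclass(frozen=True)
class Atom:
    atomic_number: int       # protons in the heavy atom
    hydrogen_count: int      # hydrogens attached to it
    formal_charge: int = 0

    def electron_count(self) -> int:
        # Total protons (heavy atom plus one per H) minus the net charge.
        return self.atomic_number + self.hydrogen_count - self.formal_charge


def plus_as_remove_electron(atom: Atom) -> Atom:
    """Reading 1: '+' takes one electron away from the atom."""
    return replace(atom, formal_charge=atom.formal_charge + 1)


def plus_as_add_proton(atom: Atom) -> Atom:
    """Reading 2: '+' attaches H+ (a proton that brings no electrons)."""
    return replace(
        atom,
        hydrogen_count=atom.hydrogen_count + 1,
        formal_charge=atom.formal_charge + 1,
    )


ammonia_n = Atom(atomic_number=7, hydrogen_count=3)   # the N of NH3
ionised = plus_as_remove_electron(ammonia_n)          # NH3 with one electron removed
protonated = plus_as_add_proton(ammonia_n)            # NH4+

for label, atom in [("NH3", ammonia_n), ("ionised", ionised), ("protonated", protonated)]:
    print(label, atom.formal_charge, atom.electron_count())
```

Both results would be drawn with the same + on the nitrogen, yet one has lost an electron and the other has not, which is exactly the information a semantic representation has to keep.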

We’ve identified several different meanings of the + semiotics, which depend on element identity and chemical environment. It’s so polymorphic and woolly that it proved impossible to write semantically consistent code. So we’ve had to redesign. We now have a button called “add H+”. This is not a common approach, and I don’t know whether other tools use it. But for us it’s a logical and semantically clean approach. Is this a bug? It certainly fits Fred Brooks’ maxim. And have we got it right the second time? Until we get human chemistry feedback we won’t know.
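One way to pin down what an explicit “add H+” operation should guarantee is to write it as a test. The sketch below is only an assumption about how such a check might look in Python; `add_h_plus`, `electron_count` and the test function are hypothetical names, not part of Chem4Word or CML. The invariant it states is: one more hydrogen, one more positive charge, electron count unchanged.

```python
def add_h_plus(hydrogen_count: int, formal_charge: int) -> tuple[int, int]:
    """Hypothetical 'add H+': attach one proton, so H count and charge both rise by one."""
    return hydrogen_count + 1, formal_charge + 1


def electron_count(atomic_number: int, hydrogen_count: int, formal_charge: int) -> int:
    """Simplified electron count for one heavy atom and its attached hydrogens."""
    return atomic_number + hydrogen_count - formal_charge


def test_add_h_plus_keeps_electron_count() -> None:
    # Water (O with 2 H, neutral) becomes hydronium (O with 3 H, charge +1).
    before = electron_count(8, 2, 0)
    hydrogens, charge = add_h_plus(2, 0)
    assert (hydrogens, charge) == (3, 1)
    assert electron_count(8, hydrogens, charge) == before


test_add_h_plus_keeps_electron_count()
```

Because the operation has a single meaning, the test does not need to know anything about the element or its environment, which is what made the bare + impossible to code consistently.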

So back to the unit tests. We can’t do it without them. Boring boring boring. But at least I can watch the TV as well (an interesting program on Spanish ‘flu) and do about 6 tests an hour…

More blogging at some time.
