How we add quality to our software

Dictated into Arcturus because of a slightly dodgy keyboard

I have praised the skill and dedication that Jim Downing brought to our group and that is now permanently embedded in its culture. His achievement has been to take a group of committed scientists with self-taught computer skills and show them the cutting edge of group software development.

In many areas of science it is common for individuals to write programs without interacting with their neighbours. There are indeed some very good single-person programs, but there are very many more very bad ones. I believe that group working should be mandatory for scientific programmers for several reasons:

  • Programming is hard and complex; even the gurus know that they frequently get it wrong.
  • There are many operations which are common to most programs, and without group working and sharing there will be unnecessary duplication, which in any case usually leads to errors and confusion.
  • There are a number of established patterns which are highly valuable in organizing software. These are not normally taught and not normally acquired when scientists start writing programs.
  • There is a discipline to writing code which is initially somewhat painful. In the same way as musicians have to learn scales, programmers have to learn the value of unit tests. Again this does not come naturally.

Jim has introduced us to a number of tools, probably in this order:

  • A source control system (SCS). Over the years we have moved from CVS to SVN and now to Mercurial.
  • An integrated development environment (IDE). We use Eclipse but there are several others.
  • Unit testing and test-driven development. This is essential for almost all our code. As you will see, the environment enforces and encourages it.
  • A build environment. Again this is essential. The build pulls together all the resources which are required, checks their availability, and builds and tests the code.
  • A continuous integration environment (Hudson).
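
For readers who have not met unit tests, here is a minimal sketch of the idea. It uses Python's standard unittest module purely for brevity; the post does not say which language or test framework our own tests use, and the function count_atoms and its behaviour are hypothetical, invented only for this illustration.

```python
import re
import unittest

# One element symbol (e.g. "C" or "Cl") followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
_WHOLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def count_atoms(formula):
    """Hypothetical helper: count atoms in a simple formula such as 'C2H6O'.

    Only element symbols with optional integer counts are supported;
    brackets, charges and isotopes are deliberately left out of the sketch.
    """
    if not _WHOLE_FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts = {}
    for symbol, digits in _TOKEN.findall(formula):
        # A missing count means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


class CountAtomsTest(unittest.TestCase):
    """Each test pins down one small, checkable piece of behaviour."""

    def test_simple_formula(self):
        self.assertEqual(count_atoms("C2H6O"), {"C": 2, "H": 6, "O": 1})

    def test_repeated_elements_are_summed(self):
        self.assertEqual(count_atoms("CH3CH2OH"), {"C": 2, "H": 6, "O": 1})

    def test_two_letter_symbols(self):
        self.assertEqual(count_atoms("NaCl"), {"Na": 1, "Cl": 1})

    def test_malformed_formula_is_rejected(self):
        with self.assertRaises(ValueError):
            count_atoms("c2h6")
```

Saved in a file, these tests can be run with `python -m unittest <file>`. In test-driven development the tests would be written first and the function filled in until they pass; a build or continuous integration system can then rerun them on every change.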

The change to our productivity has been enormous, and the change to our culture likewise. It is now expected that anybody can have access to anybody else’s code and work. Of course, since our projects are open source, there are simpler ways of doing this, such as SourceForge and Bitbucket. But we also expose our code publicly to the group on Friday mornings. No one, not even Jim, is beyond criticism, but all criticism is constructive.

To show what we have achieved, the screenshot shows some of our current projects (https://hudson.ch.cam.ac.uk/). There are about 30 in the system, ranging from a few hundred lines to tens of thousands. Almost all projects have dependencies, which means that if you break something early in the build many other people may suffer. Hudson runs every 20 minutes and rebuilds everything that has changed or that depends on something that has changed. You can visit the site yourself, which is probably the best way to understand it. In the current case I have one project which depends linearly on five other projects, and each of these has to build. As you can see from the numbers, some systems have been built over 100 times. Unless you have tried this yourself you will not appreciate the enormous confidence it gives you that your software is fit to share with the rest of the group. It does not mean that the software necessarily gives the right answers, but it should mean that it does not fall over in a heap before starting.
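
The rule "rebuild everything that has changed or that depends on something that has changed" can be sketched in a few lines. This is only an illustration of the idea, not a description of how Hudson is implemented; the function projects_to_rebuild and the project names are hypothetical, and the example needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter


def projects_to_rebuild(dependencies, changed):
    """Return the affected projects in an order that is safe to build.

    dependencies maps each project to the set of projects it depends on.
    changed is the set of projects with new commits since the last run.
    Changed projects that do not appear in dependencies are ignored.
    """
    # Invert the graph: for each project, which projects depend on it?
    dependents = {}
    for project, needs in dependencies.items():
        dependents.setdefault(project, set())
        for needed in needs:
            dependents.setdefault(needed, set()).add(project)

    # Walk downstream from every changed project to find all that are affected.
    affected = set()
    stack = list(changed)
    while stack:
        project = stack.pop()
        if project in affected:
            continue
        affected.add(project)
        stack.extend(dependents.get(project, ()))

    # Build dependencies before the projects that rely on them.
    order = TopologicalSorter(
        {project: set(dependencies.get(project, ())) for project in dependents}
    ).static_order()
    return [project for project in order if project in affected]


# A linear chain like the one described above: "app" sits on top of five
# other projects (all names are made up for the example).
chain = {
    "p1": set(),
    "p2": {"p1"},
    "p3": {"p2"},
    "p4": {"p3"},
    "p5": {"p4"},
    "app": {"p5"},
}
print(projects_to_rebuild(chain, {"p3"}))  # ['p3', 'p4', 'p5', 'app']
```

A change to p3 leaves p1 and p2 alone but forces p4, p5 and app to rebuild, which is why a break early in the chain is felt by everybody downstream.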

Most academic software does not get redistributed beyond its point of creation. Sometimes this is for acceptable reasons, as when it is developed to tackle a particular short-term problem. However, short-term software has a habit of turning into a long-term lava flow (a well-known anti-pattern). But much software is simply not fit for distribution. Making it fit takes time and effort which cannot be credited under the traditional narrow-minded approach of evaluation through h-indices.

We are proud of our software, much of it supported by grants from JISC, and we are in the process of making it as available as possible to the community. By combining the components in the system we can address a number of problems in semantic chemistry, particularly text mining, and computational chemistry. With CrystalEye we have the most advanced semantic system for chemical crystallography, and this has only been possible because of the framework on which we built.


One Response to How we add quality to our software

  1. Craig Bruce says:

    I totally agree with the sentiment of this post, so much is replicated or just lost over time in academic groups. Actually, industry isn’t immune to this either.
    Hudson looks great. I’ve been using CruiseControl for CI but it is rarely updated. I’ll certainly be looking into Hudson.
