Why is it so difficult to develop systems?

Dorothea Salo (who runs the Caveat Lector blog) is concerned (Permalink) that developers and users (an ugly word) don’t understand each other:

(I posted a lengthy polemic to the DSpace-Tech mailing list in response to a gentle question about projected DSpace support for electronic theses and dissertations. I think the content is relevant to more than just the DSpace community, so I reproduce it here, with an added link or two.)


My sense is that DSpace development has only vaguely and loosely been guided by real-world use cases not arising from its inner circle of contributing institutions. E.g., repeated emails to the tech and dev lists concerning metadata-only deposits (the use case there generally being institutional-bibliography development), ETD management, true dark archiving, etc. etc. have not been answered by development initiatives, or often by anything but “why would you even want that?” incomprehension or “just hack it in like everybody else!” condescension.

PMR: This has been a perennial problem for many years and will continue to be so. I’m also not commenting on DSpace (although it is clearly acquiring a large code base). But my impression of the last 10-15 years (especially W3C and Grid/eScience projects) is that such projects rapidly become overcomplicated and overextended, and fail to get people using them.
On the other hand there are the piles of spaghetti bash, C, Python and so on which adorn academic projects and cause just as much heartache. Typical “head-down” or throwaway code.
The basic fact is that most systems are complicated. And there isn’t a lot that can be done easily. It’s summed up by the well-known Conservation Of Complexity:

This is a hypothesis that software complexity cannot be created or destroyed, it can only be shifted (or translated) from one place (or form) to another.

If, of course, you are familiar with the place that the complexity has shifted to, it’s much easier. So if someone has spent time learning how to run spreadsheets, or workflows, or Python, and if the system has been adapted to those, it may be easier. But if those systems are new then they will have serious rough edges. We found this with the Taverna workflow which works for bioscience but isn’t well suited (yet) for chemistry. We spent months on it, but those involved have reverted to using Java code for much of our glueware. We understand it, our libraries work, and since it allows very good test-driven development and project management it’s ultimately cost-effective.


We went through something like the process Dorothea mentions when we started to create a submission tool for crystallography in the SPECTRa : JISC project. We thought we could transfer the (proven) business process that Southampton had developed for the National Crystallographic Centre, and that the crystallographers would appreciate it. It would automate the management of the process from receiving the crystal to repositing the results in DSpace.


It doesn’t work like that in the real world.


The crystallographers were happy to have a reposition tool, but they didn’t want to change their process and wouldn’t thank us for providing a bright shiny new one that was “better”. They wanted to stick with their paper copies and the way they disseminated their data. So we realised, and backtracked. It cost us three months, but that’s what we have to factor into these projects. It’s a lot better than wasting a year producing something people don’t want.


Ultimately much of the database and repository technology is too complicated for what we need at the start of the process. I am involved in one project where the database requires an expert to spend six months tooling it up. I thought DSpace was the right way to go to reposit my data but it wasn’t. I (or rather Jim) put 150,000+ molecules into it but they aren’t indexed by Google and we can’t get them out en masse. Next time we’ll simply use web pages.


By contrast we find that individual scientists, if given the choice, revert to two or three simple, well-proven systems:

  • the hierarchical filesystem
  • the spreadsheet

A major reason these hide complexity is that they have no learning curve, and have literally millions of users or years’ experience. We take the filesystem for granted, but it’s actually a brilliant invention. The credit goes to Dennis Ritchie in ca. 1969. (I well remember my backing store being composed of punched tape and cards).
If you want differential access to resources, and record locking and audit trails and rollback and integrity of commits, and you are building it from scratch, it will be a lot of work. And you lose sight of your users.
So we’re looking seriously at systems based on simpler technology than databases, such as RDF triple stores coupled to the filesystem and XML.
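To make the idea concrete, here is a minimal sketch in Python of what "triples coupled to the filesystem and XML" might look like. It is not the system we are building: the function names, the directory layout and the tab-separated `triples.tsv` file (standing in for a real RDF triple store) are all assumptions made up for illustration. It uses only the standard library.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def store_record(root: Path, collection: str, record_id: str, fields: dict) -> Path:
    """Write one record as a small XML file at root/collection/record_id.xml.

    Field names must be valid XML element names (hypothetical convention).
    """
    folder = root / collection
    folder.mkdir(parents=True, exist_ok=True)
    elem = ET.Element("record", id=record_id)
    for name, value in fields.items():
        ET.SubElement(elem, name).text = str(value)
    path = folder / f"{record_id}.xml"
    ET.ElementTree(elem).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def add_triple(root: Path, subject: str, predicate: str, obj: str) -> None:
    """Append one (subject, predicate, object) line to a plain-text index.

    The tab-separated file is an assumed stand-in for a real triple store.
    """
    with open(root / "triples.tsv", "a", encoding="utf-8") as fh:
        fh.write(f"{subject}\t{predicate}\t{obj}\n")


def find_subjects(root: Path, predicate: str, obj: str) -> list:
    """Return every subject that has the given predicate and object."""
    index = root / "triples.tsv"
    if not index.exists():
        return []
    matches = []
    with open(index, encoding="utf-8") as fh:
        for line in fh:
            s, p, o = line.rstrip("\n").split("\t")
            if p == predicate and o == obj:
                matches.append(s)
    return matches


if __name__ == "__main__":
    # Throwaway demo in a temporary directory; the record is invented.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_record(root, "crystals", "c001", {"formula": "C6H6"})
        add_triple(root, "crystals/c001", "hasStatus", "released")
        print(find_subjects(root, "hasStatus", "released"))
```

Everything here can be inspected with a file browser and a text editor, which is the point. The price is that there is no locking, rollback or access control: exactly the features listed above as expensive to build from scratch.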
And the main rule is that both the users and the developers have to eat the same dogfood.  It’s slow and not always tasty. And you can’t avoid Fred Brooks:

 Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in one step. An intermediate step called the pilot plant is necessary….In most [software] projects, the first system is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved…. Delivering the throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the redesign, and a bad reputation for the product that the best redesign will find hard to live down. Hence, plan to throw one away; you will, anyhow.

Very simply, TTT: Things Take Time.

This entry was posted in programming for scientists, repositories.

One Response to Why is it so difficult to develop systems?

  1. Rich says:

    Very interesting post! I think this sort of thing is a huge issue for any area of academia which is ‘computer heavy’. I find a couple of things can make life particularly difficult (I work mainly in bioinformatics):
    1 – we’re writing code in order to do some science, so there’s a temptation to get the software done as quickly as possible so we can get on to the ‘cool stuff’
    2 – a lot of the software we work on implements new (and often experimental) analysis methods; this means it’s very easy to end up working with prototypes a *lot* of the time! It takes a bit of discipline to stop the science for long enough to build a proper version 🙂
    I think the end of your post nails a couple of key points. The developers must use their software. And things definitely do take time!
