Knowledge-limited, not time-limited

In recent comments, JamesM raised the idea of “spike solutions”.
I had never heard of these (nor had Wikipedia), so I asked and found that they came from the XP (Extreme Programming) school – here is a reasonable description. In particular:

Spikes are good when you are knowledge-limited, not time-limited. — KentBeck

or to put it simply – when you haven’t a clear idea where you are going.
This happens (or should happen) a lot in research. Essentially we start out to see if a particular approach goes somewhere useful. You can’t write tests in advance because you don’t know what you are testing. Here are examples from my own work.

  • I would like to be able to interpret chemical diagrams as molecular connection tables (i.e. as atoms and bonds rather than pixels). I knew it had been done before by commercial companies, but the code was unaffordable (and, I suspect, of limited applicability). And I had what I thought was a smart idea: look for common patterns, since the images within a paper would probably come from the same software. I had no idea whether I would use neural nets, Fourier transforms or brute force. So I downloaded a number of images which looked to be of good quality, and lashed up some code to try to isolate glyphs for machine learning. In fact the glyphs turned out to be less isolated than I had thought: the images weren’t monochrome but had a lot of antialiasing to soften them. This meant that some image processing was necessary, so I had to create some simple filters (I sometimes don’t reuse code because I like implementing algorithms to see how they work). I made it to the start of the next stage, but there were no quick wins and not many wins at all. At that point I stopped the activity and moved to using text parsing instead.
  • Frequently when you deal with external data sources, especially unstructured ones such as text, you get a Zipf’s law distribution of problems: a few kinds of problem are very common, and there is a long tail of rare ones. You get a good feel for how the bulk of the data behaves, and code for that. At this stage you might decide you wish to validate your code against tests. However, every new data item has a chance of using something slightly different that breaks a test. So you are actually testing the data against the tests, not the code. At some point you need to draw a line and declare an arbitrary conformance spec. There are then at least two tests: one for the code, against a small “platonic” data set, which makes sure that you don’t break core methods during refactoring; and one for the data, which tests its conformance against this platonic spec.
  • I wish to use an external tool (database, library, etc.). I don’t know how it works, or even whether it works, and if it does, whether it will do what I want. So there is quite a lot of glue code that makes connections, creates sample objects, etc. We can create test objects, but we can’t test the tool until we know how it behaves!

Sometimes these things actually work! At that stage the discipline has to be to modularise and refactor the experimental code. Obviously this is post facto, but at this stage it should be possible to write the tests, and so there will usually be some catch-up in testing. But do it as soon as possible, because it will get worse if you leave it!
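The two kinds of test from the second example (code against a platonic data set, data against a conformance spec) might look something like this once a spike has been refactored. This is a sketch under stated assumptions: the domain (parsing melting-point statements such as “mp 120-122 °C” from text), the regular expression used as the “platonic” spec, and all function names are hypothetical, and the test function is written in the plain-`assert` style that pytest collects.

```python
import re

# Hypothetical "platonic" spec: "mp <low>[-<high>] °C", e.g. "mp 120-122 °C".
MP_PATTERN = re.compile(r"^mp\s+(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*°C$")


def parse_melting_point(text):
    """Return (low, high) in °C, or None if text does not conform to the spec."""
    match = MP_PATTERN.match(text.strip())
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return low, high


# Test 1: the code against a small, fixed "platonic" data set.
# Run on every refactoring so that core behaviour never silently breaks.
PLATONIC = {
    "mp 120-122 °C": (120.0, 122.0),
    "mp 85 °C": (85.0, 85.0),
    "mp 99.5 - 101 °C": (99.5, 101.0),
}


def test_parser_on_platonic_set():
    for text, expected in PLATONIC.items():
        assert parse_melting_point(text) == expected, text


# Test 2: the data against the spec. Non-conforming records are reported
# for review rather than treated as failures of the code.
def conformance_report(records):
    """Return (number of conforming records, list of non-conforming ones)."""
    rejected = [r for r in records if parse_melting_point(r) is None]
    return len(records) - len(rejected), rejected
```

For example, `conformance_report(["mp 120-122 °C", "m.p. 120 °C", "mp 120–122 °C"])` returns `(1, ["m.p. 120 °C", "mp 120–122 °C"])`: the parser still passes its platonic test, and the rejected list (here a different abbreviation and an en-dash instead of a hyphen) is the long tail where you decide whether to widen the spec or leave the line where it is.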

This entry was posted in programming for scientists.

6 Responses to Knowledge-limited, not time-limited

  1. Bill Hooker says:

    This sort of “what’s the quickest way to convince ourselves that a particular line of questioning is worth pursuing” is very common in biology, at least molecular biology (my trade). It usually takes the form of finding a fundamental falsifier — “if such-and-such, then there’s no point going down that path” — and a couple of quick-and-dirty experiments to get the answer.
    (BTW, I think your link to “recent comments” is somehow broken.)

  2. pm286 says:

    (1) Thanks Bill,
    I usually look to biology for useful ways to tackle informatics/software problems (chemistry is still in the last century – see links in http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=111).
    Thanks for spotting the broken link – I am still fighting things in WordPress.

  3. pm286 says:

    (1) I have spent 20 minutes trying to fix the broken link. Although the comment has a URL (of the form http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=90#comments-296), whenever I enter this into the WP editor it trashes it. It trashes a lot of HTML (I also had great difficulty with alt tags, etc., where it “helpfully” adds and deletes stuff almost at random. Pasting proper HTML doesn’t help – it can trash this as well.)

  4. Bill Hooker says:

    (2) Funny you should say that, I’ve been looking to chemistry — specifically, yourself and Jean-Claude Bradley — for ways to drag molecular biology into the Open Notebook Era. That’s the thing about the open science habit — once you catch it, you just keep looking for ways to co-operate instead of competing. I frankly think this methodology (or Weltanschauung, or ethos, or whatever you want to call it) is the future of science, and I know for a fact that I find it a lot more fun than the cut-throat alternative.
    (3) WP has helpfully made your typed-out url into a tagged link that goes to the post in question but not (at least in Firefox/MacOS10) directly to the comment in question.

  5. pm286 says:

    (4) Thanks Bill. Yes, things are not so black and white. I realised at the Science Commons meeting that the problems in chemistry can be found in most other disciplines. But I think they are worse and more concentrated in chemistry. For example, very few chemists know about PubChem, which is run by biologists. It’s inconceivable that the reverse could be true, i.e. that chemists had set up a biological resource that the bioscientists didn’t know about…
    I agree with all your sentiments. The great thing is that you can choose your collaborators – or rather the Web chooses them for you 🙂

  6. Bill,
    One of my main objectives in blogging and talking about Open Notebook Science and the UsefulChem project is precisely to find people like you who do understand the value of collaboration and shared learning. I am hoping to find a community of peers and co-mentors for my students (both grad and undergrad) who will comment directly in our shared wiki lab notebook. As a student I learned a lot from my advisor but also from postdocs and other faculty that were within my circle. I would like for my students to have their circle of mentorship as big as the world.
