Open Notebook Science and Glueware

Cameron laments the difficulty of creating an Open Notebook system when there is a lot of data:

 

The problem with data…


Our laboratory blog system has been doing a reasonable job of handling protocols and simple pieces of analysis thus far. While more automation in the posting would be a big benefit, this is more a mechanical issue than a fundamental problem. To re-cap our system is that every “item” has its own post. Until now these items have been samples, or materials. The items are linked by posts that describe procedures. This system provides a crude kind of triple; Sample X was generated using Procedure A from Material Z. Where we have some analytical data, like a gel, it was generally enough to drop that in at the bottom of the procedure post. I blithely assumed that when we had more complicated data, that might for instance need re-processing, we could treat it the same way as a product or sample.

[snip…]

 

PMR: How I sympathize! We had a closely related problem with Nick Day’s protocol for NMR calculations. There were also other reasons why we didn’t do complete Open Notebook, but even if we had wanted to, we couldn’t, because the whole submission and calculation process is such horrendous glueware. It’s difficult enough keeping it under control yourself, let alone exposing the spaghetti to others. So, until the protocol has stabilised (and that’s hard when it’s in perpetual beta), it’s very hard to do ONS.

 

And what happens when you change the protocol? The data formats suddenly change, and that will foul things up for all your possible collaborators. Do you have a duty of care to support any random visitor who wants to use your data? I have to argue “no” at this stage. You may expose what you have, but it’s a mess.

 

The only viable solution is to create a workflow and to tee the output, so that each result is archived as it is passed on to the next step. But as Carole Goble told us at DCC, workflows are HARD. That’s why glueware is so messy: if we had cracked the workflow problem we would have eliminated glueware.
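To make "tee the output" concrete, here is a minimal sketch in Python. It is not our actual NMR protocol: the step functions (parse_shifts, mean_shift), the run_workflow helper and the JSON archive layout are all hypothetical, chosen only to show each step's result being passed on and copied to an archive at the same time.

```python
import json
import tempfile
from pathlib import Path


def run_workflow(steps, data, archive_dir):
    """Run the steps in order, teeing every intermediate result to archive_dir."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    for index, (name, step) in enumerate(steps, start=1):
        data = step(data)
        # The "tee": the result flows on to the next step AND a copy is archived.
        record = {"step": name, "output": data}
        out_file = archive / f"{index:02d}_{name}.json"
        out_file.write_text(json.dumps(record, indent=2))
    return data


# Hypothetical stand-ins for real protocol steps (assumptions, not the real code).
def parse_shifts(raw):
    """Turn a whitespace-separated string of numbers into a list of floats."""
    return [float(x) for x in raw.split()]


def mean_shift(shifts):
    """Average a list of numbers."""
    return sum(shifts) / len(shifts)


STEPS = [("parse", parse_shifts), ("mean", mean_shift)]

# Example run: archive into a throwaway directory and list what was written.
with tempfile.TemporaryDirectory() as tmp:
    final = run_workflow(STEPS, "1.0 2.0 3.0", tmp)
    archived = sorted(p.name for p in Path(tmp).iterdir())
```

Even in a toy like this the hard part shows up: the archived files are only useful to others for as long as the step names and output formats stay the same.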

 

The good news is that IF we crack it for a problem, then it should be much, much easier to archive, preserve and re-use the output of ONS.

 

