I’m gathering data for my presentation at OR08. Having appealed to the readership of this blog and found zero responses, I’m now looking at other blogs. A very valuable post from Cameron Neylon …
From Neil [Saunders]: My take on the problem is that biologists spend a lot of time generating, analysing and presenting data, but they don’t spend much time thinking about the nature of their data. When people bring me data for analysis I ask questions such as: what kind of data is this? ASCII text? Binary images? Is it delimited? Can we use primary keys? Not surprisingly this is usually met with blank stares, followed by “well… I ran a gel…”.
Part of this is a language issue. Computer scientists and biologists actually mean something quite different when they refer to “data”. For a comp sci person data implies structure. For a biologist data is something that requires structure to be made comprehensible. So don’t ask “what kind of data is this?”, ask “what kind of file are you generating?”. Most people don’t even know what a primary key is, including me, as demonstrated by my misuse of the term when talking about CAS numbers, which led to significant confusion.
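Neil’s questions (text or binary? delimited? is there a primary key?) can be asked of a file mechanically. The Python sketch below is a minimal, heuristic illustration and not a tool from either post: the function names are hypothetical, and it assumes the whole file fits in memory, treats UTF-8 as the only text encoding, takes the first row as a header, and calls a column a candidate key if every record has a non-empty, unique value in it.

```python
import csv
import io


def looks_binary(raw: bytes) -> bool:
    """Heuristic: NUL bytes or invalid UTF-8 suggest a binary file (e.g. a gel image)."""
    if b"\x00" in raw:
        return True
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def describe_data(raw: bytes) -> dict:
    """Ask Neil's questions of a file's contents: text or binary, delimited, keyed?"""
    if looks_binary(raw):
        return {"kind": "binary"}
    text = raw.decode("utf-8")
    try:
        # Only consider common delimiters; Sniffer raises csv.Error if none fits.
        dialect = csv.Sniffer().sniff(text, delimiters=",\t;")
    except csv.Error:
        return {"kind": "text, not delimited"}
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, records = rows[0], rows[1:]
    # A column is a candidate primary key if every record has a non-empty, unique value.
    candidate_keys = [
        name
        for i, name in enumerate(header)
        if records
        and all(len(r) > i and r[i] != "" for r in records)
        and len({r[i] for r in records}) == len(records)
    ]
    return {
        "kind": "delimited text",
        "delimiter": dialect.delimiter,
        "columns": header,
        "candidate_keys": candidate_keys,
    }


# Illustrative table only: lane is unique, sample repeats, size_bp has a blank.
gel_table = b"lane,sample,size_bp\nA1,ladder,\nA2,PCR-1,500\nA3,PCR-1,500\n"
print(describe_data(gel_table)["candidate_keys"])  # ['lane']
```

The sniffing step is only a guess: a single line of prose containing a comma could be misread as delimited, which is one reason to ask the person generating the file rather than rely on inspection alone.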
I do believe that any experiment [CN – my emphasis] can be described in a structured fashion, if researchers can be convinced to think generically about their work, rather than about the specifics of their own experiments. All experiments share common features such as: (1) a date/time when they were performed; (2) an aim (“generate PCR product”, “run crystal screen for protein X”); (3) the use of protocols and instruments; (4) a result (correct size band on a gel, crystals in well plate A2). The only free-form part is the interpretation.
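The four shared features in Neil’s list map naturally onto a record type. A minimal Python sketch, assuming hypothetical field names and illustrative values; it is not the schema of any existing system, and JSON is just one possible machine-readable serialisation.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Experiment:
    """Hypothetical generic record: the four shared features plus free-form interpretation."""

    performed_at: datetime                                # (1) date/time performed
    aim: str                                              # (2) e.g. "generate PCR product"
    protocols: List[str] = field(default_factory=list)    # (3) protocols used
    instruments: List[str] = field(default_factory=list)  # (3) instruments used
    result: str = ""                                      # (4) e.g. "correct size band on a gel"
    interpretation: str = ""                              # the only free-form part

    def to_json(self) -> str:
        """Serialise to JSON; the datetime is written as an ISO 8601 string."""
        record = asdict(self)
        record["performed_at"] = self.performed_at.isoformat()
        return json.dumps(record, indent=2)


# Illustrative values only.
pcr = Experiment(
    performed_at=datetime(2008, 4, 1, 10, 30),
    aim="generate PCR product",
    protocols=["PCR"],
    instruments=["thermocycler"],
    result="correct size band on a gel",
)
print(pcr.to_json())
```

Everything except `interpretation` is a constrained field here, which is the point of Neil’s argument; whether `result` can really be a structured value rather than a string is exactly where the disagreement below begins.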
Here I disagree, but only at the level of detail. The results of any experiment can probably be structured after the event. But not all experiments can be clearly structured either in advance, or as they happen. Many can, and here Neil’s point is a good one, by making some slight changes in the way people think about their experiment much more structure can be captured. I have said before that the process of using our “unstructured” lab book system has made me think and plan my experiments more carefully. Nonetheless I still frequently go off piste, things happen. What started as an SDS-PAGE gel turns into something else (say a quick column on the FPLC).
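Cameron’s point that structure can be captured as the experiment happens, rather than fixed in advance, can be sketched as an append-only list of performed steps kept alongside an optional plan. The class and method names below are hypothetical, and matching steps by exact action string is a deliberate simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Step:
    at: datetime    # when the step was actually carried out
    action: str     # what was done, e.g. "run SDS-PAGE gel"
    note: str = ""  # free-text observation made at the time


@dataclass
class LabBookEntry:
    planned: List[str]                                   # steps planned in advance (may be empty)
    performed: List[Step] = field(default_factory=list)  # steps recorded as they happen

    def record(self, action: str, note: str = "", at: Optional[datetime] = None) -> None:
        """Append a step when it happens; later steps need not be declared up front."""
        self.performed.append(Step(at or datetime.now(), action, note))

    def unplanned(self) -> List[str]:
        """Actions performed but not in the plan: where the experiment went 'off piste'."""
        return [s.action for s in self.performed if s.action not in self.planned]


entry = LabBookEntry(planned=["run SDS-PAGE gel"])
entry.record("run SDS-PAGE gel")
entry.record("run quick column on the FPLC")
print(entry.unplanned())  # ['run quick column on the FPLC']
```

Nothing here forces the plan to anticipate the FPLC step; the structure is recorded when it happens and can be reorganised after the event.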
[… and a good deal more…]
PMR: This is very important and I shall draw heavily on this and add my interpretation. Simply put, the whole idea of “putting data in repositories” is misguided. It is not addressing the needs of the scientific community (and I’m not going to expand ideas here because they are only half-formed).
Cameron – I’d be grateful for any more thoughts on this issue – public or private. They will be attributed, of course. Your ideas will probably form the “front end” for the work that the Soton group has been doing, so attribution will be important there.
Feeling guilty as a member of your readership who had not yet responded, I just wanted to check that you were at least aware of the PrIMe project. The final aim goes beyond being a data repository, but a machine-readable repository of experiments is an essential part of it. They (we?) are working on schemas for recording various types of experiment and other types of data, a task whose enormous complexity I am beginning to realise. I can’t make it to OR08, but look forward to reading about it here!
Pingback: Science in the open » Responding to PM-R on the structured experiment
Maybe of some interest
http://peanutbutter.wordpress.com/2008/03/27/a-data-model-for-life-science-experiments-fuge/
Pingback: Science in the open » Data models for capturing and describing experiments - the discussion continues