Cameron Neylon has made a very useful comment on the Open Notebook philosophy which I can go along with:
Cameron Neylon Says:
October 26th, 2007 at 8:51 am
I’ve come in a bit late on this. I am with Jean-Claude and Bill Hooker I think. I would call this as it stands an ‘Open’ or ‘Public’ experiment rather than Open Notebook Science. This is not to say it is a bad thing. And the motivation for holding back a little on the data is a very good and reasonable one. There is also a grey area that Bill noted which is that obviously data is not made immediately available but that our approach is that it should be made available as rapidly as is practicable.
PMR: I also agree with this – so all protagonists including myself are agreed that what we are doing is not Open Notebook Science. I reiterate that it is our intention to make the protocols and data open as soon as practicable. Some of this is simply technical – we would like to write to wikis automatically but haven’t got the technology in place yet. By daily postings we are effectively making our evolving protocol Open.
I see the slogan ‘No insider information’ as a goal to work towards rather than necessarily achievable. It is a challenging one but it is what we aim for. We are working towards getting our analytical instruments to autopost to our blog, so let me make an analogy here. If a student of mine puts on an analysis overnight, the results ideally would be published directly to the blog as they come off the instrument. It is possible that someone in Australia (or California) would see these, notice that we have discovered a new enzyme activity/new drug target inhibitor and then claim the observation.
We explicitly take this risk. In particular for some of the large facility experiments I am planning I will put up raw and partially processed data that it will take me some months to analyse – someone else may beat me to it. But let us think this through. They could claim the discovery (and to do so would have to do it rapidly – via a blog/wiki). They would have to refer to the dataset (because they won’t have the equivalent dataset) and so they would have to make the observation public in non-peer-reviewed form. For the deliberate spoiler I think you can argue that there would be a rapid and very negative public response.
PMR: I agree that this is true for large public datasets that cannot be replicated. The particle physicists have a very carefully worked out protocol for when data can be released and who can work on it and who gets credit. So do some of the astronomers and geospatial communities. But chemistry has no tradition of releasing data (and much tradition of not releasing it) so we are encountering birth pains.
There are two cases where there is potential difficulty. The first is someone being ‘helpful’ by making an observation that I would have made (basically the obvious conventional data analysis). This means you feel obliged to give credit. I would say this is still fine to include as a student’s work in a thesis but I would feel obliged to give credit (authorship) in a publication. But there is clearly a very large grey area here. We want people to find things we’ve missed – this is part of the reason we are doing this. And there are many cases where someone sees something that is obvious in hindsight but it is very difficult to pin down whether you would have seen it unless you were looking.
PMR: Agreed. This is possible for CrystalEye – anyone is able to inspect our histograms of bond lengths and come up with stuff that we and others have missed. We really hope they do so.
The second difficult area is deciding when data is ‘fair game’ for re-use. If I leave a piece of interesting data on the blog for six months and make no comment and publish no paper, does this mean someone else can have a go and feel free to go with it, perhaps publish independently? 12 months? 18 months? I think there is a need to develop or evolve some sort of code of good practice here. We don’t want people having to ask permission every time before playing with our data – but we want them to play nicely, giving due credit where appropriate. Perhaps we should tag datasets as ‘I’m done here – feel free to go at it’ or ‘Anybody got any ideas?’. I will try to post on this if I can find some time over the next few days.
PMR: These are useful suggestions. I would certainly intend that there was a “fair re-use” moment. ONS says that is at the moment of conception. The Protein Data Bank says it is at the moment of public release of the dataset which may be months after deposition – it takes time to go through the system and there are some embargoes (usually not more than 6 months).
Part of the problem with the current exercise – and why it isn’t immediately suitable for ONS – is that it would be possible for someone to replicate the whole work in a day and submit it for publication (on the same day) and ostensibly legitimately claim that they had done this independently. They might, of course, use a slightly different data set and slightly different tweaks. The other factor is that data in NMR seem to be so valuable – there are still daily comments on this blog from one group attacking another group (independently of us) – that it is difficult to be objective.