Remixing Open Data and the cost of not doing so

Welcome to a new blog (Research Remix) from Heather Piwowar, currently doing her PhD in Biomedical Informatics at the University of Pittsburgh. Heather is encountering first-hand the difficulty of doing her research because of the problem of getting access to data. So she’s taking a very systematic approach to analysing the problem. Here’s a typical post:

Open Literature Review on Open Data

Don’t you love to experiment? Me too.
This blog is an experiment. I’m starting my PhD literature review on the topic of biomedical data sharing and reuse, and thought it would be appropriate to do it out in the open.
Not quite sure how it will work: I’m new to this blogging thing. Please send me suggestions, questions, and especially links to related work.
Thanks, and happy experimenting… with your own data or that of others 🙂

One of the key tools we must have in fighting for Open Data is agreed metrics. That is hard work. It includes much disappointment – in other posts Heather mentions that many researchers don’t reply to requests for data, and many of those that do cannot (or will not) supply it. (To be fair it’s often because it is a lot more work than it might seem – among the first customers for Repositories we often find scientists who have lost their own data!).
It’s also important to realise that this data has cost money. There seems to be an assumption that once the “science” has been published the data are then worthless. That’s usually not true, but even if it were I think it’s useful to enumerate the actual cost of collecting the data. A useful metric is to work out what they would cost at commercial rates – if a chemistry department generates (say) 500 crystal structures at a commercial cost of (say) USD 3000 each (and that’s probably an underestimate) – that’s 1.5 million dollars. Does it become worthless after publication?
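The back-of-the-envelope sum above can be sketched as a few lines of Python; the per-structure rate is the post’s illustrative figure, not an actual commercial quote:

```python
# Rough replacement-cost estimate for a department's published data.
# Both numbers below are the post's illustrative assumptions:
# 500 crystal structures at USD 3000 each (likely an underestimate).

def replacement_cost(n_items: int, usd_per_item: float) -> float:
    """What it would cost to regenerate the data at commercial rates."""
    return n_items * usd_per_item

cost = replacement_cost(500, 3000)
print(f"USD {cost:,.0f}")  # USD 1,500,000
```

The point of the exercise is not precision but scale: even conservative unit costs put a six- or seven-figure price on data that is routinely treated as worthless after publication.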
So we need metrics. It’s not exciting, but it’s necessary. I would like to know how many chemistry papers are available under “Open Access/Choice” or whatever name – where the author is invited to pay the publisher so that people can read the article Openly. And I am interested in the publishers’ policies on Open Data – is supplemental data Openly available? This is a sizeable task. But with modern Web 2.0 tools it should be easier to aggregate the response (or non-response) from the publishers. Suggestions and offers welcome.


3 Responses to Remixing Open Data and the cost of not doing so

  1. Chris says:

    That is the intention.
    “Additional information such as experimental or spectroscopic data assisting the reader can, and will, be provided liberally and could of course be linked electronically to structures within the document.”

  2. pm286 says:

    (1) Chris – not quite sure what points in the post you are addressing. Certainly CC and other OA journals should (and in this case do) make all their data Open. There are technical aspects to this which few publishers achieve at present. I am currently investigating the amount of destruction created by conversion to PDF.

  3. Chris says:

    Sorry it appeared cryptic 🙂
    It was supposed to be in response to your question “is supplemental data Openly available.”
    I guess it might depend on the “data” but I’d hope much should be available as plain text.
