Peter Suber has blogged about an important discussion of Wiley's threat of legal action over the reproduction of a data graph from a publication. (There's quite a bit to read if you follow the links, but it's worth it.) Also read the followups, where several Open luminaries comment in a more equable manner than I feel capable of at the moment.

PS:  The Batts/Wiley story broke in late April when I was traveling.  If I'd been at my desk, I'd have covered it or at least I'd have tried.  But because the comments proliferated explosively, I wasn't at my desk, and I had a full load of other work, I decided that I had to let it go.  I'm glad to catch up a bit with this post.  I'm also glad to have the chance to recommend comments by Mark Chu-Carroll, Cory Doctorow, Matt Hodgkinson, Bill Hooker, Rob Knop, Brock Read, Kaitlin Thaney, Bryan Vickery, and Alan Wexelblat.  Finally, Katherine Sharpe at ScienceBlogs, where the controversy began, solicited comments from five "experts and stakeholders" (Jan Velterop of Springer, John Wilbanks of Science Commons, Mark Patterson of PLoS, Matt Cockerill of BMC, and me [PeterS].)

The graph had 10 points. This, gentle readers, is Data. Numbers. Facts. Facts are Non-copyrightable. End of story. The author got round it by re-entering the data - well done - and absolutely correct - you cannot copyright numbers.

I have not seen the original graph, but I cannot assume that the technical authors at Wiley had created a "creative work" or added immense artistic and cultural merit. There is a limit to what one can do with 10 data points. Perhaps they were going to hang it in Tate Modern. (Most publishers actually create "destructive works" on data - omissions, hamburgers, etc.)

We have to redeem our data - and quickly. There are several legal ways.

  • Create supplementary data which we post on our web sites or in institutional repositories.
  • Just do it - as in this story. You have right on your side. Get your institution to back you. Make a fuss. Tell the world that the publishers are making it harder to save the planet. They are. We need data to save the planet. What if this were a graph (from a rival publisher) of the prediction of sea-level rise at Chichester (it's on the sea - that's where Wiley lives)? Wouldn't Wiley wish to know when they would be flooded?
  • Extract data from the publication in numeric form and post it. It will be increasingly possible to do this at zero cost. We'll start explaining how in later blogs. And it will be legal.

4 Responses to Sued for 10 Data Points

  1. Peter...I am sure you have strong views on this question. QSAR WORLD is posting datasets...
    These datasets are chemical structures paired with one or more specific data points. Users of ChemSpider are asking whether or not there is an intention to add experimental data to the structures indexed on ChemSpider. The QSAR datasets are clearly one way to do this. So, the question I posit is whether this is appropriate?

  2. pm286 says:

    (1) I doubt I can help. AFAICS QSARWorld extracts data from closed publications manually - this is legal, although rooted in 1980s thinking rather than 2007, and cannot be sustainable. Their metadata reflects many of the concerns that I have already blogged about - SD files are a very efficient way of separating data from metadata - "<ACTIVITY>" lists negative quantities - actually this is log(somethingUndefined) - there are no units given for numeric quantities. But this is common in "QSAR".

    I have no opinion on whether Users of Chemspider should be using this or not.

  3. >> Extract data from the publication in numeric form and post it. It will be increasingly possible to do this at zero cost. We’ll start explaining how in later blogs. And it will be legal.

    Can't wait to hear it!

