In a reply to a recent post, Rich Apodaca made the point that Open Access (and Open Data) will require business models:
Rich Apodaca Says:
April 28th, 2008 at 1:38 am
By identifying and executing the right business model, the idea of control will become much less important. For example, you’ll find few complaints about Google essentially controlling the online search market; the vast majority of users are delighted to be able to search with the service whenever they want – and to have Google index their site.
This only happened because Google found the right business model and executed on it.
Maybe open access pricing and business models bring out nonproductive arguments because those putting them forward (and responding) are stuck in old patterns of thinking, or too heavily dependent on the current system. Scholars and publishers likely both share responsibility here.
My guess is that the open access scientific publication system that ends up working will start out by horrifying most of today’s scholars and being ridiculed or ignored by today’s publishers. But there will be a few niche groups for whom the truly disruptive open access innovation in scientific publishing will be a godsend.
Developing a workable open access business model starts by identifying who these groups are and how solving their problem can solve other important problems. It continues with finding a price and medium of exchange (perhaps not even money) that the market will find tolerable for awhile.
How can this issue be anything other than central to making open access work?
PMR: I agree generally with this – it’s often characterised as TANSTAAFL (“There Ain’t No Such Thing As A Free Lunch”). I think most of the major innovators (certainly the funders) realise this – that’s why they are prepared to develop funder-pays approaches to Open Access.
Data is/are a particular problem. Data are more expensive than manuscripts. It’s virtually cost-free to download, read, copy and transmit a standard PDF or HTML file, or any other document whose sole endpoint is to be read by humans. The creation of a reading human is, of course, not cost-free – the investment in the average human is large – but it’s not generally borne by higher education or scientific research (YMMV). But data are complex, and we are only at the start of learning what we can do with them. Open Data is not an end, but without it there is no beginning.
Data are normally produced for a particular purpose, and reusing them for another costs money. I’ll exemplify this by taking CrystalEye data – about 120,000 crystal structures and 1 million molecular fragments – which were aggregated, transformed and validated by Nick Day as part of his thesis. (BTW Nick is writing up – it’s a tribute to his work that CrystalEye runs without attention for months on end.) The primary purpose of CrystalEye was to allow Nick to test the validity of QM calculations in high-throughput mode. It turned out that the collection might be useful, so we have posted it as Open Data. To add to its value we have made it browsable by journal and article, searchable by cell dimensions, searchable by chemical substructure and searchable by bond-length. This is a fair range of what the casual visitor might wish to have available. Andrew Walkingshaw has transformed it into RDF and built a SPARQL endpoint with the help of Talis. It has a Jmol applet and 2D diagrams, and links back to the papers. So there is a lot of functionality associated with it.
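To give a flavour of what a SPARQL endpoint over crystallographic data makes possible, here is a minimal sketch of building and addressing such a query. The endpoint URL and the predicate name are invented for illustration – the real vocabulary would come from the CML/RDF export, not from this post.

```python
# Sketch: querying a SPARQL endpoint over crystal data.
# The ENDPOINT URL and the cml:cellLengthA predicate are HYPOTHETICAL,
# used only to illustrate the shape of such a query.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/crystaleye/sparql"  # hypothetical URL

def build_cell_query(min_a: float, max_a: float, limit: int = 10) -> str:
    """Build a SPARQL query selecting structures whose a-axis cell
    length falls within a given range (predicate name is illustrative)."""
    return f"""
PREFIX cml: <http://www.xml-cml.org/schema#>
SELECT ?structure ?a
WHERE {{
  ?structure cml:cellLengthA ?a .
  FILTER (?a >= {min_a} && ?a <= {max_a})
}}
LIMIT {limit}
""".strip()

def build_request(query: str) -> Request:
    """Wrap the query as an HTTP GET asking for SPARQL JSON results
    (the standard SPARQL-over-HTTP protocol)."""
    return Request(
        ENDPOINT + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )

query = build_cell_query(5.0, 6.0)
req = build_request(query)
print(req.full_url)
```

The request is constructed but deliberately not sent here; against a live endpoint one would pass `req` to `urllib.request.urlopen` and decode the JSON result bindings.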
This has come in for some criticism to the effect that we haven’t really made it Openly available. For example Antony Williams (Chemspider blog) writes (Acting as a Community Member to Help Open Access Authors and Publishers):
“This [interaction with MDPI] is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth.”
PMR: I assume this relates to CrystalEye – I don’t know of any other case. Antony and I have had several discussions about CrystalEye – basically he would like to import it into his database (which is completely acceptable) but it’s not in the format he wants (multi-entry files in MDL’s SDF format, whereas CrystalEye is in CML and RDF).
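To make the conversion problem concrete, here is a sketch of the kind of transformation at issue: reading a (drastically simplified) CML molecule and emitting an MDL V2000/SDF record. The sample molecule is invented; real CrystalEye CML carries far more – crystallographic cells, symmetry operators, provenance – which is precisely why a faithful conversion is not free.

```python
# Sketch: converting a highly simplified CML molecule to an MDL SDF
# record. The sample data is invented; real CrystalEye CML is much richer.
import xml.etree.ElementTree as ET

CML_NS = "{http://www.xml-cml.org/schema}"

SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.757" y3="0.586" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.757" y3="0.586" z3="0.000"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""

def cml_to_sdf(cml_text: str) -> str:
    """Emit a minimal V2000 molfile terminated by the SDF '$$$$' delimiter."""
    root = ET.fromstring(cml_text)
    atoms = root.findall(f".//{CML_NS}atom")
    bonds = root.findall(f".//{CML_NS}bond")
    index = {a.get("id"): i + 1 for i, a in enumerate(atoms)}  # 1-based atom numbers
    lines = [root.get("id", ""), "  converted from CML (sketch)", ""]
    # Counts line: atom count, bond count, trailing V2000 tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for a in atoms:
        x, y, z = (float(a.get(k)) for k in ("x3", "y3", "z3"))
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {a.get('elementType'):<3}"
                     " 0  0  0  0  0  0  0  0  0  0  0  0")
    for b in bonds:
        i, j = (index[r] for r in b.get("atomRefs2").split())
        lines.append(f"{i:3d}{j:3d}{int(b.get('order')):3d}  0  0  0  0")
    lines += ["M  END", "$$$$"]
    return "\n".join(lines)

sdf = cml_to_sdf(SAMPLE_CML)
print(sdf)
```

Even this toy version has to make choices (atom numbering, coordinate precision, field widths); scaling it to 120,000 entries with full crystallographic content is where the real cost lies.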
This type of problem arises everywhere in the data world. For example, the problem of converting between map coordinates (especially in 3D) can be enormous. As Rich says, it costs money. There is generally no escape from the cost, but certain approaches, such as using standards like XML and RDF, can dramatically lower it. Nevertheless there is a cost. Jim Downing made this investment by creating an Atom feed mechanism so that CrystalEye could be systematically downloaded, but I don’t think Chemspider has used this.
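The Atom-feed approach to systematic download can be sketched with the standard library alone: each entry carries a link to a data file, and a "next" link pages through the archive. The feed content below is invented for illustration; the real CrystalEye feed structure may differ.

```python
# Sketch: consuming an Atom feed of data entries. The feed content and
# URLs are INVENTED for illustration; the real feed may differ.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>CrystalEye entries (sample)</title>
  <link rel="next" href="http://example.org/feed?page=2"/>
  <entry>
    <title>Structure 1</title>
    <link rel="enclosure" href="http://example.org/data/1.cml"/>
  </entry>
  <entry>
    <title>Structure 2</title>
    <link rel="enclosure" href="http://example.org/data/2.cml"/>
  </entry>
</feed>"""

def data_links(feed_text: str) -> list[str]:
    """Collect the enclosure link of every entry in one feed page."""
    root = ET.fromstring(feed_text)
    links = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        for link in entry.findall(f"{ATOM_NS}link"):
            if link.get("rel") == "enclosure":
                links.append(link.get("href"))
    return links

def next_page(feed_text: str):
    """Return the URL of the next feed page, if the feed advertises one."""
    root = ET.fromstring(feed_text)
    for link in root.findall(f"{ATOM_NS}link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None

print(data_links(SAMPLE_FEED))
print(next_page(SAMPLE_FEED))
```

A downloader loops: fetch a page, collect the entry links, follow the "next" link until it is absent. That is the whole mechanism, which is why a feed makes systematic harvesting cheap for the consumer.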
The real point is that Chemspider wishes to use the data for a different purpose from the one for which it was intended. That’s fine. But as Rich says, it costs money. It’s unrealistic to expect us to carry out the conversion for a commercial company for free. We’d be happy to consider a mutually acceptable business proposition, and the work could probably be done by hiring a summer student.
I continue to stress that CrystalEye is completely Open. If you want it enough and can make the investment, then all the mechanisms are available. There’s a downloader and converters, and they are all Open (though it may cost money to integrate them).
FWIW we are continuing to explore the ways in which CrystalEye is made available. We’re being funded by Microsoft as part of the OREChem project, and the results of this could illustrate some of the ways in which Web technology is influencing scientific disciplines. We’d recommend that those interested in mashups and re-use in chemistry take a close look at RDF/SPARQL/CML/ORE, as those are going to be standard in other fields.