21:28 18/09/2007, Science Library Pad
But this press release I ran across from June amused me anyway.
“Science Library Pad was the subject of much speculation when analysts at several firms were heard to be very positive about its recent performance. Its share price rose from B$176.83 to B$245.80. Much of the hype was said to originate from M Melville whose AACR2 (artefact) was said to be involved….”
Historically, centralized data repositories like the NCBI, EBI, PDB, etc. have been sources of data, but have also provided the most commonly used search interfaces and web services that people use to access that data. A number of services built on local copies of the data have been developed, often for internal use at companies (and I’ve been part of some fantastic ones), and while APIs are available, the trend toward documented, usable APIs that pervades the tech world these days is not quite the norm in the life sciences. Assuming that we have excellent public data repositories, with rich APIs and data structures, it would be nice if a mix of application developers, designers and data geeks could start developing visual experiences and web services that enhance the utility of these sites. Unfortunately, as Neil’s and Hari’s experiences have shown, that is simply not the case.
In my own experience, from conferences, etc., it is clear that the world of bioinformatics (all life science informatics actually) faces a major problem: too much time is spent moving data back and forth, formatting and reformatting it, and doing what I would call “grunt work”. A decade ago that might have been somewhat acceptable, as the field was still young, but not when bioinformatics becomes a core part of research. It is critical that the various biological resources do a better job of allowing their customers (and I use the word deliberately) to be more effective using those resources. One of the best comments about Pipeline Pilot came from the head of informatics at a pharma company. He said that using it had made it possible for his informaticians to focus on developing new methods and deploying them to other scientists, since Pipeline Pilot did such a good job of gluing things together. We need to make this process even simpler, and allow the Neils of the world to focus on data analysis, software development and methodology and not data munging.
Let me take this thought one step further. I believe that there is a business model to be explored here as well. Philosophically, I believe that knowledge lies in what can be done with data, rather than the data itself. If everyone has equal access to the data, monetizing processes that generate useful information from the data is perfectly fair and square. The one caveat, and perhaps someone can share their thoughts on this, is whether the data producers should be compensated somehow, or is that addressed by the funding, etc. they get? Alternatively, data producers are well placed to develop services on top of the data as they have intimate scientific knowledge. And I am not just talking about the AJAX-ification of genome browsers. It is a well known fact that Google and others have built their empire on top of open source software. Others have leveraged services and APIs to provide useful services, e.g. Lijit uses Google Custom Search and one of the genome browsers mentioned above uses the Google Maps API. Would it be appropriate to take publicly available services and, using them as a backend, develop commercial services? If yes, what are the kinds of businesses that can be built on top of that? What kind of licensing policies would be prevalent? Food for thought and the subject of another post some day.
PMR: I like this sentence:
“If everyone has equal access to the data, monetizing processes that generate useful information from the data is perfectly fair and square. ”
Yes. C21 should be about increasing real value with new products and services, not paying to get grotty C20 data out of jail. The last 10 years have been a total failure for scholarly eInformation. We have gone backwards. The dream of eScience is in ruins in many disciplines and lack of progress is not just zero, it’s negative.
So I want to pursue this and am actively thinking of ways to monetize Open Data. I’d like to hear from others who share the vision. That would open up huge markets in which true competition, not robber barons, would flourish.
So, here’s my wacky idea. The biggest industry in 10 years will be “saving the planet”. It’s already worth 30 billion USD in carbon trading (which to me appears to be fantasy, but people are making lots of money from it). WP suggests it will be 1000 billion USD in a few years. So if we take the axiom:
“Open data and collaborative scholarship are necessary conditions to save the planet”
we could argue that monetizing the process was essential. Given that the EU already has an economic model where farmers are paid not to grow crops but to preserve the countryside, we could argue that publishers might be paid not to ban people from reading “their property”. This would then create a lively market in doing something useful with the data. If the publishers wanted to be in this market they would need to actually do something NEW, or someone will eat their lunch.
Tell me that this is not a fantasy.