Journal of Cheminformatics special issue: Visions of a Semantic (Molecular) Future

Over the past three months or so, colleagues and I have been writing and editing 15 articles for the Journal of Cheminformatics on “Visions of a Semantic (Molecular) Future”. We’ve finally got to the stage where all 15 articles have been accepted and are in the final stages of processing. We expect the “issue” to appear RSN (“Real Soon Now”).

Most of the submitted drafts can be found here:

http://www.dspace.cam.ac.uk/handle/1810/238409/browse?type=title&sort_by=1&order=ASC&rpp=20&etal=-1&null=&offset=20

(Note that DSpace is poorly designed for managing collections of documents, so we haven’t been able to provide our own title page which links to the articles and explains them – for that you have to know the “handles” and manually edit them. So there are also additional materials.)

I wrote an editorial which can be found in full here http://www.dspace.cam.ac.uk/handle/1810/238399 and I’ll quote some sections. Note that several of the papers are general and the chemistry is almost incidental.

I’d like to thank the editorial staff of BioMed Central very much. This isn’t ritual thanks – many publishers deserve few thanks, given their attitude of holding scholarship in the dark ages for their own benefit rather than serving readers and authors. It’s also no thanks to Springer (who own BMC); see /pmr/2010/11/11/versitaspringer-%E2%80%93-please-edit-our-commercial-journals-for-free-so-we-can-sell-them-to-you/ and their interview with Richard Poynder, which showed that Springer simply regards academia as a (guaranteed) source of income: http://poynder.blogspot.com/2011/01/interview-with-springers-derk-haank.html

I have commented before on how important BMC has been in establishing the credibility of Gold Open Access: /pmr/2010/06/11/reclaiming-our-scholarship-tribute-to-vitek-tracz-and-bmc/

Vitek, Matt Cockerill and others have shown that a publisher aimed at providing a service to the community can make a viable income (including profit). That in itself is valuable. But BMC, probably more than any other OA publisher, has caught the spirit of OA and more generally Openness in that it has been active in developing new facets to Openness (such as the Open Data awards and the adoption of the Panton Principles). And I confidently expect to be working in collaboration with BMC in the future and reporting it on this blog.

Iain Hrynaszkiewicz, Jan Kuras, Bailey Fallon and the editors (Christoph Steinbeck and David Wild) have all helped to adopt new features in this issue. Dan Zaharevitz’s contribution is unusual – it’s a transcript of his talk, which I think captures the historical aspects of cheminformatics far better than sentences with passive verbs. Henry Rzepa, and our Open Bibliography group, eat our own dogfood and the editors have accepted this (Elsevier totally destroyed my last attempt to publish in HTML).

But the conventional publication process is out-of-date. The reviews have been useful. They’ve caught batches of typos, and we have added sections in response. Some reflect the different slants on publication and the tension between the new and the conventional. There are probably still glitches.

It’s taken about 1-2 months to write the articles (some of the authors like writing, some do not!). And about 10 weeks for the papers to go through the review process (most were posted to DSpace on 2011-07-04). And a bit more before they appear in print. I have to give great thanks to Charlotte (Bolton) who acted as amanuensis (and also to EPSRC who provided Pathways to Impact funding for the symposium and publication process).

So the timescale is probably about as good as it gets. But because BMC is an OA publisher we’ve posted the manuscripts in DSpace and Google (and perhaps you, dear reader) have been reading them. So publication was effectively immediate.

What have we gained from the formal publication process? Undoubtedly there will be people who don’t read blogs but who will read the articles because they are in J.Cheminf. The articles have a formal bibliographic entry in a way that blogs don’t (yet) – but that will change. They are better because of the review process.

But the main apparent value is that they are citable, and citable for establishing the personal merit of the authors. For me that’s irrelevant – for some of the authors it’s very important. But *why* is the publishing of papers still stressed in this fashion? A paper about OSCAR4 is far less use in practice than the material we provided for the launch (tutorials, examples, downloads, etc.). Open Bibliography will be judged by how well it supports Open Scholarship – not by a paper. Henry and me recounting the development of CML might well be better in a video. Dan Z is certainly better in video! We have to change and there are increasing indications that non-paper outputs will start to be valued.

So here are some snippets from my editorial:

The articles have a common theme of representing information in a semantic manner – i.e. being largely “understandable” by machine. This theme is common across science and many of the articles can and should be read by people outside the chemical sciences, including information scientists, librarians, etc. An emergent phenomenon of the last two decades is that information systems can grow without top-down directions. This is disruptive in that it empowers anyone with energy and web-skills, and is most powerful when exercised in communities of people with similar or complementary skills.

It is often possible to move very quickly, and in our hackfests (one was prepended to the symposium) we have shown that it is possible to prototype within a day or two. This creates a new generation of scientist-hackers (I use “hacker” as “A person who enjoys exploring the details of programmable systems and stretching their capabilities” [1]). Several of the authors in this issue would regard themselves as “hackers” and enjoy communicating through software and systems rather than written English. This stretches the boundaries of the possible but also creates tension where the mainstream world cannot react on a hacker timescale and with hacker ethics.

More generally many scientists and information professionals are increasingly frustrated with the conventional means of disseminating science. Most conventional publishers regard scientific articles as “their content” and a very recent article (2011-06-20) from the STM publishers [2] indicates that the publishers believe they have the right to determine how content is, or more often is not, used. As an example most forbid by default indexing, textmining, repurposing, even of factual data to which the scientist has a legitimate subscription. This has an entirely negative effect on information-driven science, preventing even the development of the technology.

Generally, therefore, there is a culture of bottom-up change (“web democracy”) which looks to the modern web and examples of empowerment. (There are also examples of disempowerment such as attacks on Net-neutrality, walled gardens, information monopolies, vendor lock-in, etc. and this contrast activates many in the modern informatics world). There are several articles, therefore, whose main theme is the access to Open information.


I now believe that in many cases it is unethical to restrict access to publicly funded science. Lessig, in his CERN talk (“Scientific Knowledge Should Not Be Reserved For Academic Elite” [3]), showed that it would cost 500 USD for him to read the top 10 papers relating to his child’s condition. These papers are effectively only available to academics in rich universities. A colleague recently told me he had spent a month researching the literature of his child’s condition (to critically effective purpose) and we agreed he could only do this because he was a professor at a University. That is one reason I support the Open Knowledge Foundation and its projects to define and obtain Open information (of which Open Bibliography [4] in this issue is typical).


Because of this, chemistry has almost no public ontologies, and we have a vicious circle. Without ontologies, authors cannot reasonably be expected to create semantic information, and without a clear need for semantic information, the community will not take on the considerable load of creating ontologies. Several of the articles argue that the creation of lightweight dictionaries and other semantic metadata is affordable by the community and I believe that if the communal will is present, then it would be possible through bodies such as IUPAC and others, to create a full semantic infrastructure for much of the current published chemistry.

The current legal and contractual restrictions on re-using chemical data are seriously holding chemistry behind other subjects. These articles in this issue are not the place for polemics but we hope that traditional creators of information resources in chemistry will now think carefully about the value of making their data fully Openly available. This will be a considerable act of faith, because it will need a change in business model. Some of those providers have been traditionally held in high esteem by the community and if they use that esteem they have the opportunity to change the practice of chemical informatics.


A major feature underlying all of the papers is to give an insight into the process of creating an information ecology. Some of them represent scientific discoveries (e.g. Rzepa) but most are concerned with building a coherent infrastructure usable by the community. It may be useful to liken this infrastructure to the development of instrumentation in many branches of science. Science depended on the microscope, the telescope, the spectrograph, the Geiger counter and many other types of instrumentation. There is sometimes a modern tendency to discount instrumentation and infrastructure as not being ‘proper science’. We hope that this issue will redress that balance.


Several of the articles (CML [13], OSCAR [14], OPSIN, dictionaries [15], WWMM [16]) in this issue cover a decade of work. We hope this will be useful to scientists and scholars who wish to implement new ideas and to give them some idea of what works, and what, more commonly, does not work. Sometimes only the passage of time and persistence achieves some level of success. Again, the short-termism of many infrastructural projects militates against developing a good platform for the future.


A number represent growing points whose development is highly unpredictable. These include the WWMM [16], where the vision of a distributed peer-to-peer knowledge resource has had to wait a decade until it could be implemented. The Quixote project is only months old but takes this vision and has already built an impressive prototype, which I expect to set the model for computationally-based knowledge repositories. These projects rely heavily on community, and this is most clearly shown in the Blue Obelisk movement [20] which aims to, and has largely succeeded in, creating an Open infrastructure for cheminformatics. A major motivation for this has been not just that software and data should be universally available but also that this is the only manner in which science can be reputably validated both by humans and machines. An example of the need for such validation is shown in Henry Rzepa’s article [21].


The relative stagnation of chemical informatics suggests that change is unlikely to happen from within chemistry. As progress occurs in other areas (retail, bioscience etc.) chemistry may be dragged into the semantic world regardless. If chemists wish to retain control over their own systems they will be wise to start investing in Open semantic environments, because otherwise the rest of the world will do it for them.

How can chemical informatics survive and prosper? I think the most likely model will be Open publishing, not just of texts but data and other resources, mandated and paid for by funders. Those publishers which are able to adopt an Open model rather than continuing to maintain their own walled gardens, will ultimately triumph, and probably more rapidly than we expect.

 

 


 
