Bioclipse and the Information Revolution

I am honoured to have been asked to speak at the Embrace Workshop on Bioclipse 2007 (EWB 07), at BMC, Uppsala, Sweden, on 07.05.23, next week. This post explains why Bioclipse is so important (it goes beyond bio/chem) and also gives the title and abstract of my talk. So first, the facts, from http://en.wikipedia.org/wiki/Bioclipse:

The Bioclipse project is a Java-based, open source, visual platform for chemo- and bioinformatics based on the Eclipse Rich Client Platform (RCP). Bioclipse uses, as any RCP application, a plugin architecture that inherits basic functionality and visual interfaces from Eclipse, such as help system, software updates, preferences, cross-platform deployment etc. Via these plugins, Bioclipse provides functionality for chemo- and bioinformatics, and extension points that can easily be extended by other, possibly proprietary, plugins to provide additional functionality.
The first stable release of Bioclipse includes a CDK plugin (bc_cdk) to provide a chemoinformatic backend, a Jmol plugin (bc_jmol) for 3D-visualization, and a BioJava plugin (bc_biojava) for sequence analysis. Bioclipse is developed as a collaboration between the Proteochemometric Group, Dept. of Pharmaceutical Biosciences, Uppsala University, Sweden, and the Research Group for Molecular Informatics at Cologne University Bioinformatics Center (CUBIC).
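For readers who have never written an Eclipse plugin, here is a toy sketch of the extension-point idea in Python. It is not the Eclipse or Bioclipse API (Eclipse declares extensions in plugin.xml manifests and resolves them through its own registry); the class `ExtensionRegistry`, the point id `viewer.render` and the two "plugins" are hypothetical, chosen only to echo the idea of separate 2D and 3D viewers contributing to one host.

```python
from typing import Callable, Dict, List


class ExtensionRegistry:
    """Hypothetical registry: the host declares extension points, plugins contribute to them."""

    def __init__(self) -> None:
        self._points: Dict[str, List[Callable[[str], str]]] = {}

    def declare(self, point_id: str) -> None:
        # The host application declares a named extension point.
        self._points.setdefault(point_id, [])

    def contribute(self, point_id: str, handler: Callable[[str], str]) -> None:
        # A plugin contributes a handler; contributions to undeclared points are rejected.
        if point_id not in self._points:
            raise KeyError(f"no such extension point: {point_id}")
        self._points[point_id].append(handler)

    def run(self, point_id: str, data: str) -> List[str]:
        # The host calls every contribution without knowing which plugin supplied it.
        return [handler(data) for handler in self._points.get(point_id, [])]


registry = ExtensionRegistry()
registry.declare("viewer.render")

# Two hypothetical plugins; a proprietary add-on could contribute in exactly the same way.
registry.contribute("viewer.render", lambda name: f"2D view of {name}")
registry.contribute("viewer.render", lambda name: f"3D view of {name}")

print(registry.run("viewer.render", "ethanol"))  # prints ['2D view of ethanol', '3D view of ethanol']
```

The point of the design is that the host never needs to know in advance which plugins exist: Open Source and proprietary components plug into the same declared points.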

Bioclipse is based on the highly professional and influential Eclipse framework, developed by IBM and released as Open Source. I use Eclipse every day for my software development. It contains a rich set of resources (editors, browsers, searchers) and manages key components such as compilers and version-control repositories (SVN/CVS). Because the Eclipse framework is written so flexibly, many of these can be “stripped out” and replaced with domain-specific components for bio- and chem- applications. Not surprisingly, many of the Blue Obelisk projects have produced components that are now part of, or pluggable into, Bioclipse.
Over the last two weeks I have been heavily influenced by the vision of the “lowercase semantic web”, and this will be an important aspect of my presentation:

“Bioclipse and the Information Revolution”

Peter Murray-Rust

Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, Cambridge, CB2 1EW, UK

Chemistry is a complex subject and its information management requires complex software. Traditionally this has been provided by groups (often commercial companies) that supply monolithic software systems, and by large information aggregators who compile, curate and redistribute products and services. In recent years innovation and the creation of value have slowed, and much of the emphasis has been on integration for commercial customers (e.g. pharmaceutical companies) rather than on developing new functionality. In particular, the academic community, on whose research the industry relies, is deprived of a software and information environment in which it can freely innovate.

By contrast, the web has recently seen an explosion of innovation and wealth creation, often labelled “Web 2.0” or “semantic web”. This is exemplified by the rise of the blogosphere (see Chemical blogspace), where many (young) scientists are trying new ways of communicating and re-using information.

But the current web is based very heavily on text and graphics and has very little support for formalized disciplines such as chemistry. Browsers have little native support for XML (what little there is can be found in vertical plugins, e.g. for mobile telephony). Much of the technology is based on centralised APIs such as Google Maps, whose centralised, thin-client model does not translate to chemistry. And if it did, it could further consolidate the central control of information, which many of us feel to be restrictive.

The current set of tools (wikis, blogs, etc.) is syntactically weak and, apart from a few experiments, has no semantic support. Authors today need semantic chemical tools but are frustrated by their absence. Most rich chemical information rests on the laboratory bench: molecules, reactions, spectra, crystal structures, reports, recipes, etc. If this were made publicly available in semantic form, chemistry could move towards a peer-to-peer network that accurately represented the “ownership” of information.

The chemistry Open Source and Open Data community has now produced a critical mass of tools, many in wide use (post-beta) and more at alpha and beta stage. Its developers have been brave in creating components, often unglamorous but increasingly robust, that are interoperable and reconfigurable. These tools are increasingly being taken seriously, for example in pharma.

Until now the bench chemist, often trained on “clicks” within a Microsoft environment and unfamiliar with command lines or scripting, would have found that too much integration was required. But Bioclipse can and will change that. Technical support staff in any institution will be familiar with Eclipse and can help with installation and integration, and may even wrap some plugins.

The challenge for Bioclipse is to generate “viral” penetration within the chemical community. To do this it must:

  • be trivially installable (I am currently installing V1.1.1).
  • be navigable. Is the user interface, built from Eclipse “perspectives”, one that chemists can learn?
  • provide enough functionality to be useful.
  • require little or no maintenance.
  • ideally have a unique selling point (do something useful that other systems don’t)
  • interoperate with other systems (Bioclipse won’t be able to do everything)
  • create a semantically rich editor-browser platform (perhaps based on RDF)
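What might “semantically rich” mean in practice? RDF describes information as subject-predicate-object triples, so a molecule and a spectrum measured on it can be linked and then queried. The sketch below uses plain Python tuples rather than a real RDF library (rdflib is one Python option), and the identifiers `mol:ethanol` and `spectrum:42` and the predicates are made up purely for illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements about bench data; identifiers and predicates are illustrative only.
triples: Set[Triple] = {
    ("mol:ethanol", "hasName", "ethanol"),
    ("mol:ethanol", "hasFormula", "C2H6O"),
    ("spectrum:42", "measuredOn", "mol:ethanol"),
    ("spectrum:42", "technique", "1H NMR"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


# Which spectra were measured on ethanol?
print([t[0] for t in match(p="measuredOn", o="mol:ethanol")])  # prints ['spectrum:42']
```

Because every statement has the same shape, data from different labs or tools can be merged simply by pooling triples, which is what makes this model attractive for linking chemical information.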

This is a big challenge, but most of the Blue Obelisk and the wider Open Source community will be helping. (Bioclipse is written in Java, so non-Java applications such as OpenBabel and InChI require additional engineering.) The areas where Bioclipse can take a lead include:

  • management of chemical documents (papers, theses, lab reports), using chemical linguistics such as OSCAR3
  • integration of structured ontologies such as GoldBook, ChEBI, CML dictionaries
  • validation of chemical information (using CML and other XML technologies, documents and data sets can be formally validated)
  • integration of robots (e.g. harvesting of public chemical information)
  • integration into the chemical blogosphere (e.g. support for microformats and RDF).
  • linking of information within chemistry (e.g. analytical data to spectra)
  • linking between disciplines (e.g. small molecules to bioscience applications)
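As an illustration of the validation point in the list above, here is a hedged sketch that checks a small CML-style molecule for internal consistency: every atom has an id and an elementType, and every bond's atomRefs2 points at atoms that actually exist. It is a hand-written check using Python's standard XML parser, not validation against the CML schema or dictionaries, and the sample molecule (with its deliberately broken second bond) is invented.

```python
import xml.etree.ElementTree as ET
from typing import List

CML_NS = "{http://www.xml-cml.org/schema}"


def check_molecule(xml_text: str) -> List[str]:
    """Return the problems found in a CML-style molecule (an empty list means it passed)."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is not well-formed
    problems: List[str] = []
    atom_ids = set()
    for atom in root.iter(CML_NS + "atom"):
        atom_id = atom.get("id")
        if atom_id is None:
            problems.append("atom without id")
            continue
        if atom.get("elementType") is None:
            problems.append(f"atom {atom_id} has no elementType")
        atom_ids.add(atom_id)
    for bond in root.iter(CML_NS + "bond"):
        refs = (bond.get("atomRefs2") or "").split()
        if len(refs) != 2:
            problems.append(f"bond {bond.get('id')} does not reference exactly two atoms")
        for ref in refs:
            if ref not in atom_ids:
                problems.append(f"bond {bond.get('id')} references unknown atom {ref}")
    return problems


SAMPLE = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="1"/>
    <bond id="b2" atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""

print(check_molecule(SAMPLE))  # prints ['bond b2 references unknown atom a3']
```

A schema validator would catch structural errors of a similar kind; the general advantage of formal markup is that such checks can run automatically over whole data sets rather than relying on a human reader.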

Given these, and given support for “most” of what chemists already require, Bioclipse should have immediate appeal. This will be strengthened by the needs and support of other communities such as:

  • publishers (who need structured information that can be repurposed)
  • librarians (who need future-proof semantics for archival and preservation in institutional repositories)
  • regulators (who need searchable semantic information)

If it can spread virally, Bioclipse will be part of a disruptive technology that will change the face of chemical information and effectively start the creation of the chemical semantic web.

