5 Years of Open Babel

I’ve mentioned Geoff Hutchison and Open Babel here before in the context of the Blue Obelisk awards. Open Babel is an Open Source “universal adapter” (see below). So it’s nice to report his announcement of 5 Years of Open Babel from the mailing list. To quote:

I’d like to take the opportunity to outline a bit of what’s happening with Open Babel right now and what 2007 might bring. Last year, we released version 2.0, representing a full stable release. Since then, we’ve released two updates to fix bugs, thanks in part to many user reports and contributions. There are contributed binary copies for Windows, Mac OS X, and a range of Linux distributions.

So what is Open Babel? Basically it’s one of those universal adapters such as you get in airports.
[Image: mains plug travel adaptor]
(Thanks to Wikipedia – it’s so liberating to be able to paste pictures without worrying about copyright.) This adapter takes one input (US) and converts it to two different outputs (UK and European). Note that it converts the mechanical format, not the voltage (a bit like transforming the syntax but not the semantics). There are smarter adapters – some can manage 4 inputs and 4 outputs – but they feel as if they may fall to bits at any time.
Open Babel is a lot more powerful than that! It can manage 70 formats and also carry out some semantic conversion. It’s not a Swiss army knife – CDK and JOELib are more like that. It does a single job – syntactic and semantic conversion – and it does it through Open voluntary labour.
It is critical to highlight how important Open Babel and other Blue Obelisk activities are to the pharma industry. I’ve highlighted this before. I expect that Open Babel is used by every pharma company in the world. I estimate that in direct costs, staff etc. the pharma industry spends several billion (that’s a US billion = 10^9) USD per year on chemical informatics of some sort. (For example we can guess the amount spent on CAS, Beilstein, and chemoinformatics software). None of this goes on Open Babel.
That’s not quite true. Last year we had support for a summer student from Merck (Nick England) who added some exciting routines to Open Babel. Not directly costed – let’s say ca. 5000 USD. So thank you Merck. And also thanks to MDL for a summer student to write a CML Reader in Java.
But that’s about it. In IT there is a huge industry investment (direct and implied) in things like Apache, Eclipse, etc. But in chemistry nothing. It’s pure free-riding by the pharma industry.
Now there is no moral argument here. We write these systems for a variety of motivations and they are fulfilling (though non-hackers have NO IDEA how much effort actually goes in.) NO IDEA. I have spent the weekend trying to refactor my molecule builder, and the gear wheels are spread out across the floor. Nothing is working. I promised my colleagues it would be ready for tomorrow. We’ll see. This post is a welcome relief.
So, dear pharma industry, if you read this – think about what you owe Geoff and the rest of us. It doesn’t just happen. It isn’t easy. The refactoring is desperate. We know you are shy – people in pharma don’t like to come out into the daylight, so if you mail me I’ll keep it confidential. Or you can post an anonymous comment to the blog. I will have no idea who you are. At the very least add a post that says something like “Thank you Geoff from an anonymous person in pharma who has found Open Babel useful”. That sort of message is highly motivational.


9 Responses to 5 Years of Open Babel

  1. Chris says:

    Actually OpenBabel does a lot more than file conversion. You can use it for substructure and similarity searching, and building a fastsearch index allows for very, very fast searching. You can also use it for property calculations, ring counts, rotating torsions, rigid overlay of structures onto a template, and more…
    There have also been a couple of people contribute code recently, and it would be great if we saw more.
    An acknowledgement in publications would be nice also.

  2. pm286 says:

    (1) You are quite right, Chris, and my analogy doesn’t do justice to Open Babel. Of course it does substructure searches. It’s brilliant at it. We use it. I got caught up in the enthusiasm of writing something … authors are not always in control – the blogosphere is.

  3. Chris says:

    Actually Merck also make their in house force field available (MMFF), but your point is well made.
    Small contributions can have a significant impact, unfortunately many Pharma companies don’t seem to have a “mechanism” for this sort of contribution.

  4. pm286 says:

    (3) Any ideas for a believable scheme would be valuable here. I think we could benefit by branding the Blue Obelisk.

  5. JamesM says:

    A bit off topic, but I wonder if it wouldn’t be better, rather than saying OB’s not just for file conversion, to make a distinction between OpenBabel the program that does file conversion (mainly), and the C++ library behind it, and basically give the latter a different name, with OpenBabel as the flagship product. It seems a pity (and perhaps a losing battle) to fight against the clarity and enviable brand recognition that OpenBabel has as a file conversion utility program.
    Peter, I fear I am about to ask something dense, but when you say “there is no moral argument here” do you mean there is no moral reason for pharma to contribute to OB’s development, or that the moral argument is so overwhelming there’s little point in debating it?

  6. Chris says:

    A couple of thoughts:-
    If a Pharma company supports a grad student to do some lab work it is very likely they will get a publication out of the work, adding another file format to Openbabel would not be publishable.
    A publication is regarded much more highly than making source code available, citations count for career progression in industry as well as academia. (Should downloads count as citations?)
    Actually I’m not sure that helping open source software is that highly regarded in academia either, is there a perception Open source=Free= Of little value?

  7. pm286 says:

    (5a)
    A bit off topic, but I wonder if it wouldn’t be better, rather than saying OB’s not just for file conversion, to make a distinction between OpenBabel the program that does file conversion (mainly), and the C++ library behind it, and basically give the latter a different name, with OpenBabel as the flagship product. It seems a pity (and perhaps a losing battle) to fight against the clarity and enviable brand recognition that OpenBabel has as a file conversion utility program.
    Yes – I was far too glib in this. I was really trying to get OpenBabel across to non-chemists. I’ll revisit this.
    (5b) Peter, I fear I am about to ask something dense, but when you say “there is no moral argument here” do you mean there is no moral reason for pharma to contribute to OB’s development, or that the moral argument is so overwhelming there’s little point in debating it?
    It was the first (and probably badly put). I meant that I wasn’t appealing to the industry on the basis that if they used F/OSS they had a moral obligation to contribute. I am trying to promote a utilitarian argument – that there is a tragedy of some sort here – I have called it the tragedy of the lurkers (or free riders). If the industry put some effort into F/OSS this would be of direct economic benefit to them.

  8. pm286 says:

    (6)
    If a Pharma company supports a grad student to do some lab work it is very likely they will get a publication out of the work, adding another file format to Openbabel would not be publishable.
    True in essence…
    A publication is regarded much more highly than making source code available, citations count for career progression in industry as well as academia. (Should downloads count as citations?)
    Actually I’m not sure that helping open source software is that highly regarded in academia either, is there a perception Open source=Free= Of little value?

    There is no useful metric for non-full-text academic output. I think this is changing (I will report on the Glasgow Digital Curation Conference). But the current full-text metrics are run as a business and while we delegate the award of merit to a multi-billion industry it will be difficult to redesign. I’ll also return to this. But yes, databases, software, etc. are usually regarded as second-class academic outputs.
    Should downloads count as citations? They don’t for fulltext (which is also an absurdity). Publishers won’t disclose how often a paper is downloaded or read (that’s confidential business).

  9. JamesM,
    I don’t think it’s a ‘losing fight’ to tell people they can do more with Open Babel than just convert files. Yes, a lot of people know about babel as a file converter and clearly supporting so many formats benefits that aspect. But it’s just a marketing issue. If you take a look at the current Open Babel website, you’ll see “The Open Source Chemical Toolbox” and “ready to use programs” and “complete programmers toolbox.”
    I think there are a lot of people who need to write custom chemistry software (myself included). So you look around at toolkits and realize… oh, I already have that installed.
    Perception is important. I think our efforts to redesign our website are critical and I’ve already received positive feedback about this direction.
