Building a Better ChEBI

Chemical Entities of Biological Interest, ChEBI, is a freely available dictionary [1] of molecular entities, especially small chemical compounds. Like all big dictionaries and ontologies, it has its own unique challenges. Fortunately, those nice people at the EBI are holding a workshop to discuss future developments in ChEBI. In preparation for the workshop, here are some brief notes on how ChEBI could be made better. [Disclaimer: I’m fairly new to ChEBI and “thinking out loud” here, so add comments below if I’ve said anything stupid or wrong.]

ChEBI: Too much, too young?

Some dictionaries try to describe too much. When it comes to writing down knowledge, it isn’t always easy to know where to stop. To define scope, the BI in ChEBI stands for “Biological Interest”. So this raises the question: why does ChEBI describe all sorts of subatomic particles that are of little (or no) biological relevance? While electrons (ChEBI:10545) and protons (ChEBI:24636) play an important role in biology, you have to wonder what the biological interest of neutrinos (ChEBI:36352) and bosons (ChEBI:36341) is. Who decides what is “biologically interesting”, and how?
Then there is the inescapable legacy of IUPAC, which ChEBI aligns itself with closely. Unfortunately, IUPAC is a bit dated and cumbersome (or so I’m told).

ChEBI: I just can’t get enough?

Some people are never happy. Take any dictionary or ontology and they will pick holes in it. “It doesn’t say this, it doesn’t say that, this is wrong” etc. In no particular order:

If I missed anything off the list of things that are “wrong” with ChEBI, please let me know. If you’re going to the workshop, see you there (along with Christoph[e] Steinbeck and Kirill Degtyarenko, I suppose).

PMR: I shan’t be at EBI but I think some of us will. Also Christoph (sic) visited us last week. ChEBI is an example of the sort of chemical ontology that should be commonplace already in mainstream chemistry but isn’t. Crystallography has its CIF dictionaries, Bioscience has GO and OBO, but chemistry? Yes, we have the IUPAC Gold Book which has been lovingly crafted into XML. But it’s not an ontology – it’s a rather random terminology whose structure is rather random entailment.
So – as always – the bioscientists are putting the chemists to shame and eating their lunch. Public databases of chemical structure? PubChem. Public databases of chemical reactions? KEGG. Public abstracts of chemical papers? PubMed. And what do the chemists think? Most don’t even know they exist.
There is no doubt that the bioscientists are getting more interested in chemistry. They’ll start developing tools and databases. And if they happen on this side of the Atlantic – as ChEBI does – there is no way that lobbying Congress will get them closed down.
A bioscientist asked me today why she couldn’t get a free tool that translated IUPAC names into structures. IUPAC doesn’t have the resources to do this sort of thing – it works by voluntary labour, whereas commercial and quasi-commercial chemical databases turn over more than 1 billion per year. Bioscience funds the infrastructure of informatics. Chemistry seems hell-bent on staying in the dark ages.
I’ve got a shopping list of what I’d like to see chemical bioinformatics create. And I’d be happy to talk to those interested in getting them off the ground.

