In a recent post I said – rather crudely – that there was no absolute way of understanding chemical names. I have been (rightly) taken to task for imprecision:
ChemSpiderMan Says:
September 25th, 2007 at 5:04 am
I’m not sure what you mean by the comment “Because there is no absolute way of assigning names to structures.” Systematic naming is exactly that….IUPAC Naming, CAS Naming. Well defined rules. Now, are they exhaustive across all forms of chemistry..surely not…inorganics, organometallics, polymers while challenging do have nomenclature standards too while some believe they don’t. Of course chemical structure classes change…there were no rules of fullerenes before they were synthesized. But, in general there IS an absolute way of assigning the names to structures. Maybe I misinterpreted your
PMR: This is true, in principle, for certain classes of compounds (mainly organic). BTW many chemical (informatics) folk are arrogant enough to assume that there is nothing in the world except organic chemistry. There are many chemicals which aren’t organic. The Wikipedians have a lot of problems in deciding how to assign a name to something because they use names as both descriptions and addresses. Naming is hard. Very hard. It’s been said that there are only two hard problems in computer science and naming is one of them. Here are some chemicals that have no formal systematic name and can only be resolved by lookup:
calcite / aragonite
Bakelite
invert sugar
And, of course, there are trivial names, such as Diazonamide A. Why use that rather than the systematic name? Because when it was first discovered they didn’t know what it was. It seems they still don’t. Or at least some people don’t. The name relates not to a connection table but to a sample with associated properties such as composition, melting point, NMR, etc., which serve to identify, but not always to elucidate.
Trivial names are convenient. Therefore we need an Open (not just free) set of chemical names.
I’ve just remembered. We’ve got several: PubChem, Wikipedia, ChEBI. Set up respectively by biologists, volunteers, biologists. For the service of chemists. Chemists might even get interested in helping them grow.
I think this is extremely important and gets muddled because we aren’t clear about what we are naming. There are two different things we need to talk about: 1) actual samples of things (PubChem calls these substances) and 2) idealized notions of a pure compound represented by a single chemical structure (PubChem calls these compounds). Lots of names (or other identifiers) can refer to either, depending on context. So I can talk about methane and mean either the pure structure CH4 or what’s in the tank in the corner of the lab. Obviously for lots of things that distinction doesn’t make any real difference; otherwise our usage wouldn’t have ended up that way. But for other things, and I think building chemical databases is one of them, it is crucial to be aware of this distinction. The Diazonamide A case is an example. Suppose we have the ability to harvest chemical structures from all the literature. The structure of Diazonamide A in the original paper can certainly be converted to an InChI, as can the structures in the papers published later, but they won’t be the same. If we think of Diazonamide A as a compound, it makes sense to have a single value for its InChI, and it also makes sense to update that value as more experiments are done to clarify what that structure is. But when that structure is fixed, will we accept the fact that an InChI search will no longer bring up the original paper? Or do we “fix” the structure in the original paper? The reality is that the substance Diazonamide A has been reported in the literature with different structures, and any database and/or searching architecture that can’t account for that fact is bound to have big problems.
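One way to picture a record that accounts for this is sketched below in Python. It is only an illustration under stated assumptions, not how PubChem or any other database actually stores things: the class names, the `report` method and the `build_inchi_index` helper are all hypothetical, and the InChI strings and citations are placeholders rather than the real Diazonamide A data. The idea is that the name points to a substance record, every structure reported for it is kept together with its source, and one of them may be flagged as the currently accepted structure.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class StructureAssertion:
    """One structure reported for a named substance in one publication."""
    inchi: str    # structure as drawn in that publication
    source: str   # citation of that publication (placeholder text here)


@dataclass
class SubstanceRecord:
    """A named substance plus every structure the literature has assigned to it."""
    name: str
    assertions: list = field(default_factory=list)
    accepted: Optional[StructureAssertion] = None  # current best structure, if any

    def report(self, inchi, source, accept=False):
        """Record a reported structure; optionally mark it as the accepted one."""
        assertion = StructureAssertion(inchi, source)
        self.assertions.append(assertion)
        if accept:
            self.accepted = assertion
        return assertion


def build_inchi_index(records):
    """Map every InChI ever reported to the (name, source) pairs that used it."""
    index = defaultdict(list)
    for record in records:
        for assertion in record.assertions:
            index[assertion.inchi].append((record.name, assertion.source))
    return index


# Placeholder values only: these are NOT real InChIs or citations.
record = SubstanceRecord("Diazonamide A")
record.report("InChI=<original-structure>", "original paper (placeholder)")
record.report("InChI=<revised-structure>", "later paper (placeholder)", accept=True)

index = build_inchi_index([record])
```

With this layout, revising the accepted structure never rewrites the earlier assertion, so a search on the originally published InChI still leads back to the original paper.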
DrZZ brings up great points and ones we have to consider moving forward. I had a long and deep conversation today about what is a “correct” structure. There are two hexacyclinol structures for example…that from the critiqued work and that from the later work.
http://www.chemspider.com/blog/?p=77
Both structures belong in a database, each with a link to the related publications. However, naming them both hexacyclinol is surely problematic. Only further annotation can resolve this, I believe. This is an extreme case of complex products with different stereochemistry. Over time stereo can be cleaned up and there can be multiple structures. That is why I believe that we must use the connections first and then show all the different stereo-related structures. I think additional annotation is necessary to declare ONE of these structures as the “final structure” or “correct structure”.
Thoughts?
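As a rough illustration of "connections first, then stereo", the Python sketch below groups InChI strings by what is left once the stereo layers (/b, /t, /m, /s) are dropped. The function names are hypothetical, and stripping layers by prefix is a simplification: it also removes any stereo sublayers nested under the isotopic or fixed-H layers, and it is no substitute for regenerating the identifier with a proper toolkit. The two alanine enantiomers stand in as a small example; the hexacyclinol structures are not reproduced here.

```python
from collections import defaultdict

# InChI layer prefixes that carry stereochemistry:
# /b double-bond, /t tetrahedral, /m and /s stereo-type flags.
STEREO_PREFIXES = ("b", "t", "m", "s")


def connectivity_key(inchi):
    """Return the InChI with its stereo layers removed (simplified)."""
    head, *layers = inchi.split("/")
    # The formula layer starts with an uppercase element symbol,
    # so it never collides with the lowercase stereo prefixes.
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([head, *kept])


def group_by_connectivity(inchis):
    """Group full InChIs under their shared connectivity-only key."""
    groups = defaultdict(list)
    for inchi in inchis:
        groups[connectivity_key(inchi)].append(inchi)
    return groups


# Illustrative input: the two enantiomers of alanine.
l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

groups = group_by_connectivity([l_alanine, d_alanine])
```

Both enantiomers land under the same key, so a search on connectivity finds every stereo variant, and the annotation of which one is "correct" can be layered on top.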
Pingback: ChemSpider Blog » Blog Archive » Will the Correct Structure of Taxol Please Stand Up. Part 3.
“Many” to me is a very ambiguous word. It can mean “a large number” and it can mean “more than 7 or so”. It’s not a word I like to use. Here’s an example. When you said:
you used “many” twice. The first “many” sounds like “a large number” of chemical informatics folk believe that, which I don’t think is the case. On the very first day of Daylight summer school, where they talked about SMILES, Dave Weininger said ‘and here’s examples of what it doesn’t handle’. It’s well known that SMILES does not do a good job with ferrocenes and other organometallic systems, with mixtures, with crystal structures, and with ionic systems, among others. Because of the limited number of ring closures, SMILES can’t even handle some compounds which are perfectly expressible with the valence bond model used by SMILES! (I’ll leave that one for your puzzle category 🙂)
While there may be many=”more than 7 or so” who believe that the only real chemistry is that expressible by SMILES, it’s a small minority, and probably limited to the people who work only with those sorts of organic systems. That’s not arrogance, that’s ignorance, or apathy, or prejudicial dismissiveness.
Yet your second “many” means “a huge number”, indeed, in this case uncountably many chemicals which aren’t expressible in SMILES. It’s a completely different meaning of “many”. When juxtaposed like this it reads as though you think that a large number (thousands, and probably a majority) of chemical informatics folk arrogantly believe that the only chemistry is organic chemistry.
That’s why I don’t like the word “many.”