A few days ago I promised to respond to Antony Williams’ post on associating chemical names with structures. I wrote:
“There is no “right structure (sic)” for a compound. There are structures which have a very high probability of being associated with a name. There are names which have a probability of representing a chemical entity.”
I still stand by this statement and Antony’s post reinforces my view. I’ll post most of his and comment… [There is a question at the end that I'd like readers to comment on].
I refer you back to the original post
from which this comment was made as it is taken from a specific context.
Is this [PMR above] a true statement? In many cases I would agree, but I have my own opinion in specific cases, so let’s focus on the drug industry and trade names for a moment. First, let’s talk about me... and my identifiers. Depending on who’s talking about me I am Tony, Antony, Dr Williams, Mr Williams, Dad, sweetheart, son, Tone, AJ, Bro’ and so on. However I am registered with a social security number and exist as a legal entity, a “registered” entity.
PMR: Although humans are peripheral to this discussion, it’s actually very difficult to associate a human with identifiers. The UK is spending zillions of pounds on trying to do this and requiring everyone to have identity cards. They can be forged. They’ll probably need to brand us with a number, and have us rebranded every year in case we try to laser it off.
CS: Now, Zantac is a registered trade name for the chemical here.
PMR: This points to a page in Chemspider (http://www.chemspider.com/Chemical-Structure.571454.html) which I shall refer to as page571454 for simplicity of dialog. It contains the header:
CS: I am not an expert in the registration process but I believe that somewhere along the line a defined chemical entity is associated with that name. Whether the chemical entity has been appropriately elucidated by analytical technologies or not is a different question. What is registered as a compound, and associated with the name, is what that name defines.
PMR: I am not currently an expert in registration, but at one stage I worked closely with authorities such as FDA and WHO on registration of drugs, so my comments may be out of date. Colleagues and I also worked for several years on the structure of “ranitidine” – I’ll clarify later.
CS: Now, there are a whole series of other names for the same compound – registry numbers, systematic names, organization numbers. See below
PMR: I will leave these here, and also add some from page571454:
1,1-Ethenediamine, N-[2-[[[5-[(dimethylamino)methyl]-2-furanyl]methyl]thio]ethyl]-N’-methyl-2-nitro-, (Z)-
PMR: The first point is that these are NOT exact synonyms. It is clear that
are not identical. One describes a compound whose stereochemistry is asserted, the other describes one where the stereochemistry is not asserted. Butene and 1-butene and 2-butene and (Z)-butene are all different. They all have different InChIs. Some of them may refer to the same concept in some contexts, but they are not synonyms. Fowler (Modern English Usage) says “perfect synonyms are extremely rare”.
This is not nit-picking or logic chopping. If we are representing something in a machine, and we assert that two things are to be used interchangeably, then we have to be very sure that they can be. Adding a “(Z)” may appear a reasonable thing to do – in this case it is a disastrous act that corrupts information (I’ll leave that till the next post).
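To make the butene example concrete, here is a minimal sketch of how a machine can tell whether an identifier asserts stereochemistry. The InChI strings are what I understand a standard InChI generator to produce for these names; treat them as illustrative and check them against a real generator before relying on them. The helper names (`layers`, `asserts_double_bond_stereo`, `same_ignoring_stereo`) are hypothetical and not part of any library.

```python
# Illustrative InChI strings (assumed values, verify with an InChI generator).
INCHI = {
    "1-butene": "InChI=1S/C4H8/c1-3-4-2/h3H,1,4H2,2H3",
    "2-butene": "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3",
    "(Z)-2-butene": "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3-",
    "(E)-2-butene": "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+",
}

# Layer prefixes that carry stereochemical information in an InChI.
STEREO_PREFIXES = {"b", "t", "m", "s"}


def layers(inchi: str) -> dict:
    """Split an InChI into its layers, keyed by the one-letter layer prefix."""
    parts = inchi.split("/")
    result = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        result[part[0]] = part[1:]
    return result


def asserts_double_bond_stereo(inchi: str) -> bool:
    """True if the identifier contains a double-bond stereo ('/b') layer."""
    return "b" in layers(inchi)


def same_ignoring_stereo(a: str, b: str) -> bool:
    """True if two InChIs agree once all stereo layers are dropped."""
    def strip(d: dict) -> dict:
        return {k: v for k, v in d.items() if k not in STEREO_PREFIXES}
    return strip(layers(a)) == strip(layers(b))


if __name__ == "__main__":
    for name, inchi in INCHI.items():
        print(name, "asserts stereo:", asserts_double_bond_stereo(inchi))
```

In this sketch "2-butene" and "(Z)-2-butene" agree once stereo layers are ignored, yet only one of them asserts a geometry. A program that merges the two as synonyms has silently added (or discarded) that assertion.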
The robotic aggregation of chemical names and identifiers, if done without metadata and ontology, corrupts information. That’s a strong statement, but we can see it in the current case. First, there is junk out there. Robotic name harvesting harvests junk. (Christoph Steinbeck described it in worse terms at the RSC meeting.) Here’s a snip from page571454:
Validated by Experts, Validated by Users, Non-Validated, Removed by Users, Redirected by Users, Redirect Approved by Experts
The “?” characters show up in my browser – I don’t know what they are, but they are not normal “e”s (ASCII 101). The first name is not a synonym – I’m sorry, but it’s junk. Associating junk with good information degrades the good information rather than increasing the quality of the junk (There is a more formal proof somewhere by Shannon – I believe – that machines cannot act as 100% proofreaders).
CS: I think that the Trade Name for a compound is definitive since it’s registered. Relative to the statement “There are structures which have a very high probability of being associated with a name. There are names which have a probability of representing a chemical entity.”…my question is whether a Registered Trade Name is absolute? I’m asking the question since I’m actually not sure. Thoughts anyone?
PMR: A trade name represents a product, not a compound and certainly not a connection table. In some cases it may refer to a pure substance, which itself is describable by a connection table, but these are not synonyms. And aggregating them as synonyms adds error rather than clarity.
However there is an even stronger reason why “Zantac” does not describe ranitidine. See the FDA page
Zantac (Ranitidine Hydrochloride) Tablets
Zantac contains (not “is”) ranitidine hydrochloride. Ranitidine is not ranitidine hydrochloride, any more than ammonia is ammonium chloride. Listing them together under synonyms corrupts information.
You may argue that an intelligent, chemically educated chemist will know the difference, and that may be true. But the current aggregations of chemicals (Chemspider, eMolecules, Chempedia) are designed for use by machines as well as humans.
And unless high-quality metadata is given, along with a structured ontology, machine aggregation of chemistry corrupts rather than enhances.
For that reason we are building molecular repositories based on metadata and ontologies. In the current era of the web it’s becoming essential.
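As a rough illustration of what "names with metadata" could mean in practice, here is a small sketch in which every name carries the kind of thing it denotes and whether it asserts stereochemistry. This is a hypothetical toy model, not a description of how any of the repositories mentioned here is built; the class and field names (`Kind`, `NamedThing`, `stereo_asserted`) are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    COMPOUND = "compound"  # a single chemical entity
    SALT = "salt"          # a salt of a compound
    PRODUCT = "product"    # a marketed formulation


@dataclass(frozen=True)
class NamedThing:
    name: str
    kind: Kind
    parent: str            # the compound the name is ultimately about
    stereo_asserted: bool  # True if the name itself states stereochemistry


def interchangeable(a: NamedThing, b: NamedThing) -> bool:
    """Two names may be swapped only if all recorded metadata agree."""
    return (a.kind, a.parent, a.stereo_asserted) == (
        b.kind, b.parent, b.stereo_asserted)


ranitidine = NamedThing("ranitidine", Kind.COMPOUND, "ranitidine", False)
z_name = NamedThing(
    "1,1-Ethenediamine, N-[2-[[[5-[(dimethylamino)methyl]-2-furanyl]"
    "methyl]thio]ethyl]-N'-methyl-2-nitro-, (Z)-",
    Kind.COMPOUND, "ranitidine", True)
ranitidine_hcl = NamedThing(
    "ranitidine hydrochloride", Kind.SALT, "ranitidine", False)
zantac = NamedThing("Zantac", Kind.PRODUCT, "ranitidine", False)

if __name__ == "__main__":
    for other in (z_name, ranitidine_hcl, zantac):
        print(f"ranitidine <-> {other.name[:30]}:",
              interchangeable(ranitidine, other))
```

All four names are "about" ranitidine, which is why a flat synonym list lumps them together. With even this much metadata attached, none of the other three is interchangeable with the bare name.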
Now, I suggested that the “(Z)” should not have been added to “ranitidine” to indicate the stereochemistry. You can find pages out there with “(E)”. What is the “correct structure”? Or is this a meaningless question?