Oh Dear … Patent on Name2Structure conversion

Chemspider has reported a new patent which claims the conversion of chemical names to structures. (BTW, I am genuinely grateful for this post, as for several of the others.) He writes:

Name to Structure Conversion – and What One Little Patent Might Do…
Those of you watching this blog will likely have seen multiple conversations by myself regarding the conversion of chemical names to chemical structures. There are a number of commercial products on the market performing this conversion including those of ACD/Labs, Cambridgesoft, OpenEye, Cheminnovation and ChemAxon (soon). There may be others.  Also, there are now efforts going on in academia.
Last week while searching for some information in the patent database I happened across an interesting patent.
The title and lead in is listed as:

Method, system, and software for deriving chemical structural information
A method and a system are provided for deriving chemical structures from chemical names. Chemical name fragments are grouped into a number of classifications. The method and the system handle new and old chemical names, including names for organic and inorganic substances.

My interpretation of this patent is that this is for the conversion of Chemical Names to Chemical Structures (I am not a patent lawyer though). The patent was granted to Jonathan Brecher of Cambridgesoft as listed here. The patent was granted in 2006.
As a product manager and CSO at ACD/Labs I managed the Name to Structure functionality in their nomenclature software. There was a LOT of prior art when this patent was applied for, in my opinion. Products might not have been on the market but certainly a number of companies had such capabilities. This will be interesting to watch….

PMR: This is very depressing. It’s a classic example of the tragedy of the anticommons, where overprotection of IPR leads to nothing for anybody. I have not read all of the patent (Method, system, and software for deriving chemical structural information), but here are some bits:

  • Uncommon characters of chemical significance are spelled out using common characters, so that, for example, the character “µ” (“µ”) is changed to “mu”.
  • Also during the preprocessing, if the name or a portion of the name has been submitted in inverted form (e.g., “acetic acid, 2-hydroxy-“), the name or portion is converted to its uninverted form (e.g., “2-hydroxyacetic acid”).
  • The input name is analyzed to mark all potential name fragment boundaries (step 2010). In a specific embodiment, the mark used is an @ sign, which is rarely used in chemical names. In another embodiment, it may be advantageous to use a non-printing character such as control-A (ASCII value 1) that has effectively no chemical significance.
  • The buffer is scanned for any single one of the characters “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”, “?”, or an apostrophe, that is immediately followed by any number (i.e., including zero) of the characters “]”, “)”, “}”, or “h”, in any order, but that is not preceded by the character “d”. If such a sequence is found, any @ sign that immediately follows the sequence is converted to a comma, so that, for example, “1h@3h@5h@2@4@6-pyrimidinetrione” is properly converted to “1h,3h,5h,2,4,6-pyrimidinetrione”.
  • The buffer is scanned for an @ sign immediately preceding any number of periods, where such periods (if any) precede either i) any single one of the characters “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”, “?”, “n”, “o”, “p”, “s”, “N”, “O”, “P”, or “S”; or ii) any of the text strings “ortho”, “meta”, or “para”. If such an @ sign is found that is preceded by any number of apostrophes or periods, which are preceded by any one of the strings “ortho”, “meta”, or “para”, the @ sign is converted to a comma.
  • The name is divided into the smallest number of meaningful fragments of a maximum length. For example, “pentane” is not divided into three fragments “penta”, “n”, and “e”, since the latter two fragments would not be meaningful, but rather is divided into two meaningful fragments “pent” and “ane”. In a specific embodiment, a fragment is determined to be meaningful (“recognized”) if an exact match for the fragment is found in a dictionary of known text strings (“lexicon”) that is maintained by the system.
  • The locant map associates names of individual atoms with respective specific locations in the connection table. For example, an atom named “2” in “2-hydroxy-propanoic acid” may be a specific one of the carbon atoms, and a “3” atom may be a different one of the carbon atoms. Multiple locants can refer to the same atom: “beta” may refer to the same atom as did “2” above.
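
Two of the quoted steps are mechanical enough to illustrate in a few lines. What follows is a minimal Python sketch and my own reading of the quoted text. It is not taken from the patent, from Cambridgesoft's software, or from OSCAR/OPSIN. The function `commas_after_locants` implements one interpretation of the @-to-comma rule, taking "not preceded by d" to apply to the locant character. The function `segment` splits a name into the fewest fragments found in a lexicon. The names and the tiny lexicon are hypothetical and chosen only for illustration.

```python
import re
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple

# --- Boundary clean-up (one reading of the quoted @-to-comma rule) ---
# A digit, "?" or apostrophe that is NOT preceded by "d", optionally followed by
# any mix of "]", ")", "}" or "h", and then a provisional "@" boundary mark.
_LOCANT_AT = re.compile(r"(?<!d)([0-9?'][\])}h]*)@")


def commas_after_locants(buffer: str) -> str:
    """Turn '@' marks that follow a locant-like sequence back into commas."""
    return _LOCANT_AT.sub(r"\1,", buffer)


# --- Fragmentation into the fewest lexicon entries ---
# Hypothetical toy lexicon; a real system would maintain a large curated dictionary.
TOY_LEXICON: FrozenSet[str] = frozenset({
    "meth", "eth", "prop", "but", "pent", "penta", "hex",
    "an", "ane", "e", "n", "ol", "yl",
})


def segment(name: str, lexicon: FrozenSet[str] = TOY_LEXICON) -> Optional[List[str]]:
    """Split `name` into the fewest fragments that all occur in `lexicon`.

    Returns None when no complete segmentation exists.
    """
    n = len(name)
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def best(i: int) -> Optional[Tuple[str, ...]]:
        # Fewest fragments covering name[i:], or None if name[i:] cannot be covered.
        if i == n:
            return ()
        result: Optional[Tuple[str, ...]] = None
        # Try longer fragments first, so ties go to the longer leading fragment.
        for j in range(min(n, i + max_len), i, -1):
            piece = name[i:j]
            if piece not in lexicon:
                continue
            rest = best(j)
            if rest is not None and (result is None or len(rest) + 1 < len(result)):
                result = (piece,) + rest
        return result

    found = best(0)
    return list(found) if found is not None else None


if __name__ == "__main__":
    print(commas_after_locants("1h@3h@5h@2@4@6-pyrimidinetrione"))
    print(segment("pentane"))
    print(segment("methanol"))
```

With this toy lexicon, `segment("pentane")` prefers `["pent", "ane"]` (two fragments) over `["penta", "n", "e"]` (three), which matches the behaviour described in the quoted bullet. A real parser would need a far larger lexicon. It would also have to handle ambiguities that a fragment count alone cannot resolve.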

PMR: Non-chemists may regard it as a non-obvious invention that “pentane” should be broken down into “pent” and “ane”. Since this is what we teach our first-year students, it is clearly non-obvious (or we wouldn’t need to teach them). But there is a tiny possibility that it is prior art. After all, we (and others) have been teaching the students this for over 100 years.
Now I know that prior art doesn’t matter to the USPTO and you can patent almost anything. I suspect that if you proposed identifying a chemical by giving a set of characters called a n-a-m-e you’d be allowed to patent that.
I haven’t read the whole patent (it requires microfiche). I don’t know how Cambridgesoft’s name2structure software works, but I suspect it is probably close to this patent. However, I suppose a competent lawyer could claim that any name-2-structure software infringes this patent. That means you would have to go to court to defend it.
So what does it mean for OSCAR/OPSIN? Not the slightest idea. We haven’t taken any of Cambridgesoft’s trade secrets, and we don’t use @ signs to mark fragment boundaries. (I have to say I have used tildes (~) in the past, and that is next to the @ key.) But we have regarded pentane as being split pent-ane, and so does OSCAR, so we are clearly infringing. (OK, we did this in 2003, but does that count?)
If you have a culture of patenting the obvious and fundamental then you destroy it.

This entry was posted in open issues.

2 Responses to Oh Dear … Patent on Name2Structure conversion

  1. Rich Apodaca says:

    You can view the patent with pat2pdf:
    http://www.pat2pdf.org/
    Just type in the US patent number (7054754) and you’ll get the full pdf. Works with any US patent or patent application. Other services do the same thing.

  2. “(OK we did this in 2003, but does that count?).” Yes, the patent was filed in 2000. Everything you did before that would be prior art. If I’m not mistaken, you would need to go to court to invalidate the patent; otherwise they can sue you for infringement. Now, the patent is US-based… the big question is whether they also filed in EU countries, and when. US patent law only applies to the… well, US.
