How many chemicals are mentioned in this paragraph?
“She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”
- the reactants were dissolved in pyridine
- nicotinic acid and other pyridine derivatives
- the signal from the protons in the pyridine ring was shifted upfield
“There was nothing on Tuesday”
“Nothing significant happened on Tuesday”; “No mail arrived on Tuesday”; “There was nothing significant on the TV, Tuesday” (US usage often omits “on” before a date); “The day on which something was expected to happen was not Tuesday”
“The universe disappeared into a void on Tuesday”; “Tuesday [Weld] had no clothes on”; “The mob had no hold on [Ruby] Tuesday”
“She waited for his letter. There was nothing on Tuesday”
ne id="o71" surface="imidazole" type="CM" confidence="0.9257968817491067" SMILES="c1c[nH]cn1" InChI="InChI=1/C3H4N2/c1-2-5-3-4-1/h1-3H,(H,4,5)/f/h4H" cmlRef="cml9" ontIDs="CHEBI:16069"
ne id="o101" surface="2H" type="CM" confidence="0.3341514144473448" rightPunct=","
“Using a given corpus, previously annotated by experts, and with agreed guidelines for marking up chemicals, what compounds occur in the following paragraph with a probability of greater than x (e.g. 0.9)?”
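Put that way, the question becomes a simple filter on the confidence values. Here is a minimal sketch in Python, using only the two annotations quoted in this post. The function names and the regular expression are my own illustration, not part of OSCAR, and the long SMILES and InChI attributes are left out for brevity.

```python
import re

# Attribute strings taken from the two annotations quoted in this post
# (plain ASCII quotes; SMILES, InChI and other long attributes omitted).
ANNOTATIONS = [
    'id="o71" surface="imidazole" type="CM" confidence="0.9257968817491067"',
    'id="o101" surface="2H" type="CM" confidence="0.3341514144473448"',
]

# Matches name="value" pairs. This is an illustrative parser, not a full XML parser.
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_attributes(text):
    """Return the name="value" pairs in text as a dictionary."""
    return dict(ATTRIBUTE.findall(text))


def confident_surfaces(annotations, threshold):
    """Return the surface strings whose confidence is strictly above threshold."""
    kept = []
    for text in annotations:
        attrs = parse_attributes(text)
        if float(attrs["confidence"]) > threshold:
            kept.append(attrs["surface"])
    return kept


print(confident_surfaces(ANNOTATIONS, 0.9))  # ['imidazole']
```

With x = 0.9 only “imidazole” survives; “2H”, at about 0.33, is dropped. Lowering the threshold trades precision for recall.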
- comparison with an English-language lexicon. If a word is also an English-language word, it is less likely to be a chemical.
- comparison with a chemical lexicon, e.g. ChEBI. If the word is in there, its probability is increased.
- part of speech. If it’s a noun the probability is increased; if it’s a verb it is decreased.
- lexical form. footyloxybarate is not a known chemical, but its lexical form makes it highly probable it is a fictitious one and not, say, a film star or pop group.
- Hearst patterns. “bioactive compounds such as aspirin or spat”. Even if not in a lexicon “spat” is probably a chemical rather than the past tense of spit.
- And usage (probabilistic). “take an aspirin” is a common phrase. “take a benzene” is of very low probability. So although “Dagger” (capitalised) is a trade name in PubChem, I doubt there are any extant uses of “a dagger” as opposed to “some dagger”.
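To make the Hearst-pattern idea concrete, here is a rough sketch of a single pattern, “<class noun> such as X, Y or Z”, written as a regular expression. The list of class nouns and the function name are assumptions for illustration only; a real system would use many more patterns and would feed the result into a probability rather than accept it outright.

```python
import re

# Illustrative class nouns that can introduce a list of chemicals (an assumption).
CLASS_NOUNS = ("compounds", "solvents", "reagents", "drugs")

# One Hearst-style pattern: "<class noun> such as <enumeration>".
PATTERN = re.compile(
    r"\b(?:%s)\s+such\s+as\s+([^.;]+)" % "|".join(CLASS_NOUNS),
    re.IGNORECASE,
)


def hearst_candidates(sentence):
    """Return the terms enumerated after '<class noun> such as'."""
    found = []
    for match in PATTERN.finditer(sentence):
        # Split the enumeration on commas and on the words "or" / "and".
        for item in re.split(r",|\bor\b|\band\b", match.group(1)):
            item = item.strip()
            if item:
                found.append(item)
    return found


print(hearst_candidates("bioactive compounds such as aspirin or spat"))
# ['aspirin', 'spat']
print(hearst_candidates("She spat in his face"))
# []
```

The same word “spat” is picked up as a candidate chemical in the first sentence and ignored in the second, purely because of the surrounding context.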
Peter has other clever tricks (and I suspect that there are some that are unique to our project).
- He. Unfortunately, short strings (He, As, In, Be, etc.) and many abbreviations are difficult. OSCAR weights these down, so the probability is low.
- aspirin – by lookup.
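The two verdicts above can be imitated by a toy scorer that adds up evidence from the kinds of cues discussed in this post (lexicon lookup, English-word check, lexical form, short-string penalty) and squashes the total into a probability. Everything here is an assumption for illustration: the tiny lexicons, the suffix list, the weights and the logistic combination are mine, and this is not how OSCAR actually computes its confidence.

```python
import math

# Toy lexicons and suffixes: illustrative stand-ins, not real resources.
CHEMICAL_LEXICON = {"aspirin", "pyridine", "imidazole", "he"}
ENGLISH_LEXICON = {"he", "spat", "dagger", "drive"}
CHEMICAL_SUFFIXES = ("ate", "ine", "ole", "ane", "ene")

# Hand-picked weights (assumed, not trained on any corpus).
WEIGHTS = {
    "bias": -2.0,
    "in_chemical_lexicon": 3.0,
    "in_english_lexicon": -1.5,
    "chemical_suffix": 2.5,
    "short_string": -3.0,
}


def features(token):
    """Turn a token into 0/1 evidence features."""
    word = token.lower()
    return {
        "bias": 1.0,
        "in_chemical_lexicon": float(word in CHEMICAL_LEXICON),
        "in_english_lexicon": float(word in ENGLISH_LEXICON),
        "chemical_suffix": float(word.endswith(CHEMICAL_SUFFIXES)),
        "short_string": float(len(word) <= 2),
    }


def chemical_probability(token):
    """Combine the weighted features with a logistic function."""
    score = sum(WEIGHTS[name] * value for name, value in features(token).items())
    return 1.0 / (1.0 + math.exp(-score))


for token in ("aspirin", "He", "footyloxybarate", "spat"):
    print(token, round(chemical_probability(token), 3))
```

With these made-up weights “aspirin” scores well above 0.5 by lookup alone, “He” is pushed down by the short-string and English-word penalties despite being in the chemical lexicon, and “footyloxybarate” gets a moderate score from its lexical form only.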
“She used her platinum card to buy a gold necklace, then crossed the iron bridge across the water as gold flecks decorated the sunset. Salt spray blew as she walked across the sand… “
PMR: [answer to O2: No, there is a telecom supplier in the UK with the trade name O2 and it was full of telecoms gear. No oxygen except what comes from the air.]