Open NMR – again. Why we do it

Chemspider (who has been doing some useful things recently in making data available, and on which I shall comment separately) criticizes our work on NMR prediction by GIAO methods and says he doesn’t “get it”. So I will continue to try to explain.

A Publication Comparing NMR Prediction Approaches


Those of you frequenting this blog might have read my highly opinionated views of what was originally entitled “Open Notebook Science NMR” (1,2). My views around that work were very strong…in fact I didn’t really “get it”. I didn’t get why GIAO approaches for NMR prediction (with all of the stated limitations) would be done to prove that you could validate NMR assignments by comparing predictions with assignments made experimentally. It’s known that NMR prediction can validate structures – it’s done on a daily basis in commercial software tools. I was involved in building tools like that for over a decade so what was to prove?

PMR: This is a scientific experiment to see if quantum mechanical methods can predict NMR shifts. The emphasis is on Quantum Mechanics. Does Quantum Mechanics agree with experimental data? At heart it’s as simple as that. Many results of quantum mechanics are not observable – the wavefunction, for example. A few things are observable: the geometry, aspects of the energy, and the interaction of the wavefunction with the nucleus.
The original GIAO calculations showed deviations from experiment. Were these deviations due to the experiment or to problems with the theory? Henry Rzepa thought the methodology needed improvement, so he created new basis sets. He also calibrated the effect due to spin-orbit coupling. Our research has confirmed that the spin-orbit coupling effect exists and is of a reproducible magnitude. That wasn’t clear before.
As we continue to get better data we may or may not discover new effects. If so they may be discoveries in physics – who knows?
Physical science works in large part by comparing theories with experiment. This is the basis of PhysML, about which I may write later. The Open NMR experiment is about comparing theory and experiment. It is NOT about predicting as many structures as rapidly as possible by empirical means. It is about the fundamental ability to predict the properties of matter through quantum mechanics.
We have done exactly the same for molecular geometry. It could be argued that rather than calculating the geometry of a crystal we should simply make it and measure it. We have, for example, shown that many current QM programs are not capable of calculating crystal structures well. That’s a deficiency of the theories and the programs. By highlighting the differences we help to develop better methodology in fundamental theory. That is an unarguable approach in science.
From the practical point of view there are huge numbers of molecules that cannot be well predicted by the empirical NN (neural network) or HOSE methods: transition metal compounds, and anything that cannot be represented by a connection table (no-one has responded to my request as to how NN or HOSE would calculate molecules such as Li4Me4). The view of chemistry seen through connection tables is necessarily limited. The view through QM is not.
There are also many chemical effects that can be investigated through QM. It is possible that there are clear and systematic effects due to solvation (e.g. on C=O groups). QM may be able to model these atomistically (i.e. with explicit solvent). NN cannot do this. And there are many more aspects of chemistry where NMR shifts give us a window on reality through QM calculations.
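One common way to compare computed and experimental shifts is linear scaling. You fit a straight line δ = a + bσ between the computed isotropic shieldings σ and the experimental shifts δ, then inspect the residuals. The Python sketch below shows that idea. It is an illustration of the general technique, not necessarily the exact procedure used in the Open NMR work. The function names are made up, and the shielding and shift values are hypothetical toy numbers chosen so the arithmetic is easy to follow.

```python
"""Put computed NMR shieldings side by side with experimental shifts.

Sketch only: assumes the common linear-scaling approach
delta = a + b * sigma, fitted by ordinary least squares. The numbers in the
demo are hypothetical toy values, not Open NMR results.
"""
from statistics import mean


def fit_linear_scaling(sigma, delta):
    """Least-squares intercept a and slope b for delta = a + b * sigma."""
    ms, md = mean(sigma), mean(delta)
    sxx = sum((s - ms) ** 2 for s in sigma)
    sxy = sum((s - ms) * (d - md) for s, d in zip(sigma, delta))
    b = sxy / sxx
    return md - b * ms, b


def residuals(sigma, delta, a, b):
    """Experimental minus fitted shift (ppm) for each nucleus."""
    return [d - (a + b * s) for s, d in zip(sigma, delta)]


def flag_outliers(res, threshold):
    """Indices with |residual| > threshold ppm: a possible misassignment,
    an experimental error, or an effect the calculation leaves out."""
    return [i for i, r in enumerate(res) if abs(r) > threshold]


if __name__ == "__main__":
    # Hypothetical 13C shieldings and shifts (ppm); entry 3 is a planted outlier.
    sigma = [180.0, 160.0, 140.0, 120.0, 100.0, 80.0, 60.0]
    delta = [5.0, 25.0, 45.0, 79.0, 85.0, 105.0, 125.0]
    a, b = fit_linear_scaling(sigma, delta)
    res = residuals(sigma, delta, a, b)
    rms = (sum(r * r for r in res) / len(res)) ** 0.5
    print(f"a = {a:.2f} ppm, b = {b:.3f}, RMS = {rms:.2f} ppm")  # a = 187.00 ppm, b = -1.000, RMS = 4.90 ppm
    print("outliers:", flag_outliers(res, threshold=5.0))        # outliers: [3]
```

For a reasonable method you would expect b to be close to −1, because a shift is roughly a reference shielding minus the computed shielding. Note also that the single bad point drags the intercept by 2 ppm and so shifts every other residual. That is one reason to look at the individual deviations rather than only the overall RMS.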
But first we have to get some believable Open Data to work with. Then we shall start to create new science.

Posted in open issues, open notebook science

More on the state of OA in chemistry

Although I have already blogged Rebecca Trager’s article, it’s worth reading Peter Suber’s comments on it (More on the state of OA in chemistry). He uses facts and analysis precisely to take apart the various arguments of the publishers.

 

Posted in open issues

Talks at Berlin5 on Open Access

 

Antonella De Robbio has very kindly made available the talks at Berlin 5 (Open Access: From Practice to Impact: Consequences of Knowledge Dissemination, 19–21 September 2007).
They can be viewed starting from the Conference website or from http://cabtube.cab.unipd.it/conferenze/berlin5-open-access.
I am especially grateful, since many of my talks involve demonstrations from the web and do not use PowerPoint. My own talk has acceptable audio but is a bit fuzzy on the slides. However, I created several blog entries
berlin5: Open Access to Research Data: surmountable challenges
berlin5: how to progress Open Data?
berlin5: what did I say?
which may help to fill in some gaps.
[Verbal slips – I referred to ACS’s description of NIH as “socialist”, when the exact term – as on the slide I showed – is “socialized science” [*] – my apologies. And I referred to Peter Suber’s categorization of Open access as “access barriers and Permission barriers” when the better term is “price barriers and permission barriers”]
It is always slightly scary to see what you actually said – particularly since I do not normally have a set order in my slides.
[*] Chambers defines socialize/socialized as:
socialize or socialise verb (socialized, socializing) 1 intrans to meet with people on an informal, friendly basis. 2 intrans to mingle or circulate among guests at a party; to behave sociably. 3 to organize into societies or communities. 4 to make someone or something social.

Posted in berlin5, open issues

Open Access – Chemistry World reviews the dilemma

In this month’s Chemistry World (a magazine from the Royal Society of Chemistry) there is an important article by Rebecca Trager (US) reviewing the increasing fission within the chemistry publishing community: Chemistry’s open access dilemma

 

This was, I believe, a commissioned article (Rebecca interviewed a number of people, including me, by phone) and does not, I think, represent any explicit or implicit policy of the RSC itself. The article gives a fair account of the current position in chemistry (it is free-to-read and I give selected quotes):

But the saga [NIH bill] has highlighted a widening rift in the chemical community over open access publishing – and the contentious provision could yet be revived.

Major scholarly societies joined the Association of American Publishers (AAP) in lobbying against the proposal, including the American Chemical Society (ACS), the American Association for Clinical Chemistry, the Biochemical Society, and the RSC (publishers of Chemistry World).

PMR: I suspect, though I do not know, that this is distinct from the PRISM movement, which was also launched from the AAP.

But the battle lines are already being drawn. The ACS wants the NIH policy to remain voluntary. ‘Depending on how they implement this, it could represent a federal taking of copyrighted materials,’ ACS spokesman Glenn Ruskin told Chemistry World.

A compulsory policy would need costly monitoring and penalisation systems, Ruskin said. ‘Why expend monies on a mandatory policy, when they could get to their endpoint a lot quicker if they just worked more cooperatively with the publishers?’

‘The idea of public access to research information is a little bit specious,’ added Robert Parker, managing director of RSC publishing. ‘The UK government will be funding the London Olympics in 2012, but that doesn’t mean that everybody can have free tickets – there is a big difference between funding something and having it be freely available.’

PMR: Factually the current position is that almost all chemistry publishers (such as ACS and RSC) continue to hold the copyright on closed access articles funded by governments. Maybe the analogy with the Olympics is a little bit stretched.

The Partnership for Research Integrity in Science and Medicine (PRISM) argues that the Congress bill could damage peer review by compromising the viability of non-profit and commercial journals. Predictably, the campaign has sparked outrage among open access lobby groups. In the wake of the furore, nine publishers have disavowed PRISM, including Cambridge University Press, Oxford University Press, Columbia University Press and University of Chicago Press. The ACS – which had been closely involved with PRISM – has now also played down links with the campaign.

PMR: PRISM is playing Haydn’s Farewell Symphony. No one seems to support it (I don’t know about the RSC; maybe this is a chance for them to comment). Is anyone left?

As a result, the steps taken by the RSC and ACS to enter this new world of publishing have received a stilted response from chemists.

For roughly a year, the RSC has had an Open Science service that allows authors to pay to make their article freely accessible to all. The basic fee for a primary research article is £1600 with a 15 per cent discount for RSC members, owner societies of RSC journals, and authors from subscribing organisations. So far, just four authors have participated.

PMR: Just in case anyone is unfamiliar with the RSC’s use of “Open Science” – this is not full Open Access under the BBB declaration but is a free-to-read version where the journal retains copyright. Readers can decide whether this is a good bargain compared with full Open Access offerings (it’s not the worst).

Indeed, there are calls for bold and decisive leadership on this increasingly divisive issue from all sides of the chemistry community. ‘Vision is needed. Where we are at the moment is unacceptable,’ said the ACS’s Ruskin.

PMR: I have indeed argued frequently that bold and decisive leadership is necessary and that it should come from learned societies and International Unions who are respected by the community. But if it doesn’t come from there, the community will find another way and in the Internet era that can happen very quickly.

Posted in chemistry, open issues

Data for common chemicals

As part of a project on chemical synthesis we need to collect some data on common chemicals. So what better place to start than with water? Before looking at the answers, see if you can find the

  • density of Water gas? (Wikipedia)
  • melting point of snow? (PubChem)
  • freezing point of water ice? (Wikipedia)

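There’s a serious point to this. Much chemistry uses human language and words as a means of identifying concepts, and names are not reliable identifiers: “water gas”, for example, is not water in the gas phase. (Thanks to Peter Corbett for the second example.)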
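
The point can be put in code. A program cannot safely treat a name as an identifier. It needs a curated mapping from names to concepts, and a way to say "unknown" when a name is not in that mapping. The table below is a hypothetical, hand-made sketch, not data from Wikipedia or PubChem. It records that "snow" and "water ice" are solid water, while "water gas" is a different substance altogether: a fuel gas made mainly of carbon monoxide and hydrogen.

```python
# Hypothetical, hand-curated synonym table: name -> (substance, phase).
# Illustrative only; not taken from any real database.
NAME_TO_CONCEPT = {
    "water": ("H2O", None),
    "steam": ("H2O", "gas"),
    "snow": ("H2O", "solid"),
    "water ice": ("H2O", "solid"),
    # Not gaseous water: a fuel gas made mainly of carbon monoxide and hydrogen.
    "water gas": ("CO + H2", "gas"),
}


def resolve(name):
    """Map a name to (substance, phase); a KeyError means 'unknown, ask a human'."""
    return NAME_TO_CONCEPT[name.strip().lower()]


def same_substance(a, b):
    """True if two names denote the same chemical substance, whatever the phase."""
    return resolve(a)[0] == resolve(b)[0]


if __name__ == "__main__":
    print(same_substance("snow", "water ice"))   # True
    print(same_substance("water gas", "steam"))  # False
```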

Posted in fun

Predictions for OA 2008

Peter Suber offers some OA predictions for 2008 in the December issue of the SPARC Open Access Newsletter. Peter is very clear-headed and I think they should be taken seriously. However, if they come to pass, I think we will have an even more complicated situation than now. Here are some relevant predictions:

 

(4) The US National Institutes of Health (NIH) will mandate OA for NIH-funded research. If the mandate doesn’t come as part of the NIH appropriation for fiscal 2008, then it will come another way.
When the dust settles and the OA mandate has been adopted, some publishers will sue to prevent it from taking effect. They won’t have strong legal arguments, but they will dress up what they do have and try to delay implementation as long as they can. After losing in the legislative and executive branches of government, a hard core of publishers who oppose government OA policies will keep fighting in the judicial branch. The Terminator may be reduced to a metal skeleton, but it will keep on coming.

(5) Publishers will always market their OA projects as boons to authors and readers, which is perfectly justified. But with or without more OA mandates to force the issue, we’ll start to see more OA and near-OA projects designed to help publishers themselves. These projects may not directly increase a publisher’s revenue, but they will prepare it to compete with free.

 

(7) We’ll see more publisher-university deals, like the Springer deal with Göttingen and the similar deal with the Universiteitsbibliotheken en de Koninklijke Bibliotheek. These deals create a new body of OA content –articles by faculty at participating institutions– for about the same price that institutions currently pay for subscriptions. They don’t make whole journals OA, and hence don’t make subscriptions unnecessary, but they do make articles OA. We’ll see more of them because they benefit both parties. They benefit universities by delivering more bang for the library budget buck and by widening the dissemination of some faculty work. They benefit publishers by reducing the risk of cancellation.

 

(8) We’ll see more funder-publisher deals, like the Wellcome Trust deal with Elsevier, the NIH deal with Elsevier, the deals of the Howard Hughes Medical Institute (HHMI) with Elsevier, Springer, and BioMed Central, and the Elsevier deals with most of the funders in the UKPMC Funders Group. Some of these deals pay publishers for gold OA when green OA would suffice, and some pay publishers for green OA when publishers don’t need to be paid at all. But we’ll see more of them because some funders are willing to pay to have the published edition of an article OA from birth (as opposed to the author’s manuscript OA after an embargo) and because many publishers are looking for ways to be paid for any concession to OA.

PMR: What I take from this is that most publishers will move some of the way towards OA and almost every publisher’s offering (apart from the out-and-out OA advocates such as PLoS, Hindawi and BMC) will be different. The situation for permission barriers will be complex and chaotic. We could end up with a situation where “most” offerings were “Open Access” (price barriers removed) but where it was almost impossible to determine what any publisher’s policy was on permission barriers (full BBB Open access). It’s currently a nightmare to find out what you can and can’t do with many OA offerings – and it looks like it will get worse.

 

But maybe I’m too pessimistic. Maybe the funders will insist on absolute permission barrier removal. Maybe the publishers will decide that managing Open content could actually be a profitable business. Maybe…

Posted in open issues

Bioclipse awarded [prize] at Trophees du Libre

Ola Spjuth reports that Bioclipse – the collaborative bio/chem client based on Eclipse – has won another prize (Bioclipse awarded at Trophees du Libre):

I [Ola] just arrived home from the international contest for free software, Trophees du Libre 2007, which was held in Soissons, France. Bioclipse was awarded the Special Prize of the jury, and the prize was handed over by the president of the Free Software Foundation Europe (FSFE), Georg Greve, who also was the chairman of the jury. It was a great event; great to meet other open source developers and people representing organizations and companies who actively support free software. Apparently we received the Special Prize because we were too famous already :-).

 

Posted in open issues, programming for scientists

author-doesn't-pay OA

Bill Hooker has taken a limited vow of silence on his blog, but some things make him break it. One is dis/misinformation about OA… If it won’t sink in, maybe we can pound it in…

Another brief un-hiatus, this one sparked by a question asked by Dave Munger at BPR3:

If you know of a peer-reviewed, open-access journal that does not charge a publication fee, let us know about it in the comments.

Practically every time I talk about OA, online or in meatspace, I hear “I’d like to support OA but I can’t afford it, don’t all those journals charge, like, $2500 per article?” No. They don’t.
Everyone seems to be thinking of PLoS, never mind that they waive their fees at the drop of a hat; the assumption that most OA journals charge (high) author-side fees is both widespread and completely wrong.
In fact, more than 2/3 of the journals listed in the Directory of Open Access Journals (DOAJ) and more than 80% of OA journals published by scholarly societies charge no author-side fees at all; in contrast, more than 75% of the 247 non-DOAJ journals in a 2005 survey do charge author-side fees (page charges, colour charges, reprint charges, etc) in addition to subscription charges.
… and a lot of carefully argued material showing that OA does not cost most authors.

Posted in open issues

Scraping HTML

As we have mentioned earlier, we are looking at how experimental data can be extracted from web sources. There is a rough scale of feasibility:
RDF/XML > (legacy) CIF/JCAMP/SDF > HTML > PDF
I have been looking at several sites which produce chemical information (more later). One exposes SDF (a legacy ASCII file of molecular structures and data). The others all expose HTML. This is infinitely better than PDF, BUT…
I had not realised how awful it can be. The problems include:

  • encodings. If any characters outside the printable ASCII range (32–126) are used, they will almost certainly cause problems. Few sites declare an encoding, and even if they do, the interconversion is not necessarily trivial.
  • symbols. Many sites use “smart quotes” for quotes. These are outside the ASCII range and almost invariably cause problems. The author can be slightly forgiven, since many tools (including WordPress) convert to smart quotes (“”) automatically. Even worse is the use of “mdash” (—) for “minus” in numerical values. This can be transformed into a “?” or a block character or even lost. Dropping a minus sign can cause crashes and death. (We also find papers in Word where the numbers are in Symbol font and get converted to arbitrary characters or deleted.)
  • non-HTML tags. Some tools make up their own tags (e.g. I found <startfornow>) and these can cause HTMLTidy to fail.
  • non-well-formed HTML. Although some omissions are acceptable (e.g. “br” can miss out the end tag), many constructions are not interpretable. The use of <p> to separate paragraphs rather than contain them is very bad style.
  • JavaScript, PHP, etc. Hopefully these can be ignored, but often they can’t.
  • linear structure rather than groupings. Sections can be created with the “div” tag but many pages assume that a bold heading (h2) is the right way to declare a section. This may be obvious when humans read it, but it causes great problems for machines – it is difficult to know when something finishes.
  • variable markup. For a long-established web resource – even where pages are autogenerated – the markup tends to evolve, and it may be difficult to find a single approach to understanding it. This is also true of multi-author sites where there is no clear specification for the markup – Wikipedia is a good example of this.
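
Some of the problems above (undeclared encodings, smart quotes, dashes standing in for minus signs) can be partly repaired before numbers are parsed. The Python sketch below shows one possible approach, not a complete solution. It assumes a fallback order of the declared charset, then UTF-8, then Windows-1252 (cp1252, a frequent source of smart quotes in legacy pages). The character table is illustrative rather than exhaustive, and the function names are hypothetical.

```python
import re

# Typographic look-alikes that often replace ASCII in scraped pages.
# Illustrative, not exhaustive.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # smart single quotes
    "\u201c": '"', "\u201d": '"',   # smart double quotes
    "\u2212": "-",                  # Unicode minus sign
    "\u2013": "-", "\u2014": "-",   # en and em dashes misused as minus
    "\u00a0": " ",                  # non-breaking space
}
TABLE = str.maketrans(REPLACEMENTS)

# A leading sign only counts if the character before it is not a digit or
# a point, so "120-122" is read as 120 and 122, not 120 and -122.
NUMBER = re.compile(r"(?<![\d.])[-+]?\d+(?:\.\d+)?")


def decode(raw, declared=None):
    """Decode page bytes: declared charset, then UTF-8, then cp1252."""
    for enc in (declared, "utf-8", "cp1252"):
        if enc is None:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte to a character, so this cannot fail.
    return raw.decode("latin-1")


def normalise(text):
    """Replace typographic characters with their ASCII equivalents."""
    return text.translate(TABLE)


def parse_numbers(text):
    """Return every signed decimal number in the text as a float."""
    return [float(m) for m in NUMBER.findall(normalise(text))]


if __name__ == "__main__":
    page = decode("bp \u221212.5 \u00b0C; mp 120\u2013122".encode("utf-8"))
    print(parse_numbers(page))  # [-12.5, 120.0, 122.0]
```

The lookbehind in the NUMBER pattern is a heuristic. It makes a dash between two digits (a melting range such as 120–122) read as a separator rather than a negative sign. Other layouts will still be misread, which is one reason precision stays below 100%.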

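The "linear structure" problem can sometimes be worked around when a page is consistent: treat each h2 heading as the start of a section that runs until the next h2. The Python sketch below does this using only the standard library's html.parser. Unlike a strict XML parser, html.parser tolerates unclosed <p> elements and invented tags such as <startfornow>. The one-section-per-h2 rule is an assumption of this sketch, and it will break on pages whose markup varies.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect text under each <h2>; a section ends where the next <h2> starts."""

    def __init__(self):
        super().__init__()
        self.sections = []   # list of [heading_text, [body_chunks]]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
            self.sections.append(["", []])

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return                      # text before the first <h2> is dropped
        if self._in_heading:
            self.sections[-1][0] += data
        elif data.strip():
            self.sections[-1][1].append(data.strip())


def split_sections(html):
    """Return {heading: body text} for each <h2>-delimited section."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return {h.strip(): " ".join(body) for h, body in parser.sections}


if __name__ == "__main__":
    page = ("<h2>Melting point</h2><p>273.15 K<p>at 1 atm<startfornow>"
            "<h2>Density</h2>1.000 g/cm3")
    print(split_sections(page))
```

Duplicate headings would overwrite each other in the returned dictionary. A real scraper would probably keep an ordered list of sections instead.
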
As a result it is not usually possible to extract all the information from HTML pages and precision and recall both fall well short of 100%. The only real solution is to persuade people to create machine-friendly pages based on RSS, RDF, XML and related technology. This solves 90% of the above problems. That’s why we are looking very closely at Jim Downing’s approach of using Atom Publishing Protocol for web sites.
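
By contrast, when a site publishes its data as Atom (the feed format the Atom Publishing Protocol works with), the structure is explicit and a standard XML parser does the work. In this Python sketch the Atom namespace is the real one from RFC 4287. The site, the "chem" namespace at example.org and its meltingPoint element are hypothetical, invented only to show the idea.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",  # standard Atom namespace
    "chem": "http://example.org/chem",      # hypothetical data namespace
}

# A made-up feed: one entry per compound; "water gas" has no melting point.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:chem="http://example.org/chem">
  <title>Example compound data</title>
  <entry>
    <title>water</title>
    <chem:meltingPoint units="K">273.15</chem:meltingPoint>
  </entry>
  <entry>
    <title>water gas</title>
  </entry>
</feed>"""


def melting_points(feed_xml):
    """Return {entry title: (value, units)} for entries that give a melting point."""
    root = ET.fromstring(feed_xml)
    result = {}
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        mp = entry.find("chem:meltingPoint", NS)
        if mp is not None:
            result[title] = (float(mp.text), mp.get("units"))
    return result


if __name__ == "__main__":
    print(melting_points(FEED))  # {'water': (273.15, 'K')}
```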

Posted in data, repositories, semanticWeb

Survey of open chemistry in Chemistry World

Richard Van Noorden has written a balanced and informative overview of Open Chemistry (Surfing Web2O, Chemistry World, December 2007). He has read much of the chemistry blogosphere and talked with many of us on the phone. The article highlights the opportunities and the frustrations. Here is a brief excerpt:

The rapid evolution of the world wide web is creating fresh opportunities – and challenges – for chemistry….

  • The internet is becoming flooded with free chemical information: from blogs to videos and databases
  • Linking this data together and interacting via the ‘social web’ could revolutionise the practice and teaching of chemistry
  • So-called ‘Open Chemistry’ faces many challenges: not least maintaining data quality and co-existing with trusted subscription databases…

PMR: I think we are beginning to see some movement. The dam is built of sand and trickles are appearing. Some of us are encouraging this, and at some stage it must burst.

 

We are going to need new technology. Structured databases and portals will start to disappear, and semi-structured collections of data (repositories) and people (collaboratories) will grow. There is a lot of interest from outside chemistry. Although chemistry per se is not interested in communal resources, there is a big demand in bioscience, and we shall get a strong “piggy-back” on the work happening there in text-mining, ontologies and the semantic web. We’ll also see a push from repositories in academia, and since chemistry is technically one of the easiest places to start, we expect to “leverage” this [an unhappy verb].

 

Posted in data, open issues, open notebook science