petermr's blog

Mon cher enfant, j

Posted on August 19, 2009 by pm286

Howard Flack is a crystallographer who has devoted many years to the crystallography of stereochemistry. Now he has published a fantastic review of one of the greatest feats of science – Pasteur’s work on optical isomers: http://crystal.flack.ch/sh5092.pdf

Louis Pasteur’s discovery of molecular chirality and spontaneous resolution in 1848, together with a complete review of his crystallographic and chemical work

I couldn’t even start to summarise this, but I would make it required reading for any young chemist. It’s almost impossible for us to recreate the mindset of chemistry 150 years ago and relating the geometry of crystals to life must have seemed fantastical. Weird ideas abounded and it would never have met the rigid and mechanical funding criteria of today. Not surprisingly Pasteur’s professor, Biot, was not convinced.

Howard quotes one of my favourite passages – translated and written by Frankland over 100 years ago. It shows the scepticism necessary to science, the care required to plan an experiment with appropriate control and the readiness of a great scientist to change their world view in the face of new ideas and evidence.

He (J.-B. Biot) sent for me to repeat before his eyes the several experiments. He gave me racemic acid which he had himself previously examined and found to be quite inactive to polarized light. I prepared from it in his presence the sodium ammonium double-salt, for which he also desired himself to provide soda and ammonia. The liquid was set aside for slow evaporation in one of the rooms of his own laboratory, and when 30–40 grams of crystals had separated he again summoned me to the Collège de France, so that I might collect the dextro- and laevo-rotatory crystals before his eyes, and separate them according to their crystallographic character, asking me to repeat the statement that the crystals which I should place on his right hand would cause the deviation to the right, and the others to the left. This done, he said that he himself would do the rest. He prepared the carefully weighed solutions, and, at the moment when he was about to examine them in the polarimeter, he again called me into the laboratory. He first put the more interesting solution, which was to cause rotation to the left, into the apparatus. Without making a reading, but already at the first sight of the colour-tints presented by the two halves of the field in the Soleil saccharimeter, he recognized that there was a strong laevorotation. Then the illustrious old man, who was visibly moved, seized me by the hand, and said ‘Mon cher enfant, j’ai tant aimé les sciences dans ma vie que cela me fait battre le coeur!’ (‘My dear child, I have loved the sciences so much in my life that this makes my heart beat!’).

Posted in Uncategorized | 1 Comment

THE article: Do academic journals pose a threat to the advancement of science?

Posted on August 13, 2009 by pm286

Times Higher Education has run an article by Zoë Corbyn on “A threat to scientific communication” – subtitled: “Do academic journals pose a threat to the advancement of science?” She interviewed a number of people (including me) – on both sides of the fence – and I think it’s a balanced article. If you work for a publisher (or own a publishing house) you may think differently.

I still believe in a role for publishers, and I know many people in publishing who are trying to do exciting things, but the current position must change.

The article, which you should read, covers:

the impact factor
the monopolistic position of certain publishers
some barriers to innovation
a feeling that change has been far too slow and that change will happen anyway

My quoted remarks are probably no surprise to readers of this blog, but THE is widely read – at least in the UK – and my opinions were intended to reach the heads of Universities and spur them to action. Simply put, Universities get the publication process they deserve. They have the financial power to change it to suit the twenty-first century – they haven’t done so. They must.

Zoë puts the value of scholarly publishing at 3 B GBP (about 5 B USD). It’s obviously a difficult figure to compute, as many publishers publish things other than journals (e.g. databases, handbooks, series, etc.). I’d guess the global academic research budget at ca. 500 B USD. Cambridge, Harvard and Stanford have research incomes of ca. 500 M USD each; allow a power-law distribution across institutions and you get somewhere near that total. As a rough check, Wellcome will pay 1-2 percent of a grant for the cost of publishing a paper, which gives roughly the same ballpark. I’d be grateful for other figures. What’s the NIH spend? 30 B USD (http://www.nih.gov/about/budget.htm). Again use a power law and you get somewhere in that region.
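The ballpark check can be made explicit using only the two round figures quoted above (about 5 B USD for publishing and a guessed 500 B USD for research):

```latex
\frac{5 \times 10^{9}\ \text{USD}}{500 \times 10^{9}\ \text{USD}} = 0.01 = 1\%
```

That sits at the lower end of the 1-2 percent that Wellcome allows per grant for publishing, so the two rough estimates are at least mutually consistent.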

The people who should jointly control this half-a-trillion USD are the funders and the researchers. So why does a system of metrics outside their control (such as the impact factor) have such massive influence?

Universities have lost their Presses as major forces, their Libraries have no influence, so it has to be those who run the Universities to reclaim their standing.

The least they can do is read THE and start to address the problem.

Posted in Uncategorized | 3 Comments

The Pauling Blog

Posted on August 7, 2009 by pm286

I was delighted to get a request today to link to a blog about a very special person indeed, Linus Pauling:

The Pauling Blog is run by the Oregon State University Special Collections staff. It is devoted to informing the public of our various holdings, most notably the Linus and Ava Helen Pauling Papers. We post a minimum of twice weekly on Pauling’s life and work, often focusing on his contributions to the field of chemistry. We also post on his work in physics, biology, and medicine.

[…]

Please feel free to visit the Pauling Blog (http://paulingblog.wordpress.com) or contact us with any questions you may have. […] you may be especially interested in Linus Pauling: The Nature of the Chemical Bond, a documentary history website hosted on the OSU Special Collections homepage (http://osulibrary.oregonstate.edu/specialcollections/).

There can be no doubt that Pauling was the “chemist of the twentieth century” – he covered so many fields and was influential in all he touched. I had the privilege to meet him in 1984 (I think) and listen to him talk – in this case not about DNA or strontium or proteins but about minerals – the area where he started. In fact his thesis at Caltech was simply 5 papers in JACS on mineral crystal structures – which at that time were great intellectual feats. I was also presenting my own ideas on the automated analysis of crystal structures and he gave me interested attention.

I’ve blogged about him on a few occasions, but I’ll mention just one post here:

Impact Factors! Hirsch, Erdős and Pauling

where I suggest that, after the success of the Erdős number in mathematics, we could generate a Pauling number in chemistry.

And finally a personal connection – Catherine Murray-Rust was in the library at the time that the Pauling collection was being compiled.

This is an inspiration to us all.

Posted in Uncategorized | 2 Comments

Chem4Word: Semantics is a hard challenge

Posted on August 6, 2009 by pm286

This is a brief update… Although I have lots to communicate we have been spending most of our time working on Chem4Word and I don’t have time for blogging. We’ve (== Joe and bits of me, with Tim) frozen the API, and now we are fixing “bugs”. Although programming is over 50 years old (and I wrote my first program 45 years ago), bugs are universal. There are better bugkillers now, but there are bigger and more bugs. The greats of compSci have all recognised this, see: http://www.comp.nus.edu.sg/~damithch/pages/SE-quotes.htm

which should be required reading for anyone before they touch a keyboard. I’ll take just one:

Even the best planning is not so omniscient as to get it right the first time.
— Fred Brooks

Every experienced software developer knows this, and almost every software developer represses it. We are driven by optimism – in principle we should improve as we do projects and therefore we should do better and faster work. But, of course, we also increase our expectations. And the world expects more of us.

We didn’t get it “right first time”. We couldn’t. Because we are embarking on something new – this is not YACE – yet another chemical editor. This is a semantic chemical environment. And semantics are hard. Not impossible, but hard. And there is no way round.

Here’s a brief example. Many chemical editors have a button with a “+” sign (and another with a “–”). Its meaning is “add a positive charge to the atom”. Sounds simple enough – CMLAtom has an integer “formalCharge” attribute – all we have to do is increment or decrement it. But what does it mean? This is where semantics (and CML attempts to be a semantic language) bites us. A semantically valid molecule in CML must “know” exactly what atoms and how many electrons it contains. What does “+” do to the electron count? Presumably it decreases it by one? Well, sometimes it does, and sometimes it doesn’t, because many editors are oriented towards organic chemistry, where “+” can mean “add a proton to an atom” (a proton is H+) rather than “remove an electron” (which might be signified by “.”, i.e. add a radical). The “+” convention is so implicit that it’s universally understood, but never stated.

We’ve identified several different meanings of the “+” semiotics which depend on element identity and chemical environment. It’s so polymorphic and woolly that it proved impossible to write semantically consistent code. So we’ve had to redesign. We now have a button called “add H+”. This is not a common approach – I don’t know whether other tools use it. But for us it’s a logical and semantically clean approach. Is this a “bug”? It certainly fits Fred Brooks’ maxim. And have we got it right the second time? Until we get human chemistry feedback we won’t know.
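To make the ambiguity concrete, here is a minimal Python sketch (an illustration under stated assumptions, not Chem4Word code) of two readings of “+” applied to the nitrogen of ammonia. The `AtomGroup` class, the `ATOMIC_NUMBER` table and both functions are hypothetical; the only idea borrowed from CML is that an atom carries a formal charge and a hydrogen count.

```python
from dataclasses import dataclass, replace

# Atomic numbers for the few elements needed here.
ATOMIC_NUMBER = {"H": 1, "N": 7}


@dataclass(frozen=True)
class AtomGroup:
    """Hypothetical model: one heavy atom plus its attached hydrogens."""
    element: str
    hydrogen_count: int
    formal_charge: int = 0

    def electron_count(self) -> int:
        # Electrons = protons in every nucleus of the group minus the net charge.
        protons = ATOMIC_NUMBER[self.element] + self.hydrogen_count * ATOMIC_NUMBER["H"]
        return protons - self.formal_charge


def plus_as_remove_electron(group: AtomGroup) -> AtomGroup:
    # Reading 1: ionise the group; hydrogens unchanged, one electron lost.
    return replace(group, formal_charge=group.formal_charge + 1)


def plus_as_add_proton(group: AtomGroup) -> AtomGroup:
    # Reading 2 ("add H+"): one more hydrogen nucleus and +1 charge; electron count unchanged.
    return replace(group, hydrogen_count=group.hydrogen_count + 1,
                   formal_charge=group.formal_charge + 1)


ammonia = AtomGroup("N", hydrogen_count=3)
for label, operation in [("remove e-", plus_as_remove_electron), ("add H+", plus_as_add_proton)]:
    result = operation(ammonia)
    n = result.electron_count()
    print(label, result, n, "odd: radical" if n % 2 else "even")
```

On ammonia, the “remove an electron” reading gives an odd-electron radical cation (9 electrons), while “add H+” gives ammonium with 10 electrons and an extra hydrogen. Both end with the same formal charge of +1, which is why the charge attribute alone cannot record what the user meant.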

So back to the unit tests. We can’t do it without them. Boring boring boring. But at least I can watch the TV as well – interesting program on Spanish ‘flu. And do about 6 tests an hour…

More blogging at some time.

Posted in Uncategorized | Leave a comment

Open Semantic Chemistry

Posted on July 28, 2009 by pm286

In a reply to my post on Chem4Word, Egon makes a valuable contribution (Egon Willighagen says: July 27, 2009 at 5:37 pm)

I think the cheminformatics community is seeing the value of semantics in chemical editing, and understood that even closed-source product have shown serious evolution in this area. JChemPaint also followed the semantic path for a while, but does not have the advantage of tight integration in a production phase editing tool like Chem4Word has. With the current marketshare of Word, this editor will quickly see a quick uptake and bring semantic chemical editing to a new audience, that of organic chemists. This is positive, and anything drawn in this tool will be semantic and interoperate with other tools. That is positive too, even if many of us will not use the editor at all, like me.

I agree (although prediction of a “quick uptake” is an inexact science). He is also right that he will not use the tool directly. However, there are immediate spinoffs for the whole open chemistry community regardless of platform:

The system is modular. That means that it does not have to be used in Word (although obviously the benefits of creating a compound document will be absent). There is an essentially standalone tool allowing chemical manipulation of objects (relies on WPF/XAML and C#). There is also a library of routines (.NUMBO) which are independent of anything except the C# language. To what extent C# will be a help or a hindrance in the Open chemical world I don’t know.
The APIs have been designed to be largely platform and language independent. It’s difficult to write completely independent APIs (as for example CORBA IDLs) but the following signature is characteristic of the CID interface between the UI and the .NUMBO library:

public static bool CanFlipAboutExternalAcyclicBond(
    ContextObject contextObject,
    IEnumerable<XElement> atomPointers)

The contextObject holds the complete state in CML, so that a generic library (such as JUMBO) can implement the same calls relatively easily. That means, inter alia, that the system can be used for batch processing of data without the need for graphics.
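The stateless style can be sketched in Python: the whole molecule arrives as a CML string, the function keeps nothing between calls, and so it runs just as well in a batch loop with no graphics. `is_acyclic_bond` is a hypothetical helper, not the CID or .NUMBO API, and it checks only one plausible precondition for flipping about an acyclic bond (that the bond is not in a ring). It assumes the CML namespace URI and the `bond/@atomRefs2` convention.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

CML_NS = {"cml": "http://www.xml-cml.org/schema"}


def is_acyclic_bond(context_cml: str, atom_a: str, atom_b: str) -> bool:
    """Stateless query: everything needed arrives in the CML string.

    A bond is acyclic if deleting it leaves its two atoms disconnected.
    """
    root = ET.fromstring(context_cml)
    neighbours = defaultdict(set)
    for bond in root.iterfind(".//cml:bond", CML_NS):
        left, right = bond.get("atomRefs2").split()
        neighbours[left].add(right)
        neighbours[right].add(left)
    if atom_b not in neighbours[atom_a]:
        raise ValueError(f"no bond between {atom_a} and {atom_b}")
    # Breadth-first search from atom_a that is not allowed to use the bond itself.
    seen, queue = {atom_a}, deque([atom_a])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if {current, nxt} == {atom_a, atom_b} or nxt in seen:
                continue
            if nxt == atom_b:
                return False  # reached the other end another way: the bond is in a ring
            seen.add(nxt)
            queue.append(nxt)
    return True


# Methylcyclopropane: ring a1-a2-a3, methyl carbon a4 on a1 (hydrogens omitted).
MOLECULE = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/><atom id="a2" elementType="C"/>
    <atom id="a3" elementType="C"/><atom id="a4" elementType="C"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/><bond atomRefs2="a2 a3" order="1"/>
    <bond atomRefs2="a3 a1" order="1"/><bond atomRefs2="a1 a4" order="1"/>
  </bondArray>
</molecule>"""

# Batch-style use: no UI, just data in and answers out.
for a, b in [("a1", "a2"), ("a1", "a4")]:
    print(a, b, "acyclic" if is_acyclic_bond(MOLECULE, a, b) else "in a ring")
```

Because the function receives the full state and returns a plain value, the same call could sit behind a UI button or inside a script looping over many files; keeping the rule in one place is the design point being illustrated.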

Many of the components are declarative (in various flavours of XML) and hence language-independent. Thus the primary CML validation in import is done using a CML XML Schema and a Schematron validator. This means that the process could be trivially ported to any other language or platform simply through standard XML APIs.
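Schematron proper needs a dedicated processor (lxml, for instance, ships an `isoschematron` module). As a dependency-free stand-in, the sketch below keeps the checks as data (XPath context, required attribute, message) and applies them with Python's standard ElementTree. The rules are hypothetical examples, not the Chem4Word schema or Schematron rules, and the cross-reference check is written procedurally for brevity.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical declarative rules in the spirit of Schematron <assert> elements:
# (context XPath, attribute that must be present, message if it is missing).
RULES = [
    (".//cml:atom", "id", "atom without id"),
    (".//cml:atom", "elementType", "atom without elementType"),
    (".//cml:bond", "atomRefs2", "bond without atomRefs2"),
]


def validate(cml_text: str) -> list[str]:
    """Return a list of error messages; an empty list means the input passed."""
    root = ET.fromstring(cml_text)
    errors = []
    for xpath, attribute, message in RULES:
        for element in root.iterfind(xpath, CML_NS):
            if element.get(attribute) is None:
                errors.append(message)
    # One cross-reference rule: every bond must point at atoms that exist.
    atom_ids = {atom.get("id") for atom in root.iterfind(".//cml:atom", CML_NS)}
    for bond in root.iterfind(".//cml:bond", CML_NS):
        refs = (bond.get("atomRefs2") or "").split()
        if any(ref not in atom_ids for ref in refs):
            errors.append(f"bond refers to unknown atom: {' '.join(refs)}")
    return errors
```

Because the rules are plain tuples, they could equally be loaded from a file and applied by code in any language with an XPath engine, which is the portability argument made above.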

XML is platform independent (you do not have to worry about line-endings, blank space, etc.).

The CML-Lite schema has been thoroughly refactored and fairly well tested, so that we have a good proven foundation for semantic chemistry.

And, above all, it will be Open. That means that the community will be able to contribute and benefit.

How can people benefit and contribute if they do not use Microsoft technology? To the extent that the chemical architecture is language-independent we should be able to develop and refine the chemical algorithms and semantics independently of C#. At present we are hotly debating what is meant by “add a positive charge to an atom” – which I hinted at before. Think about the effect (i.e. what is the formula and electron count) of the following:

add a “+” to the N in (CH3)N
add a “+” to CH4
add a “–” to CH4
add a “–” to N=O
add a “–” to C6H6 (benzene)
add a “–” to Na
add a “–” to Na+
add a “–” to B in BH3
add a “–” to F in HF
Now consider what would happen if you had the option “add a radical” (often denoted by “.”).
I doubt very much whether the chemistry community agrees completely on the results, other than that it probably contains a “–” and/or “+” and/or “.” glyph somewhere. But if we do not know how many electrons there are, or what the spin multiplicity is, we cannot submit this to a QM calculation.
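The bookkeeping a QM program needs (the total electron count, and hence the charge and the lowest possible spin multiplicity) can be sketched in a few lines of Python. This assumes the strict reading in which “+” removes one electron and “–” adds one; the formula parser handles only simple formulas like those listed, and all names are illustrative. An odd count forces at least a doublet; an even count permits a singlet but does not guarantee one (NO−, isoelectronic with O2, is usually described as having a triplet ground state).

```python
import re

# Atomic numbers for the elements used in the examples above.
ATOMIC_NUMBER = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "Na": 11}


def electron_count(formula: str, charge: int) -> int:
    """Total electrons = sum of nuclear charges minus the net charge."""
    protons = sum(ATOMIC_NUMBER[symbol] * int(count or 1)
                  for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))
    return protons - charge


def minimum_multiplicity(electrons: int) -> int:
    # Odd count: at least one unpaired electron (doublet or higher).
    # Even count: a singlet is possible but not guaranteed.
    return 2 if electrons % 2 else 1


# Strict reading: "+" removes one electron, "-" adds one.
# ("Na", 0) stands for adding "-" to Na+.
cases = [("CH4", +1), ("CH4", -1), ("NO", -1), ("C6H6", -1),
         ("Na", -1), ("Na", 0), ("BH3", -1), ("HF", -1)]
for formula, charge in cases:
    n = electron_count(formula, charge)
    print(f"{formula} charge {charge:+d}: {n} electrons, multiplicity >= {minimum_multiplicity(n)}")
```

Under the other readings chemists often intend, the answers change: “–” on B in BH3 read as “add hydride” gives BH4− (`electron_count("BH4", -1)` is 10), and “–” on F in HF read as “remove a proton” gives F− (`electron_count("F", -1)` is 10), both even, whereas the strict electron reading gives 9 and 11.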
For this reason I think the Open Chemistry community (and especially the Blue Obelisk community) can help systematize these declarative processes. My current position is that there are no universal valence rules and that there needs to be a separate set of rules for each element, each with its own special cases. I suspect that much of this is implicit, and perhaps explicit, in Open Babel, CDK, JUMBO, Avogadro and other Open software. If we can extract these into a set of rules that are declarative (i.e. not expressed in a specific procedural language) then we can start to get semantic consistency in our tools.
Here are two more questions. What’s the result of deleting one =O atom from:
CH3C(=O)CH3
CH3S(=O)CH3
CH3N(-O.)CH3
CH3-N(=O)
CH3-N(=O)=O
and are there any general rules?
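One way to read the call for declarative rules is to keep the chemistry in a data table and the procedure generic. The Python sketch below is a hypothetical illustration: the table holds common textbook valences chosen for the example (not values extracted from Open Babel, CDK, JUMBO or Chem4Word), and the policy “take the smallest allowed valence that fits, fill the remainder with hydrogens” is an assumption, similar in spirit to the implicit-hydrogen convention for organic-subset atoms in SMILES.

```python
# Hypothetical declarative table: element -> formal charge -> allowed valences
# (sum of bond orders, counting bonds to hydrogen). Illustrative values only.
VALENCE_RULES = {
    "B": {0: [3], -1: [4]},
    "C": {0: [4], 1: [3], -1: [3]},
    "N": {0: [3], 1: [4], -1: [2]},
    "O": {0: [2], 1: [3], -1: [1]},
    "F": {0: [1], -1: [0]},
    "S": {0: [2, 4, 6]},
}


def implicit_hydrogens(element: str, charge: int, explicit_valence: int) -> int:
    """Generic engine: smallest allowed valence that fits, remainder filled with H."""
    allowed = VALENCE_RULES.get(element, {}).get(charge)
    if allowed is None:
        raise ValueError(f"no rule for {element} with charge {charge}")
    for valence in sorted(allowed):
        if valence >= explicit_valence:
            return valence - explicit_valence
    raise ValueError(f"{element}{charge:+d} cannot carry valence {explicit_valence}")


# Deleting =O from acetone leaves the central C with explicit valence 2;
# deleting =O from DMSO leaves S with explicit valence 2.
print("acetone C:", implicit_hydrogens("C", 0, 2))
print("DMSO S:", implicit_hydrogens("S", 0, 2))
```

With this table, deleting =O from acetone gives the central carbon two implicit hydrogens (propane), while the same deletion from DMSO leaves sulfur at valence 2 with none (dimethyl sulfide). The generic code gives element-specific answers only because the rules differ; whether chemists would accept those answers is exactly the open question.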

Posted in Uncategorized | 1 Comment

Junk Science? The blogosphere thinks so

Posted on July 28, 2009 by pm286

I was alerted last week by a blogospheric PhD student (worked with us for some time before going to Oxford) to the following story from Totally Synthetic (TotSynth).

NaH as an Oxidant – Liveblogging!

Even if you are not a scientist, please read on – it’s entertaining and informative. It deserves to be put in front of every young scientist as it shows the process of science as it should be done.

When I was at high school I read a popular and good chemistry paperback (Penguin) which highlighted the scientific method through a passage from Dorothy Sayers’ Strong Poison, where she describes in graphic and entertaining detail how a Marsh test for arsenic was carried out. The thread in the blogosphere captures completely the rigour, the attention to detail, the likelihood of false trails, the unexpected, the need for reference to authority and the need to question authority.

If I were teaching young chemists I would set them this as a real exercise. As a group, and in the lab. Give them a month. By the end of that month they would know far more about reactions, thermodynamics, spectra than they would get from formal lectures.

Moreover it highlights a real message of the evolving scientific web which is that what is said matters more than where it is published. For non-chemists I will interpret:

A group of scientists submitted a manuscript to The Journal of the American Chemical Society. This is a well-known and high-quality journal which is often used (naively) as a numeric metric of the value of a chemist (“how many JACS articles have they published?”). The ACS stresses the value of peer review (as do I) and claims that its quality is low in Open Access journals (which I dispute). The published article (Reductive and Transition-Metal-Free: Oxidation of Secondary Alcohols by Sodium Hydride) is “advertised” by the following graphical abstract (which I reproduce without permission as fair use):

[Graphical abstract: image not reproduced here]

The potential utilities of the simplest hydride reductant sodium hydride (NaH) as an oxidation promoter have long been overlooked.

This claim is sensational in that it goes completely against received chemical knowledge. Any first-year student, if given the top (blue) reaction, would be expected to draw the arrow in the OTHER direction (right to left). They would certainly fail (part of) an exam if they wrote what the authors have claimed. So it’s not an obscure finding. If true it would mean that (free) energy would have to come from an unknown source. Not impossible, but extremely unlikely. On the order of cold fusion or Benveniste’s homeopathic water.

The claim apparently went through the reviewers and editors with little comment. But the blogosphere picked it up and Totally Synthetic decided to question the finding. You must read the blog. There’s a blending of careful attention and excitement – what IS the answer?

So I’m not going to give away the punchline. But I will say that the peer review is closed, so I cannot comment definitively on whether the paper should have been accepted. Currently I regard the paper as an outstanding example of junk science published in a journal which prides itself on selling high-quality science. But I haven’t read the paper (as it’s closed access and will cost me 30 GBP for 2 days only). So my mind always remains slightly open.

This should convince any sceptic that the blogosphere is an essential part of current science.

See also comments in RSC’s Chemistry World. It includes comment from Paul Docherty (Totally Synthetic):

‘I was alerted to the paper by readers of my blog, who noticed its controversial abstract almost as soon as it appeared online,’ says Docherty. ‘A quick inspection of Wang’s results astounded me, as he seemed to suggest that black was apparently now white; most curiously, his postulated mechanism only accounted for half of his results. Most provocative papers in organic chemistry take some time and resources to verify, but Wang’s chemistry seemed very amenable to a quick test reaction. It only took a few minutes to set up his chemistry in my fume-hood, and a similarly short amount of time to analyse the results. As I was writing about this on my blog, my readers did likewise, each using slightly different materials and conditions, allowing a very quick “scoping” of the chemistry.’

Some oxidation of alcohols was observed in most cases, but a consensus was rapidly reached that an oxidising contaminant was making its way into the reaction, be it oxygen from the air adsorbed to the NaH, or traces of sodium peroxide or hydroxide or some other trace contaminant. When stringent steps were taken to ensure absolutely that no air could enter the reaction system, no oxidation was seen.

Posted in Uncategorized | 10 Comments

Update including Chem4Word

Posted on July 27, 2009 by pm286

I have been “silent” for over two weeks – not because there was nothing to say but because we have been working very hard to get the first version of Chem4Word frozen. For Joe and me that means that when we get up in the early morning we think of nothing else and when we try to go to sleep it is whizzing round in our heads. This type of 100+ hour coding week can turn people into subhumans …

… But we’ve frozen the API and are technically in bug-fix mode. There are, of course, bugs to fix and we are tackling them. But we have our sights on releasing RSN (real soon now).

I should make it clear that Chem4Word will be Open Source. Everyone in the project is geared towards that. Microsoft is now starting to release considerable amounts of Open Source, and we are pushing hard to get the final legal clearance. I’m happy to discuss what Microsoft + Open Source means in a later blog post. I know there are readers who believe that Microsoft’s motto is “do only evil” – and I used to be close to that view. But Microsoft has changed, and so have I.

Our current strategy – and this may change – is to release as Open Source and to create a governance model that will allow managed Open development. There are lots of projects in software engineering such as Eclipse, Apache, etc. which have successful models. There are no such models in chemistry so we are in new territory. I’d welcome suggestions and offers.

I’ll be writing more about C4W, but at present here is just a summary of some of the major bits:

C4W consists of several modules, some of which are formally independent of Word.
The chemistry engine (based on CML and JUMBO, hence .NUMBO – “dotNUMBO”) is written in C#
The graphics and UI are based on WPF/XAML in C#
There is a stateless interface (CID) between the UI and .NUMBO which defines an abstraction of chemical commands
There is an import pipeline which enforces syntactically and semantically valid chemistry, thus avoiding the problem of not knowing what the chemical input actually represents.
There is considerable functionality (e.g. gallery, navigator) to interact with the Word document.

Chem4Word is a semantic editor – I suspect it’s the first for chemistry. Writing semantically correct code and documents is a hard discipline. Most current chemical tools require a sighted human to make judgements as to what something means, but this does not work in the era of the Semantic Web where machines must make accurate deductions. For example many tools allow the user to “add a + charge to an atom”, but what does this actually mean? Does it change the implicit hydrogen count? Or the spinMultiplicity? The answer is that it depends on the chemistry and there is no universal algorithm to do this. So C4W is built with a framework that allows semantics to be imposed by the chemistry.

In summary, we have got a toolset with significant novel functionality – even in places some limited “chemical intelligence”. When it’s released I will write blog posts explaining some of this.

Many thanks to the team – Joe, Tola, Tim, Alex, Lee, Jim+Jim.

Posted in Uncategorized | 1 Comment

Open Data is coming

Posted on July 12, 2009 by pm286

We (mainly Cameron Neylon and I) ran a session this morning on Open Data. These are un-sessions which need preparation but not a strict agenda. Certainly not a lecture. So we kicked off very briefly with the scene and moved to the Panton Principles on what scientists want to do in publishing data for the benefit of the community.

In very simple terms:

scientists want their data to be available to anyone and re-usable for any purpose without explicit permission.
The only requirement is that the source of the data be acknowledged.
Any further “constraints” are set by community norms in the particular domain. Those might involve human data, need for validation and data integrity, etc. Adherence might be a condition of funding. But they are set by the community, not by the author through a licence.

We’d anticipated that there would be some suggestions that commercial use could be forbidden. In fact there were none, and we take great heart from this. We are all convinced that “non-commercial” restrictions (e.g. CC-NC) cause enormous problems. They propagate through the data chain. They are unclear (what is commercial – teaching? Books? It’s impossible to say).

People sometimes say “don’t you risk getting ripped off by someone who takes your Open Source code or Open Data and sells it?” The answer is emphatically NO. The whole of the Blue Obelisk will agree with this stance. To reiterate:

Someone can take my Open Source and incorporate it into a commercial program. I am quite prepared for this to happen. The condition is simply that they must acknowledge the source. They must not pass off the work as their own (I have had this happen and it made me very angry). But commercialisation is – in principle – a good development. It leads to a successful economy – we need the revenue streams. It may convince those who evaluate my work that it has additional merit (it may not, of course). Similarly, if the data is valuable then products may be built on top of that. Again the developer must honour the source of the data. And in all cases there can be no backwards restrictions on the freedom of anyone to use the Open Source and Open Data in whatever directions they wish.

We got hung up a bit on “what is data?”. I think this will work itself out, so long as commercially interested parties are not allowed to draw the line. It’s critical that academia and funders and learned societies (limited to those without financial interests) evolve practices that create workable boundaries.

Of course it will become much easier when everything is Open Access. That’s my personal motivation: I spent too long today discussing “what is data?” with people, because they have to defend their business.

And a splendid surprise. Creative Commons were here and John Wilbanks joined us for lunch. John’s talking in London on the 22nd and coming to the Panton the next day. Watch this blog…

Posted in Uncategorized | 2 Comments

Scifoo and LambdaMOO

Posted on July 12, 2009 by pm286

Scifoo is magic. The first excitement is how many people you DON’T know. That means that you are going to be stretched in unimaginable directions. Then there are the people you do know – virtually – but have never met. Then there are the sessions, some very direct and pragmatic, others way out. We observe Chatham House rules here – no names, no opinions – but there’s enough we can talk about. Like the project to embed a poem in an archaebacterial genome (yes, it makes sense). Are there life forms on earth based on other principles than DNA/RNA/proteins? That led to a fascinating explanation of 2-D electron gases at 30 millikelvin.

So one person I was delighted to catch up with was Pavel Curtis. Pavel is a visionary who influenced much of what I did during the 1990s. Pavel worked at Xerox PARC and created the legendary LambdaMOO, a text-based virtual environment (MUD) based on OO programming, hence MOO. LambdaMOO was (is?) a vibrant community with often several hundred players and with the freedom to develop its own democracy or anarchy. For modern users of high-performance graphics games it may seem ridiculous that ASCII text can hold much power – but it can, in the same way as a bare stage and the spoken word can recreate fantasy lands.

I’ll deal with some of the sessions on a per blog basis, but among today’s were:

superblog. The leader was becoming successful in running his blog as a magazine, contracting writers and bringing in increasing and significant advertising revenue. If it goes right you can earn sizeable amounts. But we discussed the many other reasons why we blog and it was interesting that two of the members were doing it as part of creative arts. We also noted how the traditional blog following of commenters was now dispersed over twitter and friendfeed.

Making ice cream with liquid nitrogen. Tasted great. Pictures later

The Enernet – an energy network based on Internet principles. It rests on a Moore’s-law-like view of energy (it will become exponentially cheaper – I came in late so I missed how this would happen).

Wolfram Alpha. Making progress (it seems to have corrected the bugs I reported some time ago)

The two sessions I’ll deal with in detail are Open Data and Wave.

Posted in Uncategorized | Leave a comment

Scifoo: Wave, Open Data and much more

Posted on July 11, 2009 by pm286

We’re in the heat of the Scifoo unconference at Google, hosted by Nature, O’Reilly and Google. It’s a fantastic experience – I’ve been fortunate to be re-invited. It’s about what you can help to create in this atmosphere – how can we change the world. We’re discouraged from real-time blogging and direct quotes without permission, so this gives an atmosphere of trust and excited collaboration.

We all introduce ourselves in 3 words (and woe betide if you overrun) – Mine were OpenData, Chemistry and Hacking. We’ve spent time before we came contributing to a Wiki and bouncing ideas off each other. How can we create human 2.0? how does knitting relate to migraine? Anything.

Then we post possible sessions. These are topics people might be interested in joining. So Cameron and I have put up two – one on OpenData and one on Wave. Google’s rooms are either very large or quite small. So you have to guess whether your topic may attract people. Great fun.

I think I can reveal we had a presentation from Steph on Wave yesterday and there are lots of ideas on what we can do. (Wave is from Google in Sydney.) So we’ll have a developer there in the afternoon and see what happens. Wave is Java, XML, Python – with robots on the server and gadgets on the client. In true Aussie style the robots all end in -y – spelly is the spellchecker, rosy etta does translations in real time. So it looks a dead cert to translate OSCAR to OZ and become OZZY. (Except that you lot have now got 4 centuries for 5 wickets I can’t bear to watch any more). Anyway OSCAR can act as a robot which translates written chemistry into semantic chemistry. This is a great way to get programs out.

We also want to see what we can do in the client. Can Jmol run there? We’ll find out.

On Open Data this morning we’ll see who comes before we decide on the program. We can show the Panton Principles and the OKF’s IsItOpen, and also collect ideas of where open data works in Science – and where it doesn’t. When it doesn’t I expect the main problems to be:

restrictions by publishers, including universities

lack of a naming scheme

no examples of why this is so exciting.

But it would be foolish to guess what will happen. After all we are at Scifoo and it’s all about the future, even when we look backwards.

Posted in Uncategorized | 1 Comment
