The ICEman cometh

First, my thanks to Peter Sefton and family for looking after me so well. We’re just about to see the sights of Toowoomba – the Garden City. I gave a talk yesterday about the Semantic Web and other stuff and used the WP entry above to highlight the way that WP is almost magically capturing commonly agreed information. (Someone pointed out that the height was given wrongly so I urged them to fix the entry – all I know is that we are relatively high up!)
I gave my talk in HTML as normal – a series of ca. 100 major topics with 2-20 “slides” under each. I select each “slide” as I go along and stop at the time limit. At least this means I never overrun. The system has evolved over the years and now has a vertical menu for each topic and a horizontal one within the topic.
Peter and colleagues have been working on educational technology for many years and his ICE is aimed at many things including slides and course content. Because it rigorously separates content from style it’s straightforward to repurpose material. Blogs, slides, courses, summaries – all can be derived from or into ICE.
So I shall leave Peter to blog about the details. (For a typical example, see Graphing with ICE). ICE has been carefully thought out with a balance between formal systems and pragmatics. It works with normal systems such as Word, PDF, etc. It seems ideal for theses but I’ll not give anything else away in this blog. The team is great – we picked some topics of mutual interest and within 24 hours we had proof of concept of two. (They involve chemistry!) So we’ll be adopting and encouraging its use.

Posted in Uncategorized | Leave a comment

ICE: Learning materials and authoring tools and XML/CML

After several abortive attempts a correspondent asked on the CML Blog:

“How do you put in XML?”

I’m at one of the best places in the world to answer this (perhaps after a day more). I’ve travelled to the University of Southern Queensland in Toowoomba – a fantastic setting – but before you get too jealous it has rained most of the time. This is a good thing as QLers are on a severe drought restriction – no car washing or garden watering (I’ll find out whether cricket pitches are exempt).
My host is Peter Sefton and his group, Daniel de Byl, Ron Ward and Oliver Lucido in the ICE (The Integrated Content Environment) project of the LFII (Learning Futures Innovation Institute). USQ is very strong in distance learning and therefore sets much store by the quality of the learning material. For example they have developed tools whereby course materials can be repurposed as student notes, instructor resources, slides, etc. All through Open technology such as XML, RSS, CSS, Java etc.
Much of what they have done is of great value to semantic science. The group has already worked out how to put CML molecules into various compound documents in a variety of formats – Word, Open Office, PDF, HTML, etc. It’s hard work and this post records my appreciation for it.
Peter runs his own blog and here’s the most recent post:

This site is now Zotero friendly, courtesy of the unAPI plugin for WordPress

If you’re using the Zotero research tool (and if you read this blog you probably should) you should now see the little icons to add posts to Zotero up in your Firefox address bar. Save metadata for your favourite posts! Cite me!

This will only work for the WordPress part of the site, post November 2007.
Reading the list of ingredients it seems like I am now metadata central:

The server provides records for each WordPress post and page in the following formats: OAI-Dublin Core (oai_dc), Metadata Object Description Schema (MODS), SRW-Dublin Core (srw_dc), MARCXML, and RSS. The specification makes use of LINK tag auto-discovery and an unendorsed microformat for machine-readable metadata records.
http://lackoftalent.org/michael/blog/unapi-wordpress-plug-in/

This is courtesy of the unAPI plugin, found via the Zotero site.
Off the top of my head I can think of a few other WordPress sites I’d like to have this so I can use them in my research.

The download page has clear instructions about how to install the plugin.

PMR: This is typical of the many tools that the group has developed. I don’t use Zotero yet, but I am sure we shall integrate into our thesis authoring stuff.
So we hope to set up an ongoing collaboration. This shares the work, helps to avoid pitfalls, increases advocacy, visibility and adoption. The specific question of XML in blogs related to comments – the current WordPress seems to do fairly well on XML.
And maybe a return visit when they find the sun in QL.


Reporting inconsistent stereochemistry in CrystalEye

There have been some general suggestions that the stereochemistry in CrystalEye structures may have errors. This is entirely possible as it’s quite difficult to check in unit tests without a large corpus of correctly annotated examples. We’d be grateful for individual examples where there seem to be errors.
At present I would like to limit this to the process of deriving atom-centered chirality (and creating a valid CML file). The relationship to SMILES and nomenclature should be decoupled at this stage. This has two steps:

  • deriving the CML from the CIF including 3D coordinates and hence the <cml:atomParity> elements (these will normally be on Carbon).
  • deriving the wedge/hatch <cml:BondStereo> elements. Note that these are ALWAYS arranged to have the narrow end on the stereochemical atom regardless of any false perspective. Only the local coordination of the atom matters, not the overall 2D coordinates nor the stereochemistry of atoms bonded to the current chiral centre.
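
As a rough illustration of the first step, the sketch below derives a parity sign from the 3D coordinates of four ligand atoms by taking the sign of the signed tetrahedral volume. It assumes the parity is the sign of the 4x4 determinant with rows (1, x, y, z) over the four ligands taken in a fixed order; the function names `signed_volume` and `atom_parity` are hypothetical and this is not the JUMBO code.

```python
def signed_volume(p1, p2, p3, p4):
    """Triple product (p2-p1) . ((p3-p1) x (p4-p1)).

    This equals the 4x4 determinant whose columns are (1, x, y, z)
    for the four points taken in the given order.
    """
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    cx, cy, cz = (p4[i] - p1[i] for i in range(3))
    # Cross product b x c
    nx = by * cz - bz * cy
    ny = bz * cx - bx * cz
    nz = bx * cy - by * cx
    return ax * nx + ay * ny + az * nz


def atom_parity(ligands, tol=1e-6):
    """Return +1 or -1 for a chiral arrangement, 0 if the ligands are (near) coplanar."""
    volume = signed_volume(*ligands)
    if abs(volume) < tol:
        return 0
    return 1 if volume > 0 else -1


# Four ligands at alternate corners of a cube (an idealised tetrahedron).
ligands = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
swapped = [ligands[1], ligands[0], ligands[2], ligands[3]]
mirrored = [(x, y, -z) for (x, y, z) in ligands]
```

Swapping any two ligands, or reflecting the whole arrangement in a mirror plane, flips the sign, which is why the order of the ligands has to be recorded alongside the parity value.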

I’ll leave Nick to outline the actual code steps. JUMBO is used to derive the 3D Cartesian coordinates and CDK is used to derive the 2D coordinates. It is possible to carry out the assignment of wedge and hatch bonds completely in JUMBO – I’m not sure about CDK. This post is to show how the consistency of the results can be checked. There are a number of places where errors are possible, and here we shall OMIT:

  • analysis of the chirality of the overall structure derived from the anomalous scattering of the “heavy” atoms. This is measured by, e.g., the Flack parameter (IUCr CIF definition: data_refine_ls_abs_structure_Flack). This parameter can vary between 0.0 (“correct”) and 1.0 (“wrong”), with 0.5 meaning “undeterminable”.
  • Indications of absolute stereochemistry from the name of the compound as reported by the authors.
  • Signs of torsion angles reported by authors in the CIF.
  • any diagrams in the fulltext

We believe that we transform the CIFs consistently so that all structures when viewed in Jmol should have “correct chirality”. We also believe that Jmol is consistent in what it displays. We assume therefore that the coordinates and their chirality are as the author intended. Note, however, that some structures might contain both left- and right-handed molecules. Apart from the racemic case, which necessarily has 50% of each, this is probably uncommon, but where it happens we must be careful in the selection of the molecule we represent.
We then calculate the atomParitys but do not routinely check them by hand, as a higher throughput of checking can be obtained from the 2D diagrams with wedge and hatch. This relies, of course, on the 2D/wedge code being correct and this is what we need to feel confident about. Here is how it can be done fairly quickly. We take examples from the latest RSS feed of CrystalEye.
Here’s an example:

Summary page for crystal structure from DataBlock ah9904 in CIF b715354fsup1 from article b715354f in issue 2008/4 of Royal Society of Chemistry, Organic and Biomolecular Chemistry.

(The overlapping groups at the SE have no chirality and represent a tri-isopropylsiloxymethyl group). There are three chiral centres and the code has added H atoms to all to make the chirality clear. Here is the corresponding Jmol display (part of the Benzoyloxy, NW and silyloxymethyl SE are clipped)

 

crystaleye.png

 

I hope you agree that the ligand atom positions round the three centres represent the same chirality as implied by the 2D picture. Remember that the convention is that the chiral atom is at the narrow end of the wedge/hatch.

 

If you have any cases where you believe that the 2D and 3D images do not correspond, please let us know – that’s the only point under consideration at this stage. It takes a few seconds to rotate the Jmol display till the molecule is roughly in the same orientation as the appropriate part(s) of the 2D diagram, so it’s a quick check.
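
The same visual check can be sketched in code. This is a minimal sketch under stated assumptions, not the CrystalEye implementation: each 2D ligand position is lifted to a pseudo-3D point with z = +1 for a wedge bond, z = -1 for a hatch bond and z = 0 for a plain bond (labelled here as "W", "H" and None), and the sign of the resulting signed volume is compared with the sign from the 3D coordinates. The helper names are hypothetical, and the ligands must be listed in the same order in both views.

```python
def parity_sign(points):
    """Sign of (p2-p1) . ((p3-p1) x (p4-p1)) for four 3D points; 0 if coplanar."""
    p1, p2, p3, p4 = points
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    cx, cy, cz = (p4[i] - p1[i] for i in range(3))
    volume = (ax * (by * cz - bz * cy)
              + ay * (bz * cx - bx * cz)
              + az * (bx * cy - by * cx))
    if abs(volume) < 1e-6:
        return 0
    return 1 if volume > 0 else -1


# Assumed lift: wedge points towards the viewer, hatch away, plain in the page.
PSEUDO_Z = {"W": 1.0, "H": -1.0, None: 0.0}


def parity_from_wedges(ligands_2d, stereo):
    """Parity sign implied by 2D ligand positions plus wedge/hatch flags."""
    lifted = [(x, y, PSEUDO_Z[s]) for (x, y), s in zip(ligands_2d, stereo)]
    return parity_sign(lifted)


def views_agree(ligands_3d, ligands_2d, stereo):
    """True if the 3D coordinates and the 2D wedge/hatch picture imply the same sign."""
    sign_3d = parity_sign(ligands_3d)
    return sign_3d != 0 and sign_3d == parity_from_wedges(ligands_2d, stereo)


# An idealised centre viewed down the z axis: ligands above the page are wedges.
ligands_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
ligands_2d = [(x, y) for (x, y, _z) in ligands_3d]
stereo_ok = ["W", "H", "H", "W"]
stereo_flipped = ["H", "W", "W", "H"]
```

Because only the sign is compared, the sketch depends on the local arrangement round the centre and not on the overall 2D layout, in line with the convention that the chiral atom sits at the narrow end of each wedge or hatch.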

 

[For interest only at this stage. The Silicon atom has enough scattering power that the absolute configuration can be determined with some certainty. The relevant Flack parameter is:

 

 

<scalar dictRef="iucr:_refine_ls_abs_structure_flack" dataType="xsd:double" errorValue="0.19">-0.17</scalar>
Read this as “the Flack parameter -0.17 is less than one estimated standard deviation (0.19) from 0.0. 0.0 indicates a structure which has been refined against the correct enantiomer”. So the crystallographic experiment asserts that it is highly likely that the chirality you see is the correct one.
The authors do not report any chemical name in the CIF (it’s possible that this can be found in the fulltext or SI) so it’s not possible to use that to confirm the stereochemistry is correct.]
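
The reading of the Flack parameter above can be reproduced with a few lines of Python. This is a minimal sketch that parses the scalar element exactly as quoted and reports how many estimated standard deviations the value lies from 0.0 and from 1.0; the function name `flack_summary` and the returned dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

SCALAR = (
    '<scalar dictRef="iucr:_refine_ls_abs_structure_flack" '
    'dataType="xsd:double" errorValue="0.19">-0.17</scalar>'
)


def flack_summary(xml_text):
    """Distance of the Flack value from 0.0 and from 1.0, in units of its esd."""
    elem = ET.fromstring(xml_text)
    value = float(elem.text)
    esd = float(elem.get("errorValue"))
    return {
        "value": value,
        "esd": esd,
        # 0.0 corresponds to refinement against the correct enantiomer
        "esds_from_correct": abs(value - 0.0) / esd,
        # 1.0 corresponds to refinement against the inverted structure
        "esds_from_inverted": abs(value - 1.0) / esd,
    }


summary = flack_summary(SCALAR)
```

For this structure the value is under one esd from 0.0 and more than six esds from 1.0, which is the basis for saying the displayed chirality is very probably the correct one.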


Travels

Have landed in Melbourne for a 3-4 week tour of several eastern places. By chance attended a Fedora meeting at the Monash Conference suite at Andrew Treloar’s invitation. Overwhelmed by the feeling of welcome. FWIW I visit Peter Sefton next week where we hope to hack some ICE, then to Margaret Henty’s meeting on Open Access, then a day visiting UQ to look at ORE things (Jane Hunter and Kwok Cheung) – they are the first group to implement ORE. After that to Canberra to meet Alison Edwards (crystallography) and also colleagues from geosciences. Then to Adelaide (Philip Lock, to explore new ideas in chemistry and informatics) and return to Monash to talk about repositories in various aspects.
This is a great place.
I have just been told that Australia is a “remarkably efficient filter to get rid of grumpy bastards”.
Also got my first view of the Melbourne Cricket Ground (MCG). A place of homage.
Maybe I won’t use my return ticket…

Posted in Uncategorized | 5 Comments

Open Access article in C&E news

Coverage of several aspects of Open Access and Open Data:
http://pubs.acs.org/cen/science/86/8605sci1.html


Chemistry For Everyone – Nature Horizons

The review that Nature invited from me has appeared:
http://www.nature.com/nature/journal/v451/n7179/pdf/451648a.pdf
Only the first paragraph is toll-free, but the pre-review preprint has been saved in Nature Precedings.
My only reservation is that journal style meant I could not include names of Open chemists in the text and that the further reading is generic. So my appreciation of the Blue Obelisk and hinterland does not come across in the final article as much as in the preprint.
P..

Posted in Uncategorized | 1 Comment

Open data and robots in Computer Weekly

Richard Poynder has done me the honour of an interview for Computer Weekly, the leading UK magazine:
http://www.computerweekly.com/Articles/2008/02/05/229273/peter-murray-rust-and-the-data-mining-robots.htm
It is very useful to have these opportunities to summarise state of play at regular intervals.


Early days of molecular modelling

Allen Richon is writing an article for Drug Discovery Today and a book on the early history of molecular modelling and I think this is really important to preserve. As I’m short of time Nico has helped by recording a podcast:
http://www.dspace.cam.ac.uk/handle/1810/195212
I expect there are some errors as it’s done off the cuff, so feel free to correct them.


XML is Ten!

I was honoured to be part of the development of XML (especially through the XML-DEV mailing list). Ian Jacobs of the W3C asked for digital memorabilia to commemorate that and also to save some oral history (we have lost enormous amounts of the early web as we didn’t have the means to save it and there was too much new and exciting to do).
So Nico Adams very kindly agreed to record my thoughts as an (audio) podcast. This took just under an hour (we rerecorded to get better quality). Here’s his announcement including the link to our DSpace (in Cambridge DSpace is not only for fulltext papers).

[Nico]
I have submitted the audiofile containing the XML anniversary interview with Peter to our DSpace repository and you can download it from here:
http://www.dspace.cam.ac.uk/handle/1810/195211
It is in mp3, which every music player should be able to handle.

Many thanks Nico.
The recording assumes that the listener knows something about XML and I use terms like W3C without explanation. I have assumed it will be collected along with other artefacts so that there may be some communal explanations. The podcast is ca 50 minutes long and is made available under CC-BY (i.e. you can do whatever you like with it without my or Nico’s permission as long as we are acknowledged).
Coincidentally I got a request today from Allen Richon for information about the early days of molecular graphics (computational drug discovery) ca 1980. I might find time to record something brief about that and do something longer later.
Posts will be somewhat sporadic over the next few days.


Working with the NCI

I was intending to blog about our collaboration with Dan Zaharevitz and colleagues at the National Cancer Institute in the DTP (Developmental Therapeutics Program). Dan beat me to it in a CMLBlog comment (February 4th, 2008 at 5:02 pm) on CML – what and why. In the comment he explains why the NCI has chosen to work with us on CML.
Dan and I first made contact ca. 5+ years ago. I think he had noticed my posting or contributing to CDK (Chemistry Development Kit) and had asked about what CML could do. We got into correspondence and as a result he supported Henry and me in the development of JUMBO – probably JUMBO 4.6.
It is refreshing to work with the NCI. Their agenda is ultimately simple – methods of combatting cancer. And they are very clear that the way to do this is through Openness – Open Data, Open Source, Open Standards. So it is wonderful to have a sponsor who says “we will help you to develop this code” and you can make it Open – indeed this is virtually a requirement.
NCI is well known for pioneering the release of their data in Open form. For many years the NCI database – with about 250,000 compounds and associated biological data – was the only data that could be used for free in chemistry. This database was the logical predecessor of PubChem which now has over 18 million compounds. (An important difference is that the NCI database relates to physical samples while many entries in PubChem do not).
Dan’s support has been invaluable. Firstly, it has supported us to do the work. Secondly, it gives much moral support to continue. And thirdly, it has given us important feedback. Since CML has many uses (publishing, computation, crystallography) it’s very useful to have an organisation that wants to manage data. NCI is not only interested in chemical structure but also associated data, including analytical.
So it was great to sit in Dan’s splendid basement and review how he was using CML and how we jointly felt it might develop. CML details will follow on the CMLBlog.

Posted in chemistry, XML | 2 Comments