Digital Curation 2006 in Glasgow

I am going to the

2nd International Digital Curation Conference
Digital Data Curation in Practice
21-22 November 2006
Hilton Glasgow Hotel, Glasgow

which

will address different aspects of the curation lifecycle including managing repositories, educating data scientists and understanding the role of policy and strategy.

There is already an active forum/blog (2nd International Digital Curation Conference Blog) run by Chris Rusbridge (which actually alludes to this blog). Chris has set out many issues and started to get some replies.
I’m on a panel – 3 panellists, 45 minutes.

Session 5: Panel Session – Open Science including Legal & Science Issues
Chair: Chris Rusbridge
Panel:
Andrés Guadamuz González – E-Commerce lecturer at the University of Edinburgh, and co-director of the AHRC Research Centre for Studies in Intellectual Property and Technology Law
Shuichi Iwata – Professor of data science and environmental engineering at the University of Tokyo, and CODATA president
Peter Murray-Rust – Department of Chemistry, Churchill College, University of Cambridge

I always think panels should give the delegates as much time as possible to bring up issues and offer their ideas – not just listen to semi-presentations from the panellists. So here are my issues for the panel session:

  • How do we change the culture of scientists to recognise the value of preservation, at the time of doing the work?
  • How do we move away from hamburger PDF to semantic documents based on XML/RDF?
  • Can we wrest “ownership” of scientific work from the publishers and move it to the community, using a Science Commons license or similar?
  • Can we create a self-sustaining business model?
  • What is the best way to make this happen? Institutions (e.g. requiring deposit of digital theses), funders (e.g. requiring deposit of data), publishers (e.g. allowing Science Commons licenses – this is NOT the same as Open Access) and/or moving to a better business model or process?

In some disciplines this has happened, or is starting to happen.
I will, of course, pick up more ideas on Tuesday. But that is roughly what I want to offer the audience.
P.

Posted in data, open issues | 1 Comment

Community peer-review? In chemistry???

Why do scientists publish in scientific journals? Are they still necessary? This has been debated intensely in recent years, but the chemical blogosphere gives a recent twist to the subject. Even if you aren’t a chemist you should be able to pick up the intensity and value.
The following seem to be the main arguments for having journals (I have added my prejudices). Please add your comments if you disagree. I’ll come to peer-review at the end.

  • A formal record of the scientific process. Well, in part… But with the hamburger PDF culture much of the process is destroyed. Can the web do better? Look at Jean-Claude’s Useful Chemistry – for example, a complete timeline.
  • Establishment of priority. Yes, but… we could just as easily put the material in institutional repositories, as colleagues and I have done for 250,000 molecular structures. Our formal paper on this, which came out only about a year afterwards, is not even Open Access, so there is little point in referring to it here.
  • Preservation of the scientific record. Probably true for most journals. But we have probably lost the Internet Journal of Chemistry (on which I was a board member) as it was closed access. Moreover, the more we tend towards “database of articles” rather than “journal”, the less happy some of us feel.
  • Widespread dissemination. Do people read more papers because they are in journals? Maybe.
  • A collection of articles with common or contrasting themes. This is still true in some places. But I suspect it’s decreasing with the mechanisation of science. How many physical chemists will now read a neighbouring article on synthetic organic chemistry? So why co-publish them?

So we now come to peer-review. There seem to be the following aspects:

  • To detect (and possibly correct) errors in the manuscript.
  • To decide whether the work is fit for some purpose. This could be relevant for clinical trials, etc.
  • To give the work some element of esteem. This is normally a very crude measure – the paper is either accepted or rejected by the journal. (In some cases there is a “highlighted” paper – e.g. with a cover picture, etc.)

Note that the peer-review system has nothing formally to do with the standing of the journal as in the Thomson-ISI “impact factor” (created and managed by a commercial organization). Nor with the citation count from the same source. All that peer-review can do is to aggregate certain articles in a given place and offer them to this “service”. If we believed in bibliometrics (which I don’t) then aggregation into artificial “journals” contributes little if anything.
Now the chemical blogosphere. I have commented on this earlier, but it is mainly created by graduate students and postdocs. Jean-Claude, Christoph Steinbeck and I are among the exceptions. So form your own opinions as to whether it is capable of any sort of useful peer-review.
Yes? Pasteur became a professor at 26; Kekule proposed tetravalent carbon at 28; Arrhenius proposed the electrolytic theory at 25… So I at least take the blogosphere seriously.
Let’s recall the conventional process of publication in synthetic organic chemistry. It’s worth remembering that this is a relatively self-contained field and (as I have commented earlier) concentrates on making molecules that are difficult rather than necessarily of wide general interest. (I will comment on this at a later date). The judgements that are made by the journal/publisher/reviewers are:

  • have the authors made what they set out to make? and is their account believable?
  • is the work novel?
  • is the synthesis (not the final molecule) sufficiently interesting that it is worthy of publication?
Remember that tenure depends on whether the paper gets published in a journal of sufficiently high rating. The publisher makes additional stipulations (yes, the publisher – remember that they are running the process, not the scientific community):

  • no part of the work can be reported before publication. This means that all Jean-Claude’s work is automatically barred from formal publication irrespective of its worth. I’m not clear what the reason for this is, but we have certainly been made aware of it ourselves by one publisher. It’s probably because they worry that they will sell fewer journals or that no-one will read the papers.
  • the manuscript is reviewed by (e.g.) three anonymous reviewers, whose comments are passed to the authors, who have the right to reply. This may iterate. The comments are normally not made public.
  • the editor (or editorial office) of the journal decides whether the manuscript is accepted. Authors may or may not try to argue with this.

The reviewers work hard, and for no financial reward. They do their best – it’s time-consuming (it could take a day or more to review some papers). But they don’t always get it right…
TotallySynthetic in his blog has regular comments on synthesis papers (ca. 1 per day). There are typically 10-20 useful comments on each from the blogosphere. Every month there is a vote on the best synthesis. So unlike the citation index, there is a considered judgement of worth by votes from the blogosphere. In terms of worth, which would you choose? Blogosphere or Thomson-ISI?
Can the blogosphere spot things the reviewers missed? Here’s a recent TS post and blogosphere discussion of Stereochemical Reassignment of Mehta and Kundu’s Spiculoic Acid A Analogue:

    Baldwin, Kirkham and Lee. Org. Lett., 2006, ASAP. DOI: 10.1021/ol062361a.
    Mehta and Kundu. Org. Lett., 2005, 7, 5569 – 5572. DOI: 10.1021/ol0521329.
    Now this is a little scary. Both Sir Jack Baldwin’s group at Oxford and Mehta’s group at the Indian Institute of Science were working on the total synthesis of Spiculoic Acid A. Kundu published the Org. Lett. referenced above, last year, showing their progress towards the target, but it’s taken until now for someone to spot the problems with the paper. Baldwin’s group read the paper thoroughly as part of their publication preparation, and found quite a few problems.
    Amazingly, Kundu assigned the stereochemistry quite incorrectly, originating from a Sharpless epoxidation where they predicted the wrong outcome… this in turn led to the incorrect configuration at many of the stereocenters. To compound this, stereocenters invert in configuration during the synthetic scheme for no apparent reason, and the nOe-NMR data looks distinctly suspect.
    Come on ACS, lets referee papers properly! (PMR Note: Org Lett is an ACS publication)

Now let’s look at some of the comments on the TS blog (I have excised the names for brevity):

    The piss-poor refereeing has nothing to do with the ACS – they can only go so far. If group leaders and established, tenured faculty can’t be arsed to check stuff properly then things like this will happen.
    What I find hardest to understand is how no-one noticed the benzyloxy stereocenter magically inverting (although it probably has something to do with the appalling quality of the transition state diagram)
    […]
    How can you miss things as obvious as the outcome of a Sharpless epoxydation? How can you erroneously change the configuration of a benzyloxy center from one reaction to the next one?
    Don’t get me wrong. Errare humanum est. But before publishing data, you should make sure that no gross mistakes remain. Other than that, it’s the referees’ fault.
    Another question is, was this work thoroughly refereed at all? I mean, there is a 2-month time-span between reception and publication (from September to November!), so it should be enough time for correct, careful refereeing.
    Last time I sent something to Org. Lett., one of the referees sent a three-page list (!!) with corrections and changes he thought should be made to the paper and supporting info. I suppose you don’t assign “picky” referees like that to every paper (specially for “renowned, established group leaders”), but I’m pretty sure that at least, that guy would have spotted all this mess way before it got to the point it did…
    […]
    Part of the problem with manuscript refereeing is the turn around time – with OL you get a week. If it is a long synthesis, then it requires a significant amount of time to go through all of the data carefully. Some reviewers do not take the time to do this – if the paper looks ok, and the chemistry appears to meet the standard (whatever that may be), then it gets recommended for acceptance. I have reviewed several manuscripts from well-known practioners, and they do not get a free-pass from me!
    […]
    The paper by Mehta had pointers that a curious and careful referee could have investigated.

    http://sanjayat.wordpress.com/2006/11/01/self-correcting-science/
    Unfortunate that the new structure could be obtained using the information given in the paper itself, with no additional experiments being conducted.

    […]

    TotallyMedicinal. I wouldn’t say it has ‘nothing to do with the ACS refereeing’. The mistakes in that are so basic that and as they were missed it makes me wonder what the referees actually do with communications of total synthesis? I am no referee but I would imagine it is common sense to run through each individual step regarding their integrity etc?

    […]Speaking as someone who has refereed papers for Org. Lett. and other ACS journals, I’m not sure how much to blame the refs here. Of course, once you know there’s an error, it’s difficult to assess whether you, as a referee would have spotted it. I’d like to think that I’d have noted the mysterious inversion of stereochemistry in the Diels-Alder reaction, but I’m pretty certain that I wouldn’t have noticed the wrong enantioselectivity in the Sharpless… it’s just one of those things where you assume the authors knew what they were doing.
    As for re-interpreting someone’s NOESY spectrum – well perhaps I’d have looked at it if I’d noticed the anomaly later on, but almost certainly not otherwise.
    One of the problems for referees is that it takes a really long time to do it thoroughly… and for what reward? As one journal editor put it to me once – “It’s like peeing yourself in a pair of dark trousers – it gives you a warm feeling, but no-one else knows about it.”
    […]
    This issue also raises the question about how referees can actually do a good job. The problem is of course that there is an exponentially rising amount of information out there waiting to be refereed, and most of it is uninteresting and ordinary. Plus, the referees have their own stuff to do. Is it really possible to have foolproof refereeing, even of simple things? There’s got to be some things that referees have to take on faith, even though ideally they should not do it.
    […]
    Mike’s raised a very important point. Thorough refereeing is really an ungrateful job. It will take a lot of time and sometimes you’ll just spend 3 hours carefully checking a paper, Supporting Info and all NMR spectra included, to find out that there’s nothing wrong with it, other than a few mispelled words and a couple of problems in English.
    2006 seems to be a fertile year in this kind of mistakes and fait-divers! We would be discussing the whole issue of refereeing all over again… any plausible alternatives?
    […]
    Possible alternative: the model proposed by the Public Library of Science (http://www.plosone.org/) sounds exciting, coupling good old peer review system with open-access community-based reviewing/commenting/rating.

Curly Arrows runs another blog:

    Now when I read this paper it actually came across as a nice piece of synthetic work. Unusually, these guys blatantly admit that their synthetic strategy towards the natural product failed. So what they do instead is provide a proof of concept by synthesising an analogue of the natural product using some Diels-Alder chemistry. Okay so that’s all fine and dandy. At this point I’d like to say that I am very glad that I didn’t referee this paper because things are about to get very hairy. Moving swiftly on to 2006 where Baldwin and co-workers publish the total synthesis of the enantiomer of Spiculonic Acid A in Chemical Communications
    Biomimetic synthesis of marine sponge metabolite spiculoic acid A and establishment of the absolute configuration of the natural product
    James E. D. Kirkham, Victor Lee and Jack E. Baldwin, Chem. Commun., 2006, p. 2863

    A very nice piece of synthetic work and a well written paper too. These dudes at Oxford really know what they are doing. So at this point I guess that Baldwin and his mates realised that there were some discrepancies between their data and those of Mehta and Kundu. Hence, they decided to sit down and dissect Mehta and Kundu’s paper to figure out what was going on. The result of this little exercise was published in Organic Letters recently:

    Stereochemical Reassignment of Mehta and Kundu’s Spiculoic Acid A Analogue
    Kirkham J. E. D., Lee, V. and Baldwin, J. E., Org. Lett., 2006, ASAP Article
    DOI:

    Now this paper is really worth a read. We are talking major bitch slapping here. To me the most unbelievable mistake is the incorrect stereochemical assignment of an epoxide obtained by a Sharpless asymmetric epoxidation.
    It appears that these Indian dudes haven’t been able to use the mnemonic model published by Sharpless to predict the stereochemical outcome. This is what Mehta and Kundu write in their paper regarding the epoxidation:

    Sharpless epoxidation of allylic alcohol 19 in the presence of the D-tartaric acid diethyl ester was stereoselective (9:1) and afforded the epoxide 20 in a predictable manner with ample precedence.
    And that’s only the beginning. Their NOE interpretations are all over the place and it seems that they can’t decide on the final stereochemistry of their Spiculonic Acid A when you compare the structures in the supplementary material with those given in the paper. Here’s another brilliant quote from Mehta and Kundu’s paper regarding their NOE interpretations (notice the language. One of these guys must have spent some time in the US and bought himself a dictionary):
    The stereostructure of 9 was delineated on the basis of incisive analyses of its spectral characteristics, particularly the COSY and nOe data.
    Anyway, hats off to Baldwin and co-workers for spotting all the mistakes and submitting the paper and to Organic Letters for accepting it. It is quite remarkable to think that these guys from Oxford have managed to publish in Organic Letters without conducting a single experiment. I highly recommend reading these three papers in chronological order.
So – in my estimation – this is effective peer review. The community knows its stuff and reads the literature. What would we lose if experiments were made public, as Jean-Claude does? And why shouldn’t we use the community vote on the value of syntheses? But I suppose I am too romantic.
Posted in "virtual communities", chemistry, open issues | 7 Comments

Premature Optimization

For many years I have believed (and still believe) in the following (quoted in Wikipedia) :

Tony Hoare first said, and Donald Knuth famously repeated, “Premature optimization is the root of all evil.” It is important to have sound algorithms and a working prototype first.

I’ve been finishing a molecular builder in JUMBO, and it’s far more important to make sure that the right atoms are joined than to join the wrong atoms at lightning speed. So it’s nearly right, and I applied it to larger systems.
It suddenly started to go very slowly. Now there are lots of ways that programs can go slowly. In many cases the elapsed time depends on the size of the problem: double the size, double the time.
But this had what is called “quadratic behaviour”: double the size, quadruple the time. Joining 10 fragments took a few seconds; 100 fragments took 2-3 minutes. And in our apps we need to join that many.
So why was this? A typical example is adding unique items to a list. For small lists you can just run through the list and check for duplicates. But if you have 10 elements in the list, the check takes 10 operations; if you have 100, then 100 operations.
Isn’t this just linear? No :-(. Because to add 100 items you have to search the list 100 times, and as the list gets bigger each search gets slower. The actual expression is something like n*(n-1), but that’s effectively quadratic.
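For concreteness, here is a minimal, self-contained sketch of the two patterns (this is not JUMBO code, just an illustration): checking a List for duplicates rescans the whole list on every insertion, while a hash-based Set does the membership test in roughly constant time.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueDemo {
    public static void main(String[] args) {
        List<String> incoming = new ArrayList<String>();
        for (int i = 0; i < 20000; i++) {
            incoming.add("atom" + (i % 10000));   // deliberately contains duplicates
        }

        // Quadratic: contains() scans the list, and we call it once per incoming item.
        List<String> uniqueList = new ArrayList<String>();
        for (String id : incoming) {
            if (!uniqueList.contains(id)) {
                uniqueList.add(id);
            }
        }

        // Effectively linear: the hash set checks membership without rescanning.
        Set<String> uniqueSet = new LinkedHashSet<String>(incoming);

        System.out.println(uniqueList.size() + " " + uniqueSet.size());
    }
}

Double the input and the first loop takes roughly four times as long; the set version takes roughly twice as long.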
Well, I knew where the problem was, so I didn’t need profilers or anything like that. I just went to the code and put in a few timings to check. Funny! It didn’t seem to be there. Put in a few more timing statements. Still didn’t seem to find the problem. Three hours later the code is covered with timing statements and I’ve found it:

CMLAtomSet atomSet = new CMLAtomSet(molecule);

What’s happening? All this is doing is iterating through the atoms in the molecule and adding them to a set. Sets in Java are optimised to avoid quadratic behaviour, so it should be very fast.
The problem is that the atoms are being added one-by-one. After each one we update the XOM (which acts as the central storage for the problem). It’s safe. There is no way the XOM can get out of sync with the data. But each read and write from the XOM (which contained lists of strings) was expensive. Doing it 100 times was hitting the machine.
So I made this a lazy process. The XOM is not updated for each atom, only at the end. It now takes 1 sec for 60 fragments, not a minute. The trade-off is that you have to be careful to update at the end – so there is a function updateContent(). You MUST call it. I’ve now hidden all this from the developers so that the code above is both fast and safe (I hope – that’s where the unit tests are absolutely essential).
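A rough sketch of the lazy pattern, for illustration only – apart from CMLAtomSet, CMLAtom, CMLMolecule and updateContent(), the names here (getAtoms, the boolean flag on addAtom) are assumptions, not necessarily the real JUMBO API:

// Illustrative only: addAtom(atom, false) and getAtoms() are hypothetical names,
// standing in for "accumulate in memory without writing to the XOM each time".
static CMLAtomSet buildAtomSetLazily(CMLMolecule molecule) {
    CMLAtomSet atomSet = new CMLAtomSet();
    for (CMLAtom atom : molecule.getAtoms()) {
        atomSet.addAtom(atom, false);   // do NOT sync the XOM on every addition
    }
    atomSet.updateContent();            // one XOM write at the end - you MUST call this
    return atomSet;
}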
Some of you will have tutted that I should have run the profiler immediately. They’re right. I should have a profiler in my Eclipse. I’m off to get one. Any suggestions?
But that could take 15 minutes to install.
And the fix was only a 5 minute fix…

Posted in programming for scientists | 1 Comment

Help! Where's the old Tenderbutton

I have enthused about Dylan’s chemistry blog, Tenderbutton. Unfortunately for us, he closed it down about a month ago. But I thought I would always be able to read the archives – they are a wonderful record of chemistry as it happens, not the sterilized material in ritual publications. But now this grey literature has been closed off – it is only available with a password, and there is no indication of how to get one.
This is a great pity. One of the great regrets of my life is that we have lost much of the spontaneity of creation and discussion in the early days of the web. Here it looks like we could be losing a seminal part of the chemical blogosphere.
I’m guessing that the closure is to protect against spammers, not because the “authorities” have removed it from view. But either way it will lead rapidly to decay. (We’ve essentially lost a whole journal – the Internet Journal of Chemistry – because of difficulties with archiving.)
Why am I worried about a blog? It’s only a blog, not a proper publication, isn’t it? It’s not properly peer-reviewed. It’s only a graduate student writing it. It didn’t have official authorisation. Most of the readers were also graduate students.
WRONG! It’s actually the start of new approaches in scientific communication. To me it is at least as valuable as the formal publications of synthetic organic chemistry that no-one other than organic chemists is interested in.
I was going to pursue this thesis but I needed the old TB material to refer to. In my opinion it should be deposited in the Stanford Institutional Repository. It’s a work of scholarship and of great value to scholars.
(These opinions don’t necessarily transfer to every blog, but they do for most of the current chemical blogs).

Posted in chemistry, open issues | 7 Comments

The War on Error

There’s been a lot of excitement over Pete Lacey’s The S stands for Simple. This Socratic dialogue, which I blogged yesterday, shows the futility of the overengineered madness from the W3C committees. There are other similar postings, summarised on Bill de hÓra’s blog:

The War On Error

Last March: REST wins, noone goes home. Well, it looks like we’re done. Which is worse, that everyone gets it now and we’ll have REST startups in Q207, or that it took half a decade?
It’s tempting to be scathing. But nevermind, The Grid’s next.

So in our Centre we shall be going 100% for RESTful chemistry – it’s just a pity we have wasted so much time. I am interested to see that the Grid is next! Certainly my own approach is that where we can use HTTP rather than Globus we should – at least in the initial stages. That’s not to say Globus is without its uses – just that we haven’t needed it yet.
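To make “RESTful chemistry” concrete, here is a minimal sketch: a resource (say, a CML document) is just something you GET from a URL, with no SOAP envelope, WSDL or toolkit in between. The URL is made up for illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RestfulFetch {
    public static void main(String[] args) throws Exception {
        // Hypothetical resource URL: an identifier maps straight onto a document.
        URL resource = new URL("http://www.example.org/molecules/aspirin.cml");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(resource.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // the CML comes back as plain XML over HTTP
        }
        in.close();
    }
}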

Posted in programming for scientists, XML | Leave a comment

SOAP of the evening, beau…tiful SOAP

There are simple ways to do things on the web, and there are Simple ways. Jonathan Robie, a long-time member of XML-DEV, found this…

From: Jonathan Robie
To: ‘XML Developers List’ (xml-dev)
Subject: [xml-dev] The S stands for Simple
I thought this was pretty funny:

Read it, even if you aren’t a techie. Yes, it’s hilarious, but it’s also tragic and true. What’s it about?
The Web needs standardized ways of doing things. Remember the browser wars, when Netscape and Microsoft fought each other to a standstill? “Best viewed in XYZ version 1.2.3”. With the advent of XML a whole new technology was generated. Rightly or wrongly, XML has been adopted as the lingua franca of middleware and the infrastructure of the Web. When it’s used well, it’s great.

I’ve had the great privilege to be part of that community and to have been involved in some of the most exciting virtual communities of all time. First there was XML itself, led by Jon Bosak and supported by Tim Bray, James Clark and Norbert Mikula – I’ve documented it on XML-DEV. Then XSLT – a good design (if rather purist in its insistence on being a declarative language, devoid of useful procedural functions – e.g. no storage!). And then SAX – to me the epitome of the collaborative Web community. After my nagging on XML-DEV, David Megginson took control and orchestrated the contributions from ca. 100 list members. The full story is told here. David finishes:

Thank you to the following people for contributing to the initial discussion about the design of SAX. The current proposal contains many of their ideas and suggestions, though it does not match exactly what any one person wanted.

SAX was not design-by-committee – it was design-by-BDFL-led-meritocracy. It took a month. One month. It’s on every modern computer on the planet. It’s simple, it works and it does exactly what it says on the tin.
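To show what “simple” means here, a minimal SAX sketch (standard JDK API – the molecule markup is just a made-up example): one callback class, one parse call.

import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<molecule id='m1'><atom elementType='C'/></molecule>";
        DefaultHandler handler = new DefaultHandler() {
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // fires as the parser streams through the document
                System.out.println("start: " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }
}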
And then there are the horrors… Overdesigned, incomprehensible, unusable, committee-driven. In my nightmares I assume that they are deliberately designed badly so that IT companies can sell training and installation to customers. At one stage I thought it was simply that I was an inferior being who could not aspire to the rigours of software engineering. I assumed that the complexity and impenetrability were necessary for robustness – that if I didn’t use them my code would fail in horrible places beyond my control.
One of the worst horrors is the W3C DOM. I have wasted a year of my life on this disgrace. An object-based protocol which deliberately forbids subclassing???! Where the return value of a missing attribute is undefined. Where there are multiple ways of doing most things. I wrestled, and I thought it was me. I wrote a complete wrapper for this monster to try to get Xerces to work. (I will expound my extraterrestrial theory of Xerces later…). And then – about 2 years ago – I discovered XOM. A DOM I could understand and use. A simple DOM. The only problem was that it isn’t a standard, and I’d like to use standards for CML. But its author, Elliotte Rusty Harold, wrote so eloquently about why the W3C DOM is so awful that I realised we were right and the W3C was wrong. As simple as that.
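For comparison, here is what building and serialising a small document looks like in XOM (these are real nu.xom classes; the element names are just an example):

import nu.xom.Attribute;
import nu.xom.Document;
import nu.xom.Element;

public class XomDemo {
    public static void main(String[] args) {
        Element molecule = new Element("molecule");   // a concrete class you can subclass
        molecule.addAttribute(new Attribute("id", "m1"));
        Element atom = new Element("atom");
        atom.addAttribute(new Attribute("elementType", "C"));
        molecule.appendChild(atom);
        Document doc = new Document(molecule);
        System.out.println(doc.toXML());              // well-formed XML, guaranteed
    }
}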
So many of us are waking up (or are fully awake) to the fact that many, if not most, formal specs are overengineered. The world is splitting into two camps: those that favour fully engineered specs and those that like lightweight – even ultra-lightweight – approaches. Here’s a table of heavy versus light (my assignments).

heavy              light, RESTful
W3C DOM            XOM
Xerces             AElfred
XML Schema         Schematron
SOAP               REST
WSDL               REST
XML RDF            RDF N3
Semantic Web       semantic web microformats

Again, we wasted an elapsed year with WSDL; we’re taking it all down and redoing it in a RESTful manner.
So, like Humpty, I’ve had enough – ‘I meant by “impenetrability” that we’ve had enough of that subject, and it would be just as well if you’d mention what you mean to do next, as I suppose you don’t mean to stop here all the rest of your life.’

Posted in Uncategorized | 3 Comments

Information Barter

A comment to my post Negotiating Open Access – a mutual success resonates with me:

# Rupak Chakravarty Says:
November 17th, 2006 at 6:03 am
I believe, If we really want to knowledge driven open society, where business models are less effective in governing the process of knowledge creation and sharing, we will have to follow the “Barter System”.
As in Barter system, in which goods or services are exchanged for other goods and/or services; no money is involved in the transaction, the knowledge society should be based on “bartering knowledge”.
I’m a great listener/ reader of Sir(s). Peter Suber and Steven Harnad and look forward to receive the pieces of “digital intellect” posted by them on the web.
Thanks and regards.
Long Live THE Open Access.

Thanks Rupak,
I have been toying with this idea for some time – if we run a useful service we don’t want to charge for access – all the hassle of licenses, micropayments, anticommons, etc. We might use a donation system (of money) and maybe we shall. But I have been attracted by payment in kind…
Shortly we plan to release a major knowledge base (details later, but it involves molecular structure, crystallography, etc.) which is all Open (CC) and which we hope people will use. However, we need more content – especially content locked up in the dungeons of dusty departments – which is effectively Open but not disseminated. Scientists will have this content anyway (if they can still find it!) and all they will need to do is donate their already Open (but not disseminated) work – our service will act as an aggregator and disseminator.
In one sense this is common in some communities – you cannot publish genes, structures or crystallography without donating your data to the community. But it is a controlled process (through a mixture of learned societies and publishers) – what we are proposing here is a “gift economy”.
I will return to this theme later but would welcome instances where this already works.

Posted in "virtual communities", open issues | Leave a comment

(Chemical) Images in blogs

I am following up a post where I suggested we could provide a service for drawing molecules in blogs. One problem is how to incorporate them into the post.
(I’m still working on this post, so don’t believe it all)
When I create images in my posts I have to:

  • create the image somehow (draw it, cut and paste, etc.)
  • save it on my filesystem
  • “Upload” it to WordPress using a rather clunky uploader.

So when I reproduced TotallySynthetic’s web stats from his blog, I simply cut and pasted them into a pixel editor, trimmed them, saved to disk, uploaded, etc. Can this be made easier?
One possibility is that I can link to other images. So here is one of Peter Corbett’s latest posts. The first image is referenced by:

<img src="http://www.pick.ucam.org/~ptc24/webn.gif"/>

If I want to link to this I can paste this URL into my post which gives you this:

This is a HOTLINK. It is easy for me, but there are problems. Every time I load the image (by opening the browser) it is fetched from Peter’s server. If I get 1000 human hits a day, Peter will get 1000 hits per day (assuming the images are downloaded). Also, if Peter’s server is offline, you won’t see the image. But it’s simple.
However I can copy this image into WordPress. It’s no more difficult (and no easier) than uploading an image from disk. You simply load the URL into the file browser text field and it will copy the image into WordPress’s local image store. Now it looks like this:
[image: webn1.gif, copied into the local WordPress image store]
If we create semantically useful titles for the images it should be possible to do some fun things.
There are some issues here. As Peter pointed out, Creative Commons gives rights over the content but not necessarily the right to link to the server on which it is mounted. You can easily see denial-of-service problems here. I’d welcome any ideas on whether this is going in a useful direction.
P.

Posted in chemistry, general, programming for scientists | Leave a comment

God's golf club

The golf club problem was a throwaway – I didn’t expect it to run for long. I can’t resist the following plea, and will give a formal answer (there is a much simpler way of doing it).

  1. Russ Says:
    November 15th, 2006 at 12:02 am
    I was hoping someone would pipe up with the answer, because now it’s kind of bugging me. It would seem the standard high school physics doesn’t apply to infinitely massive golf clubs wielded by superbeings. Or if it does, the golf ball flies off at infinite speed and/or blows up in a blinding flash of energy (amounting to its mass times the speed of light squared). Put a nerd out of his misery here – what’s the answer?

Assume the club has mass M and velocity V, and the ball has mass m and velocity v. Momentum must be conserved and, for an elastic collision, so must energy.
Initially:
Momentum: M*V
Energy: 0.5*M*V*V
Afterwards:
Momentum: M*V' + m*v
Energy: 0.5*M*V'*V' + 0.5*m*v*v
We have the equations:
M*V = M*V' + m*v
M*V*V = M*V'*V' + m*v*v
We can reduce the variables by using the ratio of masses (A = M/m):
A*(V - V') = v
A*(V*V - V'*V') = v*v
Two equations, three unknowns – you will come out with a ratio of velocities. A bit messy. Then let A tend to infinity. You should get a simple answer. Then see if you can find a simpler approach – look at it from a fresh viewpoint.
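For anyone who doesn’t want to push the algebra through, here is the standard elastic-collision result worked out from the two equations above (with, at the end, the simpler route hinted at):

\begin{align*}
A(V - V') &= v \\
A(V^2 - V'^2) &= v^2
\end{align*}
Dividing the second equation by the first gives $V + V' = v$. Combining this with $v = A(V - V')$:
\[
V' = \frac{A-1}{A+1}\,V, \qquad v = \frac{2A}{A+1}\,V .
\]
As $A \to \infty$ (an infinitely massive club), $V' \to V$ and $v \to 2V$: the ball leaves at exactly twice the club-head speed – fast, but finite. The simpler way to see it: in the club's rest frame the ball bounces back with the same speed $V$ it arrived with; transforming back to the lab frame adds the club's $V$ again, giving $2V$.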
P.

Posted in puzzles | 10 Comments

Semantic Chemistry in Wikis and Blogs – a proposal

Joerg Wegner posts (Blogging chemistry means not blogging minable data):

As posted by Peter more and more chemists are blogging. And I would appreciate if those blogs would contain more chemical minable information. I think especially Rich and Egon have given some nice examples on their blogs. And beside of those blogs Wiki’s can also use chemical information (and not images!).
Anyway, those blogs are just great:

And here are some ideas what is missing on those blogs:

  • tags based on reaction names (with cross-links), reagents, products, …
  • chemical data in a native and downloadable/minable form, e.g. CML or InChI. Preferably all those data entries should have unique identifiers and tagged reaction centers! So, I prefer here rather CML than InChI.
  • more literature references using DOI or PMID

I obviously and absolutely agree. The question is what to do about it. I have been struggling with simple code on this blog (I have now cracked it) but even loading images isn’t absolutely trivial (you have to upload from a directory and then upload into the blog, whereas we would really like to cut and paste).
The commercial chemical tools are not much help. Apart from costing money, they are designed to be integrated into Word documents using (I think) ActiveX. Moreover, they aren’t designed to be semantic. In the Blue Obelisk we are developing tools which are XML-CML aware and which – ultimately – will be the simple solution to this. The main challenges are:

  • discover, develop and enhance semantic wikis and blogs. Please post anything you know that actually works.
  • find simple (almost automatic) ways of embedding InChIs and CML invisibly in text. I think this would be relatively simple if we can agree on a method.
  • enhancing the Blue Obelisk tools (especially CDK, JOELib and JChempaint) to provide simple chemical services for Wiki/blogs

I came up with a novel way of doing this – what do you think? Since everything on blogs is public, we could use a communal authoring service. Let’s say we have a server – and this is something I’ll put to my colleagues – that provides a graphical authoring interface for semantic chemistry. (This would include reactions as well as compounds.) You would create the molecules you want (and we’ve got some simple ideas for accelerated graphics) on the server. It would create all the InChI and CML transparently. It would give you two URLs (there is a rough sketch of such a service after the list below):

  • An image of the chemical object (like the GIF we have at present). You could either link to this or cut and paste it or both.
  • A link to semantic chemistry stored on the server. Since all our work is public and I think most of us use Creative Commons, there is no loss of IPR – quite the reverse. Clicking on the link could bring up Jmol, JChempaint or whatever, without any need to add client-side functionality.
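A rough sketch of what such a server might look like, using only the JDK’s built-in HTTP server – every URL, path and identifier here is made up for illustration, and the real service would of course generate the image and the CML rather than return a placeholder:

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class MoleculeServerSketch {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // Hypothetical URL scheme: /molecule/<name>.cml for the semantic form,
        // /molecule/<name>.png for the picture to paste into a blog post.
        server.createContext("/molecule/", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                String path = exchange.getRequestURI().getPath();
                if (path.endsWith(".cml")) {
                    // In the real service this would come from the communal repository.
                    byte[] body = "<molecule id='placeholder'/>".getBytes("UTF-8");
                    exchange.getResponseHeaders().set("Content-Type", "application/xml");
                    exchange.sendResponseHeaders(200, body.length);
                    OutputStream out = exchange.getResponseBody();
                    out.write(body);
                    out.close();
                } else {
                    // Image generation (and everything else) is not sketched here.
                    exchange.sendResponseHeaders(404, -1);
                }
            }
        });
        server.start();
    }
}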

Note that if the server is unavailable you would still have the local PNG in the normal way. There are many advantages:

  • there would be a communal repository for any molecule. This means that simply by linking to (say) ciclosporin it would give you a template (or many templates) which the community had already drawn.
  • The molecule could be linked to other sources such as Wikipedia, Pubchem, etc. Conversely they could link to this resource. We build a communal knowledge base.
  • The server can provide services (e.g. logP from CDK or JOELib).
  • The server would have search facilities (CDK, JOELib, OpenBabel)

and we can all think of many more.
If this excites you – it may need altering – let us know.

Posted in "virtual communities", chemistry | 5 Comments