Browsing habits for chemists


Totally Synthetic posted his blog stats today (Blogging… ) so I thought I’d post the ones that Jim Downing has set up for us. Apart from anything else I like the graphics.

Days of week
Day Pages Hits Bandwidth
Mon 779.50 1365 25.38 MB
Tue 4094 7443 161.17 MB
Wed 3930 7989.50 149.56 MB
Thu 4543 10763 129.98 MB
Fri 2473.50 5738.50 92.76 MB
Sat 1217 1807 26.39 MB
Sun 1555.50 2244.50 37.60 MB

I don’t get this. We do work on Mondays in the UK. It can’t be timezones – e.g. Monday is not really Tuesday. The interesting comparisons are for the next two:

Operating Systems (Top 10)
Operating Systems Hits Percent
Windows 34047 50.6 %
Unknown 15733 23.3 %
Linux 10519 15.6 %
Macintosh 6905 10.2 %
Sun Solaris 27 0 %
FreeBSD 23 0 %
HP Unix 2 0 %
GNU 2 0 %

Browsers (Top 10)
Browsers Grabber Hits Percent
Firefox No 24387 36.2 %
MS Internet Explorer No 18612 27.6 %
Unknown ? 13311 19.7 %
Safari No 5226 7.7 %
Mozilla No 3650 5.4 %
Netscape No 823 1.2 %
Opera No 613 0.9 %
NetNewsWire No 322 0.4 %
Konqueror No 126 0.1 %
TelePort Pro Yes 63 0 %
  Others   125 0.1 %


Here Totally Synthetic has
tsstats.PNG
Now it is dangerous to compare stats collected on two different systems, but it seems likely that (a) a greater proportion of TotSynth’s readers are chemists and (b) chemists are more likely to use Windows and MSIE.
If so, that is one of the things we have to take into account when suggesting approaches to software development.

Posted in chemistry | 2 Comments

Mystery Molecule and Jack Dunitz on Fluorine

Jack Dunitz (one of the greatest chemical crystallographers) visited our lab today. I had told people beforehand that I would ask him what the mystery molecule was and prophesied that he would get it immediately. He did.
This gives me a chance to record the enormous personal debt I owe to Jack with whom I spent a year in Zurich. He is deeply loved by the many people who have passed through his lab. He now works almost exclusively with theoretical tools rather than equipment and today told us about fluorine – or more precisely organofluorine compounds. Substituting hydrogen by fluorine in hydrocarbons (aliphatic or aromatic) makes almost no difference to physical properties (except density), but despite their similar properties the fluorocarbons and hydrocarbons don’t mix. In fact perfluorobutane in butane has one of the highest activity coefficients (10). Even the molecular volume is almost unaltered. In some directions the fluorine is actually smaller than the hydrogen.
So there are still many simple observations in chemistry that we don’t understand. One of Jack’s most engaging, made with Gautam Desiraju, was that over the whole of reported chemical space there are more compounds with an even number of carbon atoms than odd (Nature closed access reference) – see the report in New Scientist, where he is quoted as saying:
“It’s much more intriguing if you don’t offer an explanation,”
I’ll leave you with the puzzle he greeted me with when I started in Zurich. “If a golf ball is hit with an infinitely massive golf club moving with velocity V, what will be the velocity of the ball after the collision”. The answer is simple and logical.

Posted in general, programming for scientists | 13 Comments

Blogging Blogging…

TotallySynthetic has just posted Blogging…  where he exults over the growing chemical blogosphere. I share this enthusiasm. He also implies the reinforcement effect – if you have N blogs that are linked there are N*N links and this makes the blogosphere more valuable than all the single blogs. For example his readership is likely to be predominately practising synthetic chemists – this blog has (inter alia) a readership interested in scholarly communication. So they get to see his blog and vice versa.
I have linked to his post in this post so his blog should reference this post. That means that his natural readership should link directly to this post. (This is – I think – called a pingback). If I get it wrong I’ll repost.
What this means is that we are creating a new chemical information tool. It is very powerful in that we can do whatever we want (within the laws of libel, etc.).  For example I have posted a simple bibliometric study on whether the compounds in his blog are interesting. The synthetic community is starting to respond. I hope some useful new views will come out of it.
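As a rough illustration of the reinforcement effect mentioned above: if each of N blogs links to every other one, the number of one-way links is N*(N-1), which is close to N*N once N is large. The sketch below only counts links under that idealised assumption (every blog links to every other); the function name is made up for this example.

```python
def directed_links(n: int) -> int:
    """One-way links when each of n blogs links to every other blog (idealised)."""
    return n * (n - 1)


if __name__ == "__main__":
    # The link count grows roughly as the square of the number of blogs.
    for n in (2, 10, 50):
        print(n, "blogs ->", directed_links(n), "links")
```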

Posted in "virtual communities", chemistry | 3 Comments

Is Natural Product Synthesis Interesting?

This is a slightly provocative post – I would welcome comments.
When the Nobel prizes were announced this year there was consternation in some of the chemical blogosphere that the chemistry prize had gone to a biologist (see links in my earlier post). But others argued that chemists had lost the plot and that biology is now where the action is. It was important not just to do research in one’s own community but to appeal to a wider audience.
A lot of chemistry (reflected by the chemical blogosphere) is concerned with making compounds. A subset of this is the synthesis of natural products, compounds produced by organisms (fungi, bacteria, plants, marine invertebrates, etc.). These compounds often have biological activities (e.g. to kill other organisms) and are frequently useful in medicine. Typical examples are taxol (anticancer, from the Pacific yew tree), ciclosporin (immunosuppressant from a fungus) and vancomycin (antibacterial from bacteria). They are in widespread clinical use and this is reflected by a very large literature. I have used Pubmed to get a base metric – I simply enter the name and see how many references there are and the earliest date:

  • taxol (1971) 12505 references
  • vancomycin (1955) 12225 references
  • ciclosporin (1980) 20505 references

(The numbers are approximate as names can vary – we also find “cyclosporin” – but Pubmed with its MeSH terminology is very good about synonyms).
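For anyone who wants to repeat the name counts without typing into the web form, here is a minimal sketch using the NCBI E-utilities `esearch` endpoint. It is not how the numbers above were obtained (those came from entering names by hand), and the helper names are invented for this example. It assumes that `rettype=count` returns a small XML document with a `Count` element; counts fetched today will of course differ from the 2006 figures.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term: str, db: str = "pubmed") -> str:
    """Build an esearch URL that asks only for the number of matching records."""
    query = urllib.parse.urlencode({"db": db, "term": term, "rettype": "count"})
    return f"{ESEARCH}?{query}"


def parse_count(xml_text: str) -> int:
    """Pull the integer out of the <Count> element of an esearch reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def pubmed_count(term: str) -> int:
    """Fetch the hit count for one search term (needs network access)."""
    with urllib.request.urlopen(build_count_url(term)) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URLs only; call pubmed_count(name) to run the live query.
    for name in ("taxol", "vancomycin", "ciclosporin"):
        print(name, build_count_url(name))
```

Searching on a bare name is the same crude metric as in the post; synonyms such as “cyclosporin” would need their own queries.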
These compounds are difficult and expensive to extract from natural sources so there is clear value in trying to make them more cheaply by synthetic chemistry. Moreover it may be possible to synthesize similar compounds which are better or much easier to make or both. So not surprisingly the pharmaceutical industry (in which I have worked) and academia are interested in synthesizing complex molecules. And this is reflected, for example, in TotallySynthetic‘s excellent blog where he reviews the syntheses of natural products.
But are the targets chosen because they are useful, or because they are difficult? And do the results of such synthesis find their way into the clinic or use as biological tools? To be fair, it’s important to remember that it takes a long time to develop drugs, and also that chemical synthesis might be a prerequisite step before any other science can be done. Bearing that in mind, here are the Pubmed metrics for the compounds in the last 2 months of Totally Synthetic’s blog (back to 2006-09-01). The table shows:

  • the name used in the blog. (Sometimes there are additional letters which I have removed to increase the hits). Bear in mind that some compounds might have synonyms, but Pubmed is pretty good about this.
  • The date of the first abstract in Pubmed. This is often the first report of the isolation of the compound and its possible biological activity.
  • The number of papers in chemistry journals. These are mostly reporting the synthesis, but some might discuss the elucidation of structure. The latest chemistry paper as abstracted by Totally Synthetic is sometimes not yet included in Pubmed, but in general all the major chemistry journals are reviewed.
  • The number of papers in “non-chemistry” journals. This is subjective but these are the ones where biosynthesis, biological activity, medicinal properties, etc. are likely to be reported.
name earliest date chemistry pubs non-chemistry pubs
Acutiphycin 1999 1 0
Latrunculin 1983 ? 757
Monocerin 1970 2 0
Apoptolidinone 2001 4 0
Strychnofoline 2002 2 0
Manassantin 1987 0 14
Mitorubrin 1965 3 3
RK-397 1993 3 2
Clavilactone 2000 1 1
Chartelline 2005 3 0
Antheliolide 2005 1 0
Sylvaticin 1990 2 4
Elatenyne 2006 1 0
Platensimycin 2005 2 5
guanacastepene (sic) 2000 22 1
Bengazole 1993 4 3
Batzelladine 1999 19 3
Marinomycin 2006 0 1
Hexacyclinol 2002 3 2
Himgaline 2006 1 0
Polygalolide 2003 0 1
Leucascandrolide 2000 15 0

Out of 22 compounds a few have a significant biological literature. Many, even when reported some time back have none. So it is difficult to argue that they are synthesised because they are interesting.
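The tallies behind that statement can be checked with a few lines of Python. This is just a sketch over the table as transcribed above; the “?” for Latrunculin is stored as `None`, and the cut-off of 10 papers for a “significant” literature is my own arbitrary assumption, not a figure from the table.

```python
# (name, earliest year, chemistry pubs, non-chemistry pubs); None marks the "?" entry.
ROWS = [
    ("Acutiphycin", 1999, 1, 0), ("Latrunculin", 1983, None, 757),
    ("Monocerin", 1970, 2, 0), ("Apoptolidinone", 2001, 4, 0),
    ("Strychnofoline", 2002, 2, 0), ("Manassantin", 1987, 0, 14),
    ("Mitorubrin", 1965, 3, 3), ("RK-397", 1993, 3, 2),
    ("Clavilactone", 2000, 1, 1), ("Chartelline", 2005, 3, 0),
    ("Antheliolide", 2005, 1, 0), ("Sylvaticin", 1990, 2, 4),
    ("Elatenyne", 2006, 1, 0), ("Platensimycin", 2005, 2, 5),
    ("guanacastepene", 2000, 22, 1), ("Bengazole", 1993, 4, 3),
    ("Batzelladine", 1999, 19, 3), ("Marinomycin", 2006, 0, 1),
    ("Hexacyclinol", 2002, 3, 2), ("Himgaline", 2006, 1, 0),
    ("Polygalolide", 2003, 0, 1), ("Leucascandrolide", 2000, 15, 0),
]

SIGNIFICANT = 10  # assumed threshold for a "significant" biological literature

# Compounds with no non-chemistry papers at all.
no_biology = [name for name, _, _, nonchem in ROWS if nonchem == 0]
# Compounds at or above the assumed threshold.
significant = [name for name, _, _, nonchem in ROWS if nonchem >= SIGNIFICANT]

if __name__ == "__main__":
    print(len(ROWS), "compounds;", len(no_biology), "with no non-chemistry papers")
    print("at least", SIGNIFICANT, "non-chemistry papers:", significant)
```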
There are other reasons than interesting biology for doing this type of synthetic chemistry. It can lead to new synthetic methodology which could be of use to the chemical industry in general. And it may be good training for the chemistry-specific part of a PhD. So I welcome comments – they will all be posted.

Posted in chemistry | 8 Comments

Could an Open chemistry journal fly?

This post addresses the Closed world of chemistry publishing and offers some not very optimistic comments. I subscribe to the CHMINF-L list which serves the chemical information community. Much of the traffic is about specific (usually commercial) chemical information services or where to find esoteric pieces of information. It reflects the innate conservatism of chemistry. I and a few others have from time to time raised the question of how do we take chemistry into the current century, but generally fail to get much of a response. There is, for example, little belief in the value of Wikipedia, etc. or how to develop virtual chemical communities.
Occasionally, however, some of the membership raise the question of Open Access (and implicitly why chemistry is effectively the least enlightened major scientific discipline by having no major Open Access journals). Michael Engel is one of the few list members who tries to question the way things are currently done and writes (Open Access and costs):

I still do not understand the 3000 USD per paper. How much unnecessary overhead costs are in this figure ?
I wonder how much it would cost to have a Chemistry journal at virtually no cost to the author and the reader.
Couldn’t it start simple ?
– no printed issues
– no advanced layout; just html/xml and export scripts (pdf)
– open refereeing (registered users only, eventually following the example of web.de which connects the username to a postal address by sending the activation code by postal mail, mailing costs could be sponsored by advertising)
– a lot of sponsors and advertisement – as in printed journals
– allowing mirroring
– allowing mining
Possible problems:
– referees don’t want to comment without the shield of anonymity
– vandals and people trying to change papers
Scaling possible ?
100 papers/year -> 1’000 -> 10’000 -> 100’000
What would be necessary for the reader to make a comfortable reading possible ?
– Good and large index for scrolling and email alerts (automatic indexing is necessary)
– Online reading of index, abstract and papers in a Google Reader style.
I would really appreciate if some of the CHMINF reader could add some comments/links/hints etc.
Wiki will follow.
I’ll try to give an accurate (and rather depressing) answer here. First a graph which will be explained below…
rapodaca.png
One simple answer is that it has been tried.
  • Steve Bachrach ran the Internet Journal of Chemistry for a period. It was at (http://www.ijc.com/) but the server no longer runs. I was on the board and also authored and refereed papers. Refereeing could be done in a very short time and the paper appeared within days. It wasn’t Open (in today’s terms it is Toll Access – no author charges, but subscription-based). The scholarly record is, I think, still extant but may be in danger of disappearing with negotiations with the owner (not Steve).
  • Jean-Claude Bradley publishes his science directly onto the Internet (UsefulChemistry molecules) without peer-review but Openly visible. This is effectively a zero-cost model (marginal academic costs – e.g. a university or personal server – and effectively part of the actual practice of science).
  • The chemistry blogosphere now contains high quality reviews such as TotallySynthetic which reviews peer-reviewed articles in closed access chemistry publications with the subject of natural product synthesis. I believe that this is effectively peer-reviewed by the community.
  • Wikipedia has a large and growing amount of factual chemistry material which will (I believe and hope) challenge the current overpriced and out-of-date methods of secondary publication in chemistry. IMO Wikipedia is also effectively peer-reviewed by the community.
So the problem is not technical but social. Or, put another way, chemists. The graph above was published in Rich Apodaca’s blog as Name That Graph. It represents the number of articles published in the Beilstein Journal of Organic Chemistry (“an Open Access, peer-reviewed online journal that will encompass all aspects of organic chemistry.”). The Beilstein journal is free to authors and free to readers. So what could be more attractive? As you can see it’s a year-and-a-bit old. All new journals take time to develop (I’m told that other Biomed Central journals took 5 years). So we can’t judge it yet.
But the answer is simple. Chemistry exemplifies the artificial citation economy which is destructive of innovation and amplifies stasis. Effectively chemists (like me) are judged on their formal publication record in journals with high impact factors. Maybe this is the best we can do, but it means that anything that isn’t a formal publication in an established leading journal is very difficult to justify. It doesn’t get promotion, it doesn’t get funding, it doesn’t get the institution credit. And the process is increasingly mechanised. I have heard chemists who say that in the US (and probably elsewhere) promotion is determined solely by the number of publications (or possibly citations) in the Journal Of The American Chemical Society. And how are these citations measured? We leave it to a commercial company (such as Thomson ISI) to give us metrics about academic value. In other words we have no metric of worth of our own – we rely on a process which is driven by how much money an independent company can make out of it. There is no societal control over this.
So we are locked in a dystopia. Small changes are detrimental to any individual or organisation who tries to change it. Publish in an unusual way and you will suffer. Yes, there is a visible better future but there is no way to get there by our own will. (That doesn’t stop me trying, but I’m regarded as some sort of maverick, I suspect).
However it must and will change. And I’d point to the following:
  • The Wellcome Trust and other funders
  • The changing face of the information world, such as Flickr, Facebook, etc.
  • The increasing economic unsustainability of conventional publishing.
So it will change. I think it will be dramatic. But I don’t think we can say how.
And I’ll be interested to see if Michael Engel gets any positive encouragement from the CHMINF-L…
Posted in chemistry, open issues | 2 Comments

Mystery Molecule: have I given away too much?

In many detective novels the murderer makes a fatal mistake. Sometimes they don’t realise this, sometimes they try to cover their tracks. On the Internet you can’t cover your tracks –

“The Moving Finger writes: and, having writ,
Moves on: nor all thy Piety nor Wit
Shall lure it back to cancel half a Line,
Nor all thy Tears wash out a Word of it.”

So there is an inadvertent half a line in the history (actually less) that I cannot wash out and is a glaring pointer to the solution. I’ve only just spotted it.
However I promised that if people wrote comments I might publish more clues. So here is the latest comment:

  1. Propter Doc Says:
    November 10th, 2006 at 5:12 pm
    Well I spent a good bit of last night trying to figure things out, but never got anywhere close enough to post a comment. I would say that there was very little logic in my approach which ranged from guessing ‘orange stuff we’d encountered as undergrads’ and looking up wikipedia, to searching web of science for your previous papers to see if there were any clues there.
  2. I replied
    November 11th, 2006 at 12:34 am
    (1) Thanks very much for posting. Although everything I reported in the original post was correct, the molecule is a bit of a chameleon and dark red is not its commonest colour which is yellow. I’ve probably acted like too many murderers, priding themselves on their undetectability, while gradually giving away more information. There is now enough information on these pages to solve it, given all that has been written and to know that the result is almost certainly correct.

So, to confirm this statement here is a picture of a crystal of the mystery molecule (not real size). I am not breaking copyright by posting this (though I can’t say why as that would give more clues). It’s much more beautiful than this suggests.
mystery.PNG
I can’t think of any more clues at present. But if I get more mail I might – or I might not.

Posted in chemistry | 1 Comment

Negotiating Open Access – a mutual success

I was recently invited to write a review for a closed access journal run by a commercial publisher. The subject of the article [in confidence] was dear to my heart and the journal is largely aimed at a community of readership which does not normally read my material and can, I hope, benefit from what I write. But I have a natural reluctance to publish in Closed Access journals. (Yes I do so in chemistry, but that is because they are all Closed. And yes, before you ask, we (me and Henry Rzepa) do argue the toss with the publishers. Frequently). So what to do? As the publisher’s invitation was labelled confidential I have removed all identifying names and truncated some chunks.

  • Accede to the publisher’s requirements. Take the money and run. It’s not the end of the world.
  • Refuse to publish and hack some useful code instead.
  • Accede to the publisher’s rules, but then put my copy in my Institutional Repository.
  • Accede to the publisher’s rules, but then put the final publisher’s copy in my Institutional Repository. (c.f. the Subversive Proposal of Stevan Harnad). This breaks the formal rules of copyright, even though many people ignore this and some positively honour the breach as an act of civil disobedience.
  • Negotiate with the publisher to see if they would accede to an Open publication.

Although several people urge me to break copyright I feel that as an advocate for formal change (in Open Data) I have to obey the formal rules. I want publishers to listen to arguments for Open Data (and many do) and they will be less receptive (I think) if I am seen as an irresponsible radical. (There is, of course, a large need for irresponsible radicals). So I chose the last option:

Thank you for your invitation. As you know I am a proponent of Open Access, and would be interested in this invitation if you would be prepared to offer the article as Open Access (author(s) retain copyright)…

I cannot reproduce the confidential correspondence in detail but we have come to a mutually acceptable solution. I shall thank the publishers in the finished article. Perhaps this is a useful model for other authors.

Posted in open issues | 7 Comments

Org Prep Daily – Help! An Open Opportunity for Chemistry

Milkshake runs a superb blog (Org Prep Daily) which does exactly what it says on the tin. Here is an example of a post (6-amino-4-chloro-2-methylpyrimidine). Take a look – you don’t have to be a chemist to get the idea and the quality. He recently posted:

I will run out of procedures to post in about a week or two. Since the chemistry cannot keep up with posting one decent procedure a day, I will have to revert into a more convenient one-post-per-month frequency. (I think I will also rename Org Prep Daily as The Reactionary Organiker and will proceed to expound my entire world-wiew here.) Only you can prevent it – by submitting a good reproducible scalable procedure to appear here. My Scripps address is tomasv{}edu.

It’s obvious we need to step in and help. So how? There are several aspects:

  • a community needs to self-assemble. Posts like this will, I hope, help raise awareness. This could be Internet-mediated diffusion, but I would also suggest it’s a wonderful opportunity for undergraduate projects. These can be of very high quality – I know Henry Rzepa does this (not yet public) and Mark Winter has a splendid site (Molbase) which is curated and has an effective mechanism for peer-review.
  • there needs to be an agreed social procedure. This is a very common process in Open Source where several models have been developed. A common one is Benevolent Dictator For Life (BDFL) which works on a mixture of meritocracy and trust. That’s the current implied model for blogs – the blogmeister is effectively the BDFL. In the Blue Obelisk we have a more federated approach – there are many projects, each with a guru or BDFL, but we share all our experiences. There’s no single solution. There are elaborate social pressures on conformity, forking, etc.
  • The blog needs to have a clear purpose. At one level this is very clear – the daily posting of organic syntheses. But I’m not quite sure what the selection policy is – are these just molecules that catch Milkshake’s fancy? or that he has just happened to make – this would be similar to Useful Chem where the blog is the primary scientific record. Or are these particularly valuable synthetic intermediates? If this is clearer I can make more suggestions.
  • The technology needs to be agreed. IMO this is increasingly suitable for a Wiki rather than a blog. Several of us are looking into how we create semantic tools for Wikis (blogs are somewhat more difficult) where we can author chemistry directly into the Wiki.

Those of you who are not chemists will know the great value of compendia of reactions. At present these are normally sold (at considerable expense) in paper-based compilations. These are difficult to search and are completely uninfluenced by the Internet revolution. We now have the opportunity for social computing to collect the primary reaction data from the literature in free and universal form.
Most of the problems are solved.

  • Much of the chemistry is published as Supplementary Information (i.e. not part of the sacred fulltext). It’s therefore free of copyright. (Not all publishers take this view so now, dear readers, do you see why we need to ask for Open Data.)
  • Although the data is normally in PDF Hamburger form, rather than HTML Cow, it’s fairly easy to transform it into text, use OSCAR to create HTML and thence automatically to XML. (I have found that PDFBox does a pretty good job – of course it can’t do the chemical structures which have been mashed into soggy gif horrors – not even hamburgers.)
  • The recipe can be created for humans, but it’s also straightforward to create CMLReact – the version of CML that supports reactions. Org Prep daily would be an ideal substrate for this. Most of the software is in place – it needs some simple glueware. The great benefit of this is that molecules can be searched, reactions can be balanced, and the reaction-specific data (e.g. temperatures, etc.) can be specifically searched.
  • Increasingly Wikipedia will become the primary reference for chemists – it will be trivial (even automatic) to link OPD to Wikipedia and Pubchem. We’d also see OPD syntheses being added to WP either as entries or links.
  • The technology exists within the Blue Obelisk community – it needs some volunteers.
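To give a flavour of what “reactions can be balanced” means once a procedure is machine-readable, here is a minimal sketch of an atom-balance check. It is not CMLReact and not Blue Obelisk code. It assumes each species is given as a plain molecular formula without brackets, charges or stoichiometric coefficients, and the esterification used as a demo is a textbook example of my choosing, not an Org Prep Daily procedure.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactants: list[str], products: list[str]) -> bool:
    """True if both sides of the reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


if __name__ == "__main__":
    # acetic acid + ethanol -> ethyl acetate + water
    print(is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"]))
    # the same reaction with the water left out
    print(is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

A real tool would work from the molecules in the reaction markup rather than from formula strings, but the check itself is no more complicated than this.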

This is a wonderful opportunity for you to change the world of chemical information. I hope very much you support Milkshake.

Posted in "virtual communities", chemistry, open issues | Leave a comment

Mystery Molecule: the trail is getting colder

It’s three days since the mystery molecule was announced and although I’ve had local interest in Cambridge there’s only been one external post. You must know that in police work it is critical to make rapid progress in the first 48 hours. After that the trail gets colder. So if you had fed me with comments, there might have been some more clues.
I have confirmed that Peter’s approach works. It actually mirrors much police work – careful systematic collection of large amounts of small pieces of data. If this were a matter of national interest or there were a financial reward it would have been solved by now.
Remember that your posts can be anonymous. And not to give the mystery away.
But perhaps it’s simply that it’s not worth blogging chemistry – I have an interesting Open Access item that I must post later today and will certainly get some feedback.
P.

Posted in chemistry | 2 Comments

Mystery Molecule: more suspects

There has certainly been some speculation in our real-life community about the molecule! I am assuming that there are some other chemists out there who are speculating. Please give us some feedback – if you do maybe there will be some more clues – or some more bodies…
Two public feedbacks from my colleagues:
Peter Corbett blogs

Jim Downing is stuck on a mystery molecule, and wonders whether the answer can be found via search. Well, I doubt that conventional chemical informatics search will do him. Google Suggest, on the other hand, may be his friend. It seems that Peter Murray-Rust is being coy about a certain critical piece of information, something that’s normally unimportant that may be considered to be a dead giveaway. Well, it’s a giveaway because of linguistics. You see, there’s a couple of collocations that describe the molecule in question, and if you can guess one half of the collocation – or systematically search through the available values (there’s a finite set, and not too large) using publicly available tools trained on the entire web as a corpus – then the other half will leap right out at you.
I think there’s a broader point to make here, but that will have to wait for another time. There’s an idea I’d like to talk to real people about before I blog it.

Remember that Peter knows what the answer is. Although I am holding stuff back (what detective writer doesn’t?) I am not sure what particular bit he’s referring to. (I admit only to the space group and the cell dimensions – that is all I knew in 1970 when I first encountered the mystery molecule). But Peter is right that “the answer is out there” – Judith MR discovered a splendid site today which gives the complete story.
Jim Downing blogs:

I’m whiling away a few minutes trying to work out Peter’s mystery molecule. IANAChemist, and although I’ve been working with them for 10 months now, I fear that very little has rubbed off. However, Peter reckons that you can’t work it out chemically, so I figure that not being a chemist shouldn’t be too much of a handicap. Like betting on the novices at Newmarket.
I leapt off to Pubchem, and used the molecular weight limits. Restricted to 240-260 (the syntax is 240:260[mw]). Nearly 14,000 compounds. Hmmm. Narrowing it down, 245-255 gives under 7000. Still needle in a haystack.
Need to assess what we know, and make some inferences. Hopefully the lazy web will mean some kindly chemist can fix my broken logic.
OK, so what do we know? It seems to be fairly commonly observed, so I’m guessing its structure was discovered a while ago. Well, you can’t limit in PubChem on a range of CIDs, but you can limit to NCI as the source of the data. This didn’t reduce the hit count.
We also know that Peter was trying to do co-ordination chemistry at the time. I didn’t know what that co-ordination chemistry is (I told you, I’m really not a chemist), but wikipedia soon sorted me out. So there are probably either metal ions or organic ligands. My A level chemistry and the variety of colours of the crystals suggest that the metal ions are in the molecule. So I added a limit requiring a heavy atom into pubchem (I know this doesn’t necessarily mean metal ions). Only 14 Hits, whoppee! Probably the wrong 14, but hey.
So my best guess is molecular iodine, which Peter assures me is wrong.
Where is my logic broken? Is it possible to find this molecule by search?

This is a brilliant approach and worthy of the Web. Unfortunately not all detective mysteries are solved by logic. So, for example, most chemists will know immediately that they have molecular iodine (it’s dark metallic grey and purple in solution). So I suspect that Jim cannot know that he has got the right answer, whereas anyone who has come across this phenomenon will. Whether a chemist can “work it out” or search for it from the clues given I don’t know. Peter thinks yes.
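Jim’s molecular-weight narrowing can also be scripted. The sketch below only builds the search terms and the corresponding E-utilities URLs; it runs no query and says nothing about the answer. It assumes that PubChem compounds are reachable through `esearch` as the `pccompound` database and that the `[mw]` range syntax Jim quotes works there as it does in the web form. The helper names are invented for this example.

```python
import urllib.parse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def mw_term(low: float, high: float) -> str:
    """Molecular-weight range in the syntax quoted above, e.g. '240:260[mw]'."""
    return f"{low}:{high}[mw]"


def pubchem_count_url(term: str) -> str:
    """URL asking only for the number of PubChem compounds matching the term."""
    query = urllib.parse.urlencode(
        {"db": "pccompound", "term": term, "rettype": "count"}
    )
    return f"{ESEARCH}?{query}"


if __name__ == "__main__":
    # The two ranges Jim tried, from wide to narrow.
    for low, high in ((240, 260), (245, 255)):
        term = mw_term(low, high)
        print(term, "->", pubchem_count_url(term))
```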

Posted in chemistry | Leave a comment