Category Archives: blueobelisk

Open-Data-driven science and a brokering system for ONS

Cameron Neylon and Jean-Claude Bradley have blogged about a directory of Open Notebook Science (ONS) where projects including this approach can register.

21:19 14/10/2007, Cameron Neylon,
As has been flagged up by Jean-Claude Bradley there are a couple of places now where people can sign up to say that they have Open Notebook Science in their laboratory, practise Open Notebook Science, or even would like to find a place where they can keep an Open Notebook. Jean-Claude has put a list on the Nodalpoint Wiki and I have set up a database at DabbleDB. DabbleDB is a rather cool web-based database system that provides free access as long as you make the database contents freely available. Because the data is completely open I am not asking for people’s email addresses.

If you want to be included in the database you can put your details in on the form here. This will allow anyone to re-use the data (which you can find here) to generate lists on appropriate web-pages, or maps or any number of other nice re-uses of the data. If you are interested in the working of the database give me a yell and I can give you admin access.

PMR: As soon as we start to get the results of the NMR calculations on NMRShiftDB we’ll put them up, but I don’t want to register this before we have actually started (I have seen too many empty web pages in my career and I don’t want to leave them myself.) So we all have to be a little patient.

But then I thought that CrystalEye is an ideal resource for data-driven science. I’ve blogged about how crystal-data-driven research started in the mid-1970s but there is a great opportunity to use CrystalEye in new ways. Unlike the Cambridge Data Centre the data includes inorganic structures. The software is modern and extensible and it should be economic to develop many new applications.

CrystalEye is, of course, OpenData (we use the OKFN licence at present) and anyone can download it (we are still working out how to implement APP – Atom Publishing Protocol – to make this easy). But we’d also love to explore collaborative projects. We have all the data and software here so you don’t have to set it up. Crystallographic data makes good undergraduate, Master’s and PhD projects – Egon should know. So if you – or your collaborator/students/supervisor/whoever – are interested in using this data perhaps we could explore this on the Wiki.

ODOSOS and an article on OA

Egon reminds us of the importance of the intensity of purpose that we need in the Blue Obelisk. (ODOSOS is our mantra: Open Data, Open Source, Open Standards). I won’t add very much new to that but I’ll also add and contrast OA.


I value ODOSOS very highly: they are a key component of science and scientific research, though not every scientist sees their importance yet. I strongly believe that scientific progress is held back because scientific results are not open; it’s putting us back into the days of alchemy, where experiments were like black boxes and procedures were kept secret. It was not until the alchemists started to write down procedures properly that it, as a science, took off. Now, with chemoinformatics in mind, we have the opportunity to write down our procedures in high detail.

I keep wondering what the state of drug research would be if the previous generation of chemoinformaticians had valued ODOSOS as much as I do. Now, with a close relative being diagnosed last week with a form of cancer with low five-year survival rates, I cannot get more angry about those who want to make (unreasonable) money by selling scientific research. A 1M bonus is unreasonable. I can have 10 post-docs work on chemoinformatics research for the same period; I can have them work on drug design for various kinds of cancer.

Therefore, I will continue to use every opportunity to convince people of ODOSOS, and will continue to develop new methods to improve accurate exchange of scientific data and experimental results. I will help people where I can to distribute open data, even if the whole project is not 100% ODOSOS. For example, the Chemistry Development Kit is open source itself (LGPL), which does allow embedding into proprietary software. This does not mean that I will contribute to the proprietary software, and I am actually proud of not having done so in the last 10 years.

I will continue to advise people how to make their work more ODOSOS, even if they cannot make the full transition. I will also continue to make sure that all my scientific results are ODOSOS, as there is no other kind of science. To set a good example, and, hopefully, to lead the way.

This is why I am a proud member of the Blue Obelisk.

PMR: I have had exactly these thoughts today and I’d like to ask for some literature help.

I have been invited to write an article on Open Data for a closed access journal, Serials Review – Elsevier which has a special issue every so often (ca. 4 years) on Open Access. I normally accept such invitations (assuming it’s on something I want to write on) and this one is important …

Serials Review (v.30, no.4, 2004) was a focus issue on Open Access. It remains one of the most heavily downloaded issues and articles even now. Open Access remains a “hot topic” and fundamental discussion in scholarly communication.

I’m not sure who has also accepted but the invitees are well known in the area.
I have taken my subject “Open Data in Science”. I intend to make exactly the case that Egon has made, that Closed anything usually disadvantages the human race.

In the Blue Obelisk we did not include Open Access, because it wasn’t – and isn’t – central to our activities. We are – I suspect – largely in favour but are forced to publish in Closed access journals because of the conservatism of chemists. We make our protests regularly and ritually – the technical editors know us well for the requests to mount stuff here, add addenda there, etc.

So I started through the disciplines – astronomy is open, chemistry is closed, biology is open. And I thought – if the bioscientists had been as selfish as the chemists we wouldn’t have genomes, we wouldn’t know how HIV works, we wouldn’t have the ribosome structure, we wouldn’t understand amyloid. Back in the mid-1990s there was a movement to patent ESTs (bits of the genome). I’d be grateful for chapter and verse but essentially Craig Venter wanted to patent these (I know patents are yet another concern) but in 1995 the pharma company Merck donated all its ESTs to the public good. This was typical of the concern about locking up IP.

I’m not sure when journals started to permit and then to require that authors publish their protein and nucleic acid sequences – I remember the late 1980s. But it’s now mandatory. Earlier came the pioneers of bioinformatics, e.g.

*Needleman SB, Wunsch CD. (1970). A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:443-453.

but also Bill Pearson (who’s here in Cambridge for a year and I met last week), Russell Doolittle, David Lipman, and Margaret Dayhoff (called the “founder of bioinformatics” by Lipman). They showed that the mechanical comparison of sequences was an incredibly powerful tool in understanding the function of proteins and genes, of modelling evolutionary processes, including viral mutations. This technique (and many variants) is at the heart of modern molecular bioscience.
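To give a feel for what “mechanical comparison of sequences” means, here is a minimal Python sketch of global alignment scoring in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; they are not taken from the 1970 paper, which any serious use should consult directly.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Return the best global alignment score of sequences a and b.

    Dynamic programming over a (len(a)+1) x (len(b)+1) table; the scoring
    parameters are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The point for this post is that such a comparison is only useful when the sequences themselves are freely available to be compared.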

NONE OF THIS WOULD HAVE BEEN POSSIBLE IF SCIENTISTS HAD NOT BEEN ABLE TO HAVE ACCESS TO THE WORK OF OTHERS, ROUTINELY AND WITHOUT EXPLICIT PERMISSION.

That is why I and Egon feel so angry when information is less than Open. Without Open information people die. So does our planet.
The technology is here. If we wished we could make every new piece of chemistry Open within a year. How much value would that be in finding new chemistry to use in the service of humanity?

[PS. I'd be grateful for any pointers as to how bioinformatics became free. Are there any lessons there for trying to change the chemists' mindset?]

Open NMR

As I have already blogged (WWMM calculation of spectra) we are hoping to provide Jean-Claude Bradley and others with an Open service to calculate NMR spectra from structure. This needs a lot of software components and a lot of glueware. With the release of FROG – not just Free, but Open – yet another problem is solved, but we aren’t quite there yet.

The calculation of spectra from NMRShiftDB is automatic because, AND ONLY BECAUSE, Christoph and Stefan have used CMLSpect to represent the data. CMLSpect allows:

  • connection table
  • atom labels
  • 3D coordinates
  • spectra
  • spectral peaks
  • assignment of peaks to atoms

All these (except the raw spectra) are required for the calculation. Actually the connection table can be dispensed with if the hydrogen atoms are given explicitly – as they should ALWAYS be. (Implicit hydrogens have probably cost the human race thousands of wasted years through errors. There is now NO excuse for not including hydrogen atoms explicitly in files. Size of files? Rubbish. All the hydrogens in a year’s global chemistry are worth 1 day of astronomical simulation.)

So with NMRShiftDB we have the simple process:

  • read NMRShiftDB file
  • add hydrogens with coordinates (JUMBO does this)
  • transform to Gaussian input (XSLT makes this automatic)
  • run job (Condor makes this automatic)
  • analyze results (i.e. compare calculated and observed – Nick Day’s software is making this automatic)
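The “transform to Gaussian input” step can be sketched in a few lines. The workflow above uses XSLT on CML; the Python below is only an illustration of the target layout (route line, blank line, title, blank line, charge and multiplicity, one line per atom, final blank line). The method and basis set in the default route line, the function name and the methane coordinates are all assumptions for the example, not the settings used in the NMRShiftDB calculations.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # element symbol, x, y, z in Angstrom


def gaussian_input(title: str, atoms: List[Atom],
                   route: str = "# B3LYP/6-31G(d) NMR",
                   charge: int = 0, multiplicity: int = 1) -> str:
    """Build the text of a Gaussian job from explicit atoms (hydrogens included).

    The default route line is an assumed example; choose method and basis
    to suit the actual study.
    """
    if not atoms:
        raise ValueError("a molecule needs at least one atom")
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")  # Gaussian expects a blank line after the geometry
    return "\n".join(lines) + "\n"


# Illustrative (approximate) methane geometry with every hydrogen explicit.
METHANE: List[Atom] = [
    ("C", 0.000, 0.000, 0.000),
    ("H", 0.629, 0.629, 0.629),
    ("H", -0.629, -0.629, 0.629),
    ("H", -0.629, 0.629, -0.629),
    ("H", 0.629, -0.629, -0.629),
]

if __name__ == "__main__":
    print(gaussian_input("methane NMR test", METHANE))
```

Note that nothing here has to guess at hydrogens: because they are explicit, the transformation is purely mechanical.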

With the normal chemical environment this is messier:

  • read mol file
  • submit to FROG to generate 3D coordinates. Hope it hasn’t changed the order of atoms
  • convert mol file to CML
  • read list of peaks in some legacy format (?Excel)
  • try to match peaks to atoms for assignment (probably have to rely on atom ordering)
  • create peakList in CMLSpect. How?
  • combine peakList with molecule in CML
  • transform to Gaussian input (as above) and then it’s plain sailing

The problems arise because:

  • hydrogens are a problem
  • mol files (and all other files than CML) do not have atom labels
  • there is no Open tool for assigning peaks to atoms
  • relying on atom ordering is a recipe for disaster and extremely difficult to debug

So what is clear is that we need a tool to couple JSpecView to a molecule in CML. The output, at least, has to be in CML because there is no other way of linking atoms to peaks.
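Why atom ordering is so dangerous can be shown with a toy example. In this Python sketch the atom ids (a1, a2, a3), the function names and the shift values are all illustrative assumptions; the molecule is the heavy-atom skeleton of ethanol. Peaks keyed by atom label survive a reordering of the atoms, whereas peaks matched by position silently attach themselves to the wrong atoms.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, str]  # (atom id, element symbol)


def shifts_by_label(atoms: List[Atom], peaks: Dict[str, float]) -> Dict[str, float]:
    """Look up each atom's shift by its id; the order of atoms is irrelevant."""
    return {atom_id: peaks[atom_id] for atom_id, _ in atoms if atom_id in peaks}


def shifts_by_position(atoms: List[Atom], peak_list: List[float]) -> Dict[str, float]:
    """Pair the n-th peak with the n-th atom; wrong as soon as atoms are reordered."""
    return {atom_id: shift for (atom_id, _), shift in zip(atoms, peak_list)}


# Heavy atoms of ethanol, as first written and after a tool has reordered them.
original: List[Atom] = [("a1", "C"), ("a2", "C"), ("a3", "O")]
reordered: List[Atom] = [("a3", "O"), ("a1", "C"), ("a2", "C")]

labelled_peaks = {"a1": 18.1, "a2": 57.8}  # illustrative 13C shifts in ppm
positional_peaks = [18.1, 57.8]            # same numbers, but no labels

if __name__ == "__main__":
    print(shifts_by_label(reordered, labelled_peaks))
    print(shifts_by_position(reordered, positional_peaks))
```

After reordering, the positional version assigns a carbon shift to the oxygen atom and raises no error at all, which is exactly why such mistakes are so hard to debug.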

This should be seen as one of the great (but achievable) challenges of the Blue Obelisk movement. When we get it, it will transform the way that graduate students record their peak assignment and publish their papers and THESES!

The chemical blogosphere cares

Wow! I posted a request yesterday (sic) for supporting material for our proposal to JISC for a person to support the blogosphere as a major resource for increasing the quality of published chemistry. I have had valuable contributions from 4 people already and now Egon has created a wonderful summary – just the right length. We’ll either include it as it stands or point to it from the proposal – depends on space. [Recall that Egon's Chemical Blogspace blog aggregates the whole of chemical blogspace.]

17:01 01/10/2007, Egon Willighagen, chemical blogspace, pmrgrantproposal, chem-bla-ics
Peter is writing up a 1FTE grant proposal for someone to work on the question how automatic agents and, more interestingly, the blogosphere are changing, no improving, the dissemination of scientific literature. He wants our input. To make his work easy, I’ll tag this item pmrgrantproposal and would ask everyone to do the same (Peter unfortunately did not suggest a tag himself). Here are pointers to blog items I wrote, related to the four themes Peter identifies.

The blogosphere oversees all major Open discussion

The blogosphere cares about data

Important bad science cannot hide
I do not feel much like pointing to bad scientific articles, but want to point to the enormous amount of literature being discussed in Chemical blogspace: 60 active chemical blogs discussed just over 1300 peer-reviewed papers from 213 scientific journals in less than 10 months. The top 5 journals have 133, 78, 68, 57 and 48 papers discussed in 22, 24, 10, 11 and 18 different blogs respectively. (Peter, if you need more in depth statistics, just let me know…)

Two examples where I discuss not-bad-at-all scientific literature:

Open Notebook Science
I regularly blog about the chemoinformatics research I do in my blog. A few examples from the last half year:

Update: after comments I have removed one link, which I need to confirm first.

PMR: A few comments. Yes, I didn’t include a tag – but as I have said before the blogosphere rapidly converges. I sympathize with Egon in that I don’t particularly like pointing to bad articles. However when the robots start refereeing journals – as they will in our project – they don’t have sentiments and if they find bad data they will expose it without a qualm. Of course we will have to check they “hardly ever” make mistakes (no one is perfect). And, of course, if you publish in Open Access journals there is no place to hide.

Eyeballs from the blogosphere

Fantastic! The blogosphere has already responded to our request for accounts of data quality enhancement.

  1. Egon Willighagen Says:
    October 1st, 2007 at 8:18 am
    Peter, I’ve placed some pointers to past blog items from my blog that I feel are relevant [1]. I’ve also tagged this overview with ‘pmrgrantproposal’ and requested others to do the same.

    [1] http://chem-bla-ics.blogspot.com/2007/10/how-blogosphere-changes-publishing.html

  2. Cameron Neylon Says:
    October 1st, 2007 at 9:43 am
    Peter, you don’t half set yourself steep targets with a 36 hour deadline starting on a Sunday morning! My posts on open notebook science are at http://blog.openwetware.org/scienceintheopen/category/open-notebook-science/

PMR: Many thanks. Please keep these coming, especially anything from the hexacyclinol stuff.
We have to ‘fess up. ‘I’ am actually a pan-dimensional hyperbeing like the mice in H2G2.
And “36-hours” is a meaningless spatio-temporal measure – we borrow from a virtual universe and then replace the “time”.

Open grant writing. Can the Chemical Blogosphere help with “Agents and Eyeballs”

In the current spirit of Openness I’m appealing to the chemical blogosphere for help. Jim Downing and I are writing a grant proposal for UK’s JISC : supporting education and research – which supports digital libraries, repositories, eScience/cyberinfrastructure, collaborative working, etc. The grant will directly support the activities of the blogosphere, for example by providing better reporting and review tools, hopefully with chemical enhancement.
The basic theme is that the Chemical Blogosphere is now a major force for enhancing data quality in chemical databases and publications, and we are asking for 1 person-year to help build a “Web 2.0”-based system to help support the current practice and ethos. The current working title is “Agents and Eyeballs”, reflecting that some of the work will be done by

  • machines, as in CrystalEye – WWMM which aggregates and checks published crystal structures on a daily basis.
  • humans as in the Hexacyclinol? Or Not? saga. Readers may remember that there was a report of the synthesis of a complicated molecule. This was heavily criticized in the blogosphere, and indeed the top 9 hits on google for “hexacyclinol” are all blogs – the formal, Closed, peer-reviewed paper comes tenth in interest.

“Given enough eyeballs, all bugs are shallow” – Eric Raymond. In chemistry it is clear that the system of closed peer-review by 2-3 humans sometimes leads to poor data quality and poor science. We’ve found that in some chemistry journals almost every paper has an error – not always “serious”, but … So:
“Agents and eyeballs for better chemical peer-review”.

Not very catchy but we’ll think of something.
It’s unusual to make your grant proposal Open (and we are not actually putting the grant itself online, especially the financial details). But there are parts of the case that we would like the blogosphere to help with. If you have already written a blog on any of the aspects here, please give the link. You may even wish to write a post

  • showing that the blogosphere is organised and effectively oversees all major Open discussion in chemistry. I take Chemical blogspace as the best place for a non-chemist (as the reviewers will be) to start.
  • showing that the Blogosphere cares about data. Here I would like to point to the Blue Obelisk and the way Chemspider has reacted positively to the concerns about data quality
  • showing that important bad science cannot hide. I would very much like an overview of the hexacyclinol story – which is still happening – with some of the most useful historical links. Anything showing that the blogosphere was reported in the conventional chemical grey literature would be valuable.
  • Open Notebook Science.

We have three partners from the conventional publishing industry – I won’t name them – who have offered to help explore how the Agents and Eyeballs approach could help with their data peer review.

You might ask “why is PMR not doing this, but asking the blogosphere?” It’s precisely because I want to show how responsive and responsible the blogosphere is, when we ask questions like this.

There is considerable urgency. To include anything in the grant we’ll need it within 36 hours, although contributions after that will be seen by the reviewers. I suggest that you leave comments on this post, with pointers where necessary. Later I suspect we’ll wikify something, but it’s actually the difficulty of doing this properly and easily that is – in part – motivating the grant.

TIA

Volunteers: does the computer experience translate to chemistry?

One of the spinoffs of having been to scifoo is that I skim over 50+ posts a day from the blogs that participants run. Some are multi-author blogs: here’s Andy Oram on Tim O’Reilly’s blog, talking about what makes volunteer documenters click. Read it all.

01:47 30/09/2007, Andy Oram, Planet SciFoo
By Andy Oram

[...]If value increasingly comes from communities of volunteers outside the compass of corporate management, isn’t it only right to shift resources to support these communities? I have to deal with that question in my own field of computer documentation, where the shift to community production is happening as fast as it is anywhere. (I examine this trend in a series of articles about community documentation.) [PMR - listed below] But many industries could ask the same question I explore in this article: how can society shift its resources to support the important new source of value in communities?

Volunteerism needs support

The idea that volunteers play an important social role goes at least as far [...]

Volunteers who are paid, of course, are no longer volunteers. Companies have hit upon an enormous number of intermediate forms of reward by now: invitations to focus groups and conferences, honorable mentions, free products, etc. Still, serious problems in the concept of rewarding volunteers have been publicized:

  • Rewards create incentives to game the system, which would ultimately lead productive volunteers to abandon the system as unfair.
  • Even when rewards are fair, they “crowd out” the original incentives that led volunteers to serve in the first place.
  • It’s just plain impossible to determine how much each volunteer’s contribution is worth.

The final point just listed is the killer. The reasons for it are easy to state: the ultimate value created by any new idea may lie far out in the future, and the give-and-take discussion around information makes it hard to trace a valuable idea to an individual or small group. Let’s look at this problem more closely.

The value of information

[...]In computer documentation (as in journalism), certainly, it’s becoming harder and harder to add value to what the community contributes for free. So the challenge becomes how to improve the community’s offerings.

I find the key traits of value in documentation to be:

  • Availability–somebody has to write it in the first place. (Readers also need computers and Internet access in order to meet this goal.)
  • Findability–people need something better than current search techniques to find obscure documents, and particularly need help finding background when they read a document that assumes too much prior knowledge.
  • Quality–this covers such general and complex issues as accuracy, relevance, and readability.

A particularly urgent aspect of quality is keeping a document up to date. Many a project has annoyed its users by starting out with reasonably good documentation and failing to keep it updated. Somehow, people who enjoyed writing something the first time lose interest in maintaining it. This is just as true for comments in source code and commercial books. (Many of my authors have built their reputations and businesses on books they’ve written, and despite good intentions have been unable to find time to update the books.) I myself have lived out the feeling of writing new documentation for a free software project and then lacking the motivation to go back to it.

Thus, companies and user consortia who want to direct resources toward making software more usable can consider:

  • Offering incentives that make the best people contribute, while trying to avoid invoking the crowding-out phenomenon.
  • Providing paths through documentation, so readers can find what they need in their particular state of knowledge. This task is an ongoing research project for any particular body of documentation.
  • Ensuring continuity, by tracking the need to update documents and finding people to do so.
  • Training contributors to do a better job and make the most of their efforts.

The last of the tasks interests me in particular, because it provides scope for offering my skills as an editor and O’Reilly’s as a publisher. But we need some compensation for it.

I feel funny, of course, offering our services as editors or other quality providers when the original authors might not be paid. But if you accept that it’s harder to recruit people for supporting roles than for leading roles, payment is justified.

To conclude, I think volunteers can be supported without being paid directly. If they know their work will be improved to be more useful and will have lasting value, they’ll have more incentive to contribute.

[...]

PMR: and the details:

… writings by Andy Oram about web pages, forums, and other media used by users of technology to educate each other. Articles include (in reverse chronological order):


Andy Oram
Editor, O’Reilly Media
Home page

PMR: This is very relevant to recent development in the Blue Obelisk, where a volunteer community has become the keeper of the SMILES de facto standard. We should read Andy’s thoughts carefully.

The equations are similar but not isomorphic. Why do people work with the BO? Here are some ideas:

  • A sense of community. This is a major reward for many people, being able to keep in touch and knowing that you are on the right track (or more importantly, on the wrong one). And the price of membership, though not explicitly stated in the gift economy, is to contribute and to uphold the ideals of the work.
  • A fuzzy mixture of morals, ethics and politics. It is the “right thing to do”. If that drives some people, great. On the reverse I have been attacked several times for being immoral in promoting various aspects of Open Chemistry – it destroys the jobs of honest hard-working developers. [No, it creates jobs for those people who wish to translate to C21].
  • Personal “academic” karma. This is a major motivation. As the BO succeeds those people who have been associated with it will be asked to write articles for value publications, to cooperate on the next phase of funded Web 2.0 grants, etc. For aspiring scientists to work together.
  • Personal financial reward. This is a powerful and valid motivation. There is lots of potential – I wouldn’t have a job today if I hadn’t contributed to the development of XML. When we look for people to join us, the blogosphere is an obvious recruiting ground.  And as the balance shifts from closed to open there will need to be ways of monetizing Openness. The chemical information market is worth at least low billions of USD – it’s still going to be there in 10 years’ time. But many of the conventional players will be gone and new ones will have taken their place.
  • Fun. Yes, fun. We like writing algorithms. If you are a Sudoku addict you’d enjoy writing a chemical substructure search. We like drawing molecules. Many artists – like Jane Richardson – have joined the community of molecular graphics. We like building second life. We like writing blogs.
  • Changing the world. Everyone contributing to the BO is changing the world… It may not be apparent, but it’s real.

and as Alma Swan, quoting Gandhi, (blogged by Barbara Kirsop) reminded us:

‘first they ignore you, then they laugh at you, then they fight you, then you win’.

The BO has not won yet. It’s somewhere between ignore and laugh, and for the next little while we’d love some documentation volunteers!

The Obelisk SMILES

We are delighted that Craig James has suggested making the molecular format SMILES an Open activity. Egon Willighagen writes:

08:03 28/09/2007, Egon Willighagen,
Craig James wants to make SMILES an open standard, and this has been received with much enthusiasm. SMILES (Simplified molecular input line entry specification) is a de facto standard in chemoinformatics, but the specification is not overly clear, which Craig wants to address. The draft is CC-licensed and will be discussed on the new Blue Obelisk blueobelisk-smiles mailing list.

Illustrative is my confusion about the sp2 hybridized atoms, which use lower case element symbols in SMILES. Very often this is seen as indicating aromaticity. I have written up the arguments supporting both views in the CDK wiki. I held the position that lower case elements indicated sp2 hybridization, and the CDK SMILES parser was converted accordingly some years ago. A recent discussion, however, stirred up the discussion once more (which led to the aforementioned wiki page).

You can imagine my excitement when I looked up the meaning in the new draft. It states:

The formal meaning of a lowercase “aromatic” element in a SMILES string is that the atom is in the sp2 electronic state. When generating a normalized SMILES, all sp2 atoms are written using a lowercase first character of the atomic symbol. When parsing a SMILES, a parser must note the sp2 designation of each atom on input, then when the parsing is complete, the SMILES software must verify that electrons can be assigned without violating the valence rules, consistent with the sp2 markings, the specified or implied hydrogens, external bonds, and charges on the atoms.

PMR: This is excellent. The problem with specifications is that it is VERY difficult to describe them so that independent groups can interpret them consistently. I spent some years helping with the XML effort and apparently simple ideas could cause huge debates. (e.g. namespaces…) It’s well known that some constructs in computer languages, such as

int i = 6;

int j = i++ * i++;

i = i++;
cause enormous confusion. What are the results? (Try to work it out, then try it out, and then find the “right answer”; your compiler may surprise you.) [*]

Back to chemistry. Almost all formats have been proprietary. That means that there is unlikely to be much useful interactive public help from the originators, and the only check is likely to be a binary executable. When I joined the pharma industry and started trying to get some standards, one software company threatened to sue anyone who published their molecular file formats. It’s slightly better now, but IMO the responsibility for the current appalling situation lies with the pharma industry which has had no effective interest in standardising anything and is now paying the price. (It can only survive by using information, and until it makes this standard and largely free it won’t.)
That’s a major reason for developing CML (Chemical Markup Language). CML is open, and uses open standards (XML). It’s much larger than SMILES, and there are places where it is defined less well than we would like, but at least it’s open and that can happen.

SMILES is very widely used. Creating an open standard will take more effort than might appear. The “aromatic” or “lower case” concept is extremely difficult to define. I don’t understand the definition:
The formal meaning of a lowercase “aromatic” element in a SMILES string is that the atom is in the sp2 electronic state.

I don’t believe that SMILES has anything to do with electronic states and I think it should simply be a means for counting atoms, formal bonds and electrons. Is there a difference between Cn(C)C and CN(C)C ? The first represents a planar transition state of trimethylamine, the second a pyramidal ground state.
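The “counting” view can be made concrete with a small sketch. The Python below is a deliberately minimal reader for bracket-free SMILES (organic-subset atoms, branches, ring-closure digits); it ignores bond orders, hydrogens, charges and stereochemistry, and it is not an implementation of the draft specification. The function name is an assumption. It simply shows that if lower case is treated as no more than an alternative spelling of the element, Cn(C)C and CN(C)C describe the same atoms and the same connections.

```python
from typing import Dict, List, Optional, Tuple

TWO_LETTER = ("Cl", "Br")
UPPER = "BCNOPSFI"
LOWER = "bcnops"


def parse_simple_smiles(smiles: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (element symbols, bonds as index pairs) for a bracket-free SMILES.

    Illustrative only: bond orders are ignored and lowercase symbols are
    folded to the same element as their uppercase form.
    """
    atoms: List[str] = []
    bonds: List[Tuple[int, int]] = []
    branch_stack: List[Optional[int]] = []
    ring_open: Dict[str, int] = {}
    prev: Optional[int] = None
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in TWO_LETTER:
            symbol, step = smiles[i:i + 2], 2
        elif ch in UPPER or ch in LOWER:
            symbol, step = ch.upper(), 1
        elif ch == "(":
            branch_stack.append(prev)  # remember where the branch starts
            i += 1
            continue
        elif ch == ")":
            prev = branch_stack.pop()  # return to the branch point
            i += 1
            continue
        elif ch in "-=#:":
            i += 1                     # bond order deliberately ignored
            continue
        elif ch.isdigit() and prev is not None:
            if ch in ring_open:
                bonds.append((ring_open.pop(ch), prev))  # close the ring
            else:
                ring_open[ch] = prev                     # open a ring
            i += 1
            continue
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
        atoms.append(symbol)
        index = len(atoms) - 1
        if prev is not None:
            bonds.append((prev, index))
        prev = index
        i += step
    return atoms, bonds


if __name__ == "__main__":
    print(parse_simple_smiles("Cn(C)C"))
    print(parse_simple_smiles("CN(C)C"))
```

Under this reading the two strings are indistinguishable; any difference between them has to come from extra semantics (such as an electronic state) attached to the lower case letter.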

But the positive point is that I have the chance to put this view and others have the chance to support it, modify it or challenge it. Just like Wikipedia, the Blue Obelisk uses the court of public opinion. And we have the exciting position that a “Web 2.0” community is now about to lead the chemoinformatics world.

Maybe the pharma industry will take us seriously. And, wonder of wonders, might actually come into the open, say so, and offer some support.

[*] Actually, in C and C++ both statements have undefined behaviour and may give different answers with different compilers.

Diazonamide : The Blue CrystalEye Greasemonkey lends a hand

There is some doubt about what the structure of diazonamide A is, because there is no absolute way of assigning names to structures. We only agree what aspirin is because everyone has been assigning the same structure to it for 100+ years. Many people are careless with names and even more are careless with structure diagrams. Indeed there seems to be a minor industry in drawing some structures wrongly. A year or two back when Nick Day was pioneering the use of InChI he used “staurosporine” as an example. He found lots of structure diagrams and I think there were 19 (sic) different diagrams. Some were frankly “wrong”. Others missed out the stereochemistry, others had other problems. And some of these were from suppliers’ sites (i.e. “labels on bottles”).

So how can we be sure? It needs an authority – but which one? Staurosporine is a (potential?) drug, so… WHO drugs? British National Formulary? US National Pharmacopeia? Chemical Abstracts? Beilstein? All of these are pay-to-view. So I cannot look them up (remember I am at home, simulating an interested person, such as a patient). Ah! Pubchem… with 16 entries, and several variations of stereochemistry. Wikipedia has a nice picture … but this is about diazonamide…

On TotSynth’s post there’s a link to the latest paper (DOI: 10.1021/ja0744448). And following this I find:

diazonamide3.png

PMR: The Blue Eye is NOT part of the abstract – it shows that the Blue Obelisk Greasemonkey has found a crystalEye entry which looks like this:

diazonamide4.png

and here you can see the actual stereochemistry of the diazonamide nucleus (it’s not exactly the title compound) so there is virtually no doubt. The diagram on the right is calculated from the 3D coordinates and the layout is through CDK – note the stereo wedges and hatches.

So now I know what some of the stereo is. And because PNAS have made the text Open I can read how it relates to TS’s structure. The CDK may not be 100% beautiful, but it should be true (Cue some reader finding it’s wrong and a bug in JUMBO, but that’s what Open science is about). And you can always pay Chemical Abstracts 6.20 USD to check whether you have got it “right”.

So install the Blue Obelisk Grease Monkey (blog post) in your Firefox browser and Open your Eyes to a whole new world of truth and beauty.

CrystalEye GreaseMonkey

Nick Day has just released a Greasemonkey script which provides a full crystallographic overlay for existing journals. It’s worth trying as it’s visually exciting as well as very useful. This post tells you what it does, how it works, and why all publishers will actually benefit from making their crystallographic data Open.
The CrystalEye GreaseMonkey (Javascript) needs to be installed (from http://userscripts.org/scripts/show/11439) inside your Firefox browser. (I don’t believe this is a risk, but make your own decision). It is then activated whenever a new page is loaded from certain sites (e.g. pubs.acs.org/* for any ACS journals). You can switch it on or off from the box and also decide which sites you wish to visit.

crystaleye0.PNG

When it finds a DOI in the page (usually from a TOC) it asks the CrystalEye site whether this DOI is listed as one containing one or more crystal structures. (CrystalEye contains over 100,000 crystal structures, most from the last 5 years but some, via the Crystallographic Open Database, going back several decades). CrystalEye returns the addresses of those structures corresponding to the given DOI. The Greasemonkey then adds the CrystalEye logo (I have removed the publisher’s graphic because of copyright).
crystaleye1.PNG
The blue eye (because this is a BlueObelisk-eyed monkey) indicates crystals and in this case there are three [1][2][3]. Clicking on the first immediately loads the Jmol applet and the metadata:

crystaleye2.PNG

The links are direct to the publisher’s site and if you have a licence (or if the article is Open Access) you’ll be able to read the fulltext. The material here is all automatically derived from the data (no images or text have been taken). You can even see what we calculate the chemical structure to be:

crystaleye3.PNG

Again all this is automatic. (Credits to Jmol and – right – the CDK structure diagram generator).
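The DOI lookup at the heart of the script can be sketched as follows. The real Greasemonkey is Javascript running in the browser and queries the CrystalEye site; this Python version only illustrates the logic, with a local dictionary standing in for the CrystalEye index. The regular expression, the function names, the example DOI and the structure addresses are all assumptions made for the illustration.

```python
import re
from typing import Dict, List

# A pragmatic pattern for DOIs in page text; an assumption, not a full DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s<>\"']+")


def find_dois(page_text: str) -> List[str]:
    """Return the distinct DOIs found in the page, in order of appearance."""
    found: List[str] = []
    for match in DOI_PATTERN.finditer(page_text):
        doi = match.group(0).rstrip(".,;)")  # drop trailing punctuation
        if doi not in found:
            found.append(doi)
    return found


def structures_for_page(page_text: str,
                        index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each DOI on the page to its known structure addresses, if any."""
    return {doi: index[doi] for doi in find_dois(page_text) if doi in index}


# Hypothetical stand-in for the CrystalEye index: DOI -> structure addresses.
EXAMPLE_INDEX = {
    "10.1234/example.1": ["structure-1", "structure-2", "structure-3"],
}

if __name__ == "__main__":
    toc = "Paper one (DOI: 10.1234/example.1). Paper two, DOI 10.1234/example.2"
    print(structures_for_page(toc, EXAMPLE_INDEX))
```

A DOI with no entry in the index simply produces no eye, which is what happens for publishers whose structures CrystalEye cannot find.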

So here we have something very close to an overlay journal. No textual commentary, but we are working on that.

So thank you to Dave Martinsen of ACS for reviewing Greasemonkey. And we hope that it increases the clicks on your full-text – people will see the crystal structure and be so excited they will wish to read the full article.
It also works for RSC, IUCr and others like American Mineralogist. But not for Wiley, Springer and Elsevier. Not because we have anything against them, but because they don’t make their structures available. CrystalEye cannot find them, so it can’t point to them. And so, publishers, you are losing out to those publishers who DO expose their crystallography. And perhaps CrystalEye will persuade authors to publish their structures where they can be most seen.