Thoughts on the chemical blogosphere

I’ve a few minutes to kill before the shuttle to Scifoo…
I believe the chemical blogosphere is among the leaders in domain-specific blogging and I’ll be bouncing this idea off the SciFooCampers (where the blogosphere is seen by many as a central part of the new scientific communication process). I’ve been blogging for somewhat under a year, with a hiatus of about 3 months when the muse of code got the better of the muse of blogs (do these entities have names?). And so I have followed – in a rather haphazard way – some of the chemical blogs. There’s a complete site devoted to them, Chemical blogspace, run by the tireless Egon Willighagen. Even if you aren’t a chemist, please have a look – I can’t cut and paste much of it as it’s interactive or autogenerated. But some snippets:

Chemical blogspace collects data from tens of scientific chemistry blogs and then does useful and interesting things with it.

(PMR: there’s certainly more than 100 chemical blogs now)

With Chemical blogspace, you can:

PMR: and goes on to provide:

Latest Molecules

ondansetron, hydrocodone, hydromorphone, midazolam, fentanyl, sevoflurane, dichloromethane, drugs

PMR: this list is autogenerated and these are REAL molecules (not just pictures) as they link to resources with full connection tables, InChIs and RDF. Then:

PMR: Not necessarily the “best”, just those whose muses continue to drive them to produce. The site contains many of the tools for ranking and presenting blogs. So:

Popular posts this week
Days after the release of OSRA last week, I saw the optical chemistry structure recognition on the front page of my favorite Dutch /. equivalent, Tweakers.net, Duitsers leren computer chemische structuren herkennen, written by Ren Gerritsen. The article discusses…
Hey, the new Scientiae is up at Twices! So much to read that I’m glad the weekend’s almost there….
For those of you who know me you are likely aware of the fact that I have worked at Advanced Chemistry Development, ACD/Labs, for over a decade, with the past few years as their Chief Science Officer. During that time I have had the privilege of working with…

PMR: presumably these have more links, hits than others.
PMR: The blogosphere covers a great spectrum. Factual reporting of chemicals in the news, chemical and pharma industry, personal views on what it’s like to be a grad student, postdoc, etc. Recent controversies. New chemical, informatics and other technology. How to create and manage chemical content on the web.
And conventional journalism. Here’s ChemBark (Paul Bracher). ChemBark focusses on comment, and in one very long thread he explored a recent appointment (hiring) at Princeton. This generated nearly 200 comments and debate including some from senior faculty who wouldn’t normally have entered the blogosphere. I shall not comment on the rights or wrongs of the issue (I don’t have strong feelings) nor on the rights or wrongs of the type of debate (bordering on the gossip columns of newspapers of the sort we have lots of in the UK). The main point I want to make is that the whole of the debate has been openly readable and is (I hope) semi-preserved. That means that historians, linguists and rhetoricians have a wonderful view of issues in the early C21.

What follows is my analysis of the issues discussed in “The Floor is Yours” and “The Week in Preview.” I have closed both of those threads and directed visitors wishing to continue the discussion to come here. I think it makes the most sense if I start by reviewing the news and talking about the “professional” issues in play. I’ll conclude with a revoltingly pompous dissertation on ChemBark as a medium for chemical news and a venue for subsequent analysis and discussion.

PMR: ChemBark presents a long and argued apologia for one type of chemical blogger. If you’ve a few minutes read it and form your own conclusions.
So where are we? A very vibrant community (sic). Probably only Egon has a complete overview of the blogosphere; the rest of us each interact with some of the others. There have been analyses before – sorry not to quote them, I’m in a rush. I’d love to see a map.
The main thing that frustrates me is the lack of decent tools. We are getting there slowly. With InChI, RDF, CML, etc. we are starting to see chemistry embedded in blog posts. But the state of the technology has certainly stopped me writing as much about chemistry as I would have wanted. Wikipedia has a set of guidelines (Wikipedia:WikiProject Chemistry/Structure drawing) and an activity on drawing molecules (Wikipedia talk:WikiProject Chemistry/Structure drawing workgroup) – when these technical barriers are overcome Wikipedia and the blogosphere will work very well together.
So given all the things above, I’d like to see a scientific discipline that has done more and better!

Posted in "virtual communities", chemistry | 3 Comments

The birth of a movement

From Peter Suber’s blog. An account of the roots of the Open Access declarations:
Bruce Byfield, Academia’s Open Access movement mirrors FOSS community, Linux.com, August 2, 2007. Excerpt:

Free and open source software (FOSS) has roots in the ideals of academic freedom and the unimpeded exchange of information. In the last five years, the concepts have come full circle, with FOSS serving as a model for Open Access (OA), a movement within academia to promote unrestricted access to scholarly material for both researchers and the general public.
“The philosophy is so similar that when we saw the success that open source was having, it served as a guiding light to us,” says Melissa Hagemann, program manager for Open Access initiatives at the Open Society Institute, a private foundation for promoting democratic and accessible reform at all levels of society. Not only the philosophy, but also the history, the need to generate new business models, the potential empowerment of users, the impact on developing nations, and resistance to the movement make OA a near twin of FOSS….
By the start of the millennium, a number of academic groups were starting to see the Internet as a solution to these problems. In December 2001, 13 representatives of these groups met in Budapest to organize. They produced a document called the Budapest Open Access Initiative. Starting with the declaration that “An old tradition and a new technology have converged to make possible an unprecedented public good,” the initiative called for making all academic articles available online. It was followed in April 2003 by the Bethesda Statement on Open Access Publishing, and in October 2003 by the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, both of which suggest how OA could be implemented. Together these three statements – sometimes known collectively as the BBB Declaration – provide the practical and philosophical basis for the development of OA.
As with FOSS, the initial reaction to OA was derisive. “They laughed at us,” says Hagemann, who is one of the original signers of the Budapest Initiative. “During presentations I would give in various countries, I would be laughed at.” At first, the movement’s representatives even had trouble gaining membership in the Association of Learned Professional Societies, which issued a news release sharply criticizing OA. Just as with FOSS, these criticisms included claims that OA lacked a business model and was unsustainable. Other criticisms included the claim that OA amounted to vanity publishing and would lack peer reviews, both of which have proved unsubstantiated in practice.
And, in another parallel to FOSS, as OA has spread, so resistance has spread to government lobbying and even threats of lawsuits in some instances. Hagemann alleges that the American National Institute of Health, for instance, has had its implementation of OA delayed through the intervention of Congress and Senate members listening to the publishers’ lobbying groups. Similarly, she says the Association of American Publishers has recently hired a Washington lobbyist to campaign against OA.
“They used to laugh at us, but now they’re taking us very seriously and working against us politically,” notes Leslie Chan, a senior lecturer at the University of Toronto and one of the organizers of online publisher Bioline, as well as an original signatory of the Budapest Initiative. His comments echo a quote by Gandhi often heard in FOSS circles as well: “First they ignore you, then they laugh at you, then they fight you, then you win.” …

PMR: Several points. The absolute need for declarations and clear language. If you use fuzzy language you end up with a mess. One of the many things that Richard Stallman (initiator of the Free software movement) did was to make it clear what he was talking about.  For example I recently offered some code with the phrase

“The CIF2CML software is Open – I am not sure whether it’s on Sourceforge …”.

Everyone knows that all software on SF has to have an OSI approved licence. To which one collaborator replied
“Open or free? If it is just Open, I should even not take a look at any line of code from it.”
which is clear. He is not interested in software unless it asserts software Freedom as defined by Stallman. I don’t share the same position but I understand and respect it.
In the same way I understand and respect those Open Access advocates who feel that free-to-read and free-to-archive is sufficient. I happen to have a different position – unless it is free-to-reuse  “I should even not take a look at any line of [text] from it.”
OA still has a way to go in clarifying its access. We need labels as well as declarations. In a sense the political battlelines are clearly defined (it’s a pity to suggest that this is a battle, but it often looks like one). “OA” has arrived as a political force, so great congratulations to  Melissa and others at the OSI. (Confusingly this is the same acronym as for the Open Source Initiative which certifies licences – and completely different from the Open Source Institute and the Open Source Software Institute). One of the main things to fight is fudge and obfuscation. It has been clear on this blog how imprecise the language of the large publishers has been – maybe it is just “don’t care”, maybe it’s deliberate.
And we need the same for Data. At the BlueObelisk we will continue to chant our mantra “Open Source, Open Standards and Open Data”. And promote the Open Knowledge Foundation whose licence we have adopted for CrystalEye. That’s why the next step, tedious though it is, is to formalize the language.
We are probably still in the “being ignored” phase – or being laughed at – “how can a group of amateur software  developers create anything useful?”. When it gets to the fighting bit we’ll need support. And we are quietly gathering it.

Posted in open issues | Leave a comment

Molecules in Wikipedia and RDF

I’ve just arrived in Mountain View for the Nature Google O’Reilly Foo Camp. Expect either silence (i.e. swamped) or gushings. I doubt I’ll have much time to blog. To keep readers happy here’s Egon making things happen in the world of RDF molecules.

Molecules in Wikipedia

I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a nice unique molecular identifier (for example http://en.wikipedia.org/wiki/Lactose) given certain conditions, and many bloggers are using it as such. But, it only is a useful identifier if one (and only one) InChI is stated on the wiki page.
Now that I am RDF-ing molecular space, I was again interested in dbpedia, a RDF version of Wikipedia. See these two blog items and Peter’s very nice dbpedia, RDF and SPARQL – for chemistry item. Christian is picking this up, and extending dbpedia for support for the various chemical boxes.
Wikipedia Templates
I have spotted a couple of templates: Drugbox, Chembox, Chembox new, of which the last one seems to be the most recent, and has extensions for explosives and drugs. The WikiProject Chemicals does not mention it though. Anyone who knows the status? Is chembox new the way forward and going to replace the older chembox? I hope so, because only the newer one has InChI in the list of official fields. Or is chembox new simply an extension of chembox itself?
Somewhere between 1000 and 1500 entries use the chembox new and another 1000 to 1500 use chembox but I assume there is considerable overlap. Additionally, Christian noted that there still seem to be molecules in Wikipedia which do not use a template at all, and counted some 1900 molecules using various lists. If you want to keep a closer eye on chemistry in dbpedia, you should subscribe to the dbpedia-discussion mailing list.
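
Egon's condition that a Wikipedia URL is a useful identifier only if the page states one (and only one) InChI is easy to check mechanically. What follows is a minimal sketch, assuming the page's wikitext is already available as a string. The function names and the sample wikitext (including the "Name" field) are made up for illustration, and the regular expression is a rough approximation rather than an InChI validator.

```python
import re

# Rough pattern for an InChI string: the "InChI=" prefix, a version
# ("1" or "1S"), then slash-separated layers with no whitespace.
# This is an approximation for illustration, not a validator.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s|}<]+")


def find_inchis(wikitext: str) -> list[str]:
    """Return the distinct InChI strings found in the text, in order."""
    seen = []
    for match in INCHI_PATTERN.findall(wikitext):
        if match not in seen:
            seen.append(match)
    return seen


def is_usable_identifier(wikitext: str) -> bool:
    """True when the page states one (and only one) InChI."""
    return len(find_inchis(wikitext)) == 1


# Hypothetical wikitext fragments (the InChIs are those of ethanol and water).
one = "{{chembox new | Name = Ethanol | InChI = InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 }}"
two = one + " See also InChI=1S/H2O/h1H2 for water."
zero = "Barium is a chemical element."

print(is_usable_identifier(one))   # True
print(is_usable_identifier(two))   # False
print(is_usable_identifier(zero))  # False
```

A page with no InChI and a page with two conflicting InChIs fail the test for different reasons, so a real tool would probably want to report the count rather than just a yes or no.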

I think we are getting to critical mass in WP-Chem. There is a lot of good material and the templates/boxes are beginning to get formalised. Then it is fairly trivial to add more properties. (I disagree slightly with Egon – I think WP is the first place I would go to for properties of common substances.)
Here’s what you get if you search for the boiling point of barium. (I thank Jonathan Goodman for this adventure.) The guild of barium boilers has been active over the centuries and given us – in Google order:
==========NobleMind (“more facts than you can handle”) ==


[ Science / Chemistry / Chemical Elements ]

 
The boiling point of Barium is 1640 º C

================WebElements================

Physical properties of barium

Melting point [/K]: 1000 [or 727 °C (1341 °F)]
Boiling point [/K]: 2143 [or 1870 °C (3398 °F)] (liquid range: 1143 K)

============ www.chemicalelements.com===========

Basic Information


Name: Barium
Symbol: Ba
Atomic Number: 56
Atomic Mass: 137.327 amu
Melting Point: 725.0 °C (998.15 K, 1337.0 °F)
Boiling Point: 1140.0 °C (1413.15 K, 2084.0 °F)
Number of Protons/Electrons: 56
Number of Neutrons: 81
Classification: Alkaline Earth
Crystal Structure: Cubic
Density @ 293 K: 3.51 g/cm3
Color: Silver
=========Periodic Table of Elements: Barium – Ba (EnvironmentalChemistry.com)======

Physical Properties of Barium

PMR: We’re not doing badly. Boiling barium is an eclectic art and it would never do to let outsiders in on the process. A range of 700 degrees is reasonable obfuscation.
So what does Wikipedia say? Is it authoritative? Or does it simply copy the stuff above? I don’t know – it is meant to cite its references. It mentions WebElements, but it doesn’t have their value… it goes with the environmentalists:

Boiling point 2170 K
(1897 °C, 3447 °F)

========================================================
So where would I look for authority? The NIST Webbook:
Phase change data
Data compilation copyright
by the U.S. Secretary of Commerce on behalf of the U.S.A.
All rights reserved.
Quantity Value Units Method Reference Comment
Tboil 1913. K N/A Strem Chemicals, 1999 dendritic phase
Tboil 1913. K N/A Strem Chemicals, 1999
======================================
And – read carefully – this supports ONE of the values above. Which?
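
For readers who would rather let a script do the careful reading, here is a minimal sketch that puts the quoted values on one scale and compares them with the NIST figure. The values are the ones reproduced above; the 5 K tolerance and the function names are assumptions made for this illustration.

```python
# Boiling points of barium as quoted above, in the unit each source leads with.
# (The EnvironmentalChemistry.com value is not reproduced above, so it is omitted.)
REPORTED = {
    "NobleMind": (1640.0, "C"),
    "WebElements": (2143.0, "K"),
    "chemicalelements.com": (1140.0, "C"),
    "Wikipedia": (2170.0, "K"),
}
NIST_KELVIN = 1913.0


def to_kelvin(value: float, unit: str) -> float:
    """Convert a temperature in degrees Celsius ("C") or kelvin ("K") to kelvin."""
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    raise ValueError(f"unknown unit: {unit}")


def agreeing_sources(reference_k: float, tolerance_k: float = 5.0) -> list[str]:
    """Names of sources whose value lies within tolerance_k of the reference."""
    return [
        name
        for name, (value, unit) in REPORTED.items()
        if abs(to_kelvin(value, unit) - reference_k) <= tolerance_k
    ]


for name, (value, unit) in REPORTED.items():
    kelvin = to_kelvin(value, unit)
    print(f"{name:22s} {kelvin:8.2f} K  (NIST {kelvin - NIST_KELVIN:+.2f} K)")

print(agreeing_sources(NIST_KELVIN))  # ['NobleMind']
```

1913 K is 1639.85 °C, which rounds to the 1640 °C quoted first in the list.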

Posted in chemistry | 1 Comment

legacy molecules: 2-fooyl-ethanol?

More thoughts on recovering legacy molecules – this time from names.
In journals and elsewhere we frequently come across lists like:

  • 2-chloro-ethanol ClCH2CH2OH
  • 2-fluoro-ethanol FCH2CH2OH
  • 2-phenyl-ethanol C6H5CH2CH2OH

These all have a generic representation with “R” groups:

  • RCH2CH2OH

Is there a generic name for these? Better than “2-substituted-ethanols” which doesn’t parse very nicely.
Would 2-fooyl-ethanol work? (probably too close to 2-furyl-ethanol). We could do with a simple, distinctively spelt and pronounced free variable for a generic group. That could be a useful tool in representing generic classes and also in name2structure.
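
To show how such a free variable might be used by software, here is a minimal sketch that treats "fooyl" as a placeholder in the generic name and "R" as the matching placeholder in the generic formula. The placeholder token, the table and the function name are all hypothetical; only the three substituents come from the list above.

```python
# Substituent prefixes paired with the formula fragment for R,
# taken from the list above.
SUBSTITUENTS = {
    "chloro": "Cl",
    "fluoro": "F",
    "phenyl": "C6H5",
}

# "fooyl" is the hypothetical free variable; any distinctive token would do.
PLACEHOLDER = "fooyl"
GENERIC_NAME = "2-fooyl-ethanol"
GENERIC_FORMULA = "RCH2CH2OH"


def expand(generic_name, generic_formula, substituents):
    """Yield (name, formula) pairs, one per substituent."""
    for prefix, fragment in substituents.items():
        name = generic_name.replace(PLACEHOLDER, prefix)
        # Swap only the first "R"; the rest of the formula is left alone.
        formula = generic_formula.replace("R", fragment, 1)
        yield name, formula


for name, formula in expand(GENERIC_NAME, GENERIC_FORMULA, SUBSTITUENTS):
    print(name, formula)
```

The sketch also shows why the token has to be distinctive: a placeholder that could occur inside a real name (as "furyl" does) would make the substitution ambiguous.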

Posted in chemistry | 6 Comments

OSRA and others; how to retrieve legacy molecular structures

Egon Willighagen blogged this. There is now a real opportunity for the Open Source chemistry community to create high-quality tools for the extraction of molecular information from legacy documents. Besides full-text articles other good areas to look are probably theses and supplemental data.
Before I copy the post, I’ll review the methods available (to the Open Source community)

  • explicit connection table. This is the best, but rare. It might occur in theses, but is uncommon. (Some word documents include binary CDX and/or MDL files but this is an awful hack. I’ve done it and don’t recommend it)
  • Implicit connection table. PLEASE USE InChI! In the absence of this there might be a SMILES
  • crystal structure. This is very good and uses CIF2CML. see CrystalEye (http://wwmm.ch.cam.ac.uk). Crystal structure coordinates are often reported in theses and supplemental data
  • output of computational chemistry programs. Again very good and uses CIF2CML code.
  • Chemical name. Parsable by OPSIN (part of the OSCAR3 package). Probably runs at between 25% and 70% depending on the domain. Will be improved by lots of little incremental bits (see below).
  • Spectral data. Very variable and usually incomplete. Works for small molecules. Use SENECA or lookup against shifts in NMRShiftDB. Very useful to check structures created by other methods
  • Chemical structure diagram. This is what is discussed below. Remember that although it’s easy for a human to understand a picture it can be very difficult for a machine. We can divide it into three parts: (a) turn a bitmap into a series of graphics primitives (lines, text); (b) turn the graphics primitives into chemical primitives (bonds, atoms, labels); (c) interpret the chemical primitives using chemical semantics. The first can be very hard, especially for fuzzy diagrams. The second is much easier, especially when the first has worked well. It is well suited to input in PDF which, although disgusting and horrendous, can reveal the graphics primitives. I have done this for several instances of supplemental data and the results are variable. With an increasing number of diagrams munged into PDF, the vectors are often captured well. The third depends on the chemical semantics. Much of it involves recognising conventions (e.g. what does “OBz” mean?). I’m hopeful.

In names, spectra and diagrams alike there are a lot of heuristics, and this is where everyone can help. There are probably a few hundred abbreviations, groups, etc. in common use and enough to give us a high degree of success. If we all add a few of these we can make rapid progress. You don’t have to be a programmer to do it.
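
As an illustration of how little programming such contributions need, here is a minimal sketch of an abbreviation table kept as plain data and read by a few lines of code. The file layout, the field names and the function names are hypothetical and not taken from OSCAR3, OPSIN or OSRA. Each SMILES fragment is written so that its first atom is the attachment point.

```python
import json

# A hypothetical data file, inlined here as a string so the sketch runs as-is.
# Contributors would only ever edit entries like these.
ABBREVIATIONS_JSON = """
{
  "Me":  {"name": "methyl",     "smiles": "C"},
  "Et":  {"name": "ethyl",      "smiles": "CC"},
  "Ph":  {"name": "phenyl",     "smiles": "c1ccccc1"},
  "Ac":  {"name": "acetyl",     "smiles": "C(C)=O"},
  "Bz":  {"name": "benzoyl",    "smiles": "C(=O)c1ccccc1"},
  "Bn":  {"name": "benzyl",     "smiles": "Cc1ccccc1"},
  "OMe": {"name": "methoxy",    "smiles": "OC"},
  "OAc": {"name": "acetoxy",    "smiles": "OC(C)=O"},
  "OBz": {"name": "benzoyloxy", "smiles": "OC(=O)c1ccccc1"}
}
"""


def load_abbreviations(text):
    """Parse the data file into a dictionary keyed by label."""
    return json.loads(text)


def expand_label(label, table):
    """Return the entry for a label, or None so that a human can be asked."""
    return table.get(label)


table = load_abbreviations(ABBREVIATIONS_JSON)
print(expand_label("OBz", table))
print(expand_label("Foo", table))  # unknown label: nothing is guessed
```

Because the table is data rather than code, the same file could in principle be shared between name parsers and diagram interpreters.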
Also, as Egon says, the combination of the methods will help a lot. What’s “THF”? It could be tetrahydrofuran or tetrahydrofolate. If you know the formula is C4H8O you know it’s the first. If you know it’s got two fused six-rings in, even if you can’t work out the atoms, it’s clearly not the first. And so on.
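
The THF case can be sketched as a filter over candidate expansions, where each extra piece of evidence removes candidates. This is only an illustration: the table, the property names and the function are hypothetical, and the formula given for tetrahydrofolate is that of the neutral acid (tetrahydrofolic acid), supplied here as an assumption.

```python
# Candidate expansions for an ambiguous abbreviation, with properties that
# other methods (formula from analysis, ring systems from a diagram) might supply.
CANDIDATES = {
    "THF": [
        {"name": "tetrahydrofuran", "formula": "C4H8O", "fused_six_rings": False},
        # Formula of the neutral acid; an assumption made for this sketch.
        {"name": "tetrahydrofolate", "formula": "C19H23N7O6", "fused_six_rings": True},
    ],
}


def disambiguate(abbreviation, **evidence):
    """Return the names of candidates consistent with every piece of evidence."""
    matches = []
    for candidate in CANDIDATES.get(abbreviation, []):
        if all(candidate.get(key) == value for key, value in evidence.items()):
            matches.append(candidate["name"])
    return matches


print(disambiguate("THF"))                        # both candidates remain
print(disambiguate("THF", formula="C4H8O"))       # ['tetrahydrofuran']
print(disambiguate("THF", fused_six_rings=True))  # ['tetrahydrofolate']
```

No single method has to be reliable on its own; each one only needs to rule something out.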
Enough from me:

OSRA: GPL-ed molecule drawing to SMILES convertor

Igor wrote a message to the CCL mailing list about OSRA:
    We would like to announce a new addition to the set of chemoinformatics tools available from the Computer-Aided Drug Design Group at the NCI-Frederick. OSRA is a utility designed to convert graphical representations of chemical structures, such as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES. OSRA can read a document in any of the over 90 graphical formats parseable by ImageMagick (GIF, JPEG, PNG, TIFF, PDF, PS etc.) and generate the SMILES representation of the molecular structure images encountered within that document.

The email does not give any information on the fail rate, but the demo they provide via the webinterface does show some minor glitches (the bromine is not recognized):

The source reuses OpenBabel and uses the GPL license. The value is equal to that of text mining tools like OSCAR3, and together they sound like the Jordan and Pippen of mining chemical literature.

4 comments:

Rich Apodaca said…
Great find – thanks, Egon!
Joerg Kurt Wegner said…
I posted about it yesterday not knowing that you have already posted it. That’s funny! I found it in my del.ico.us network and you via CCL … so the social network seems to work 😉
Egon Willighagen said…
Joerg, I am officially on holiday, but reading my email… so, missed the del.ico.us trigger… Interesting that you mention the CCL mailing list as social network… to me, social networks were more like being able to socialize with accounts outside my main areas of interest, which CCL would be…
Antony said…
I did some testing on this the day it was released and found a number of issues during the tests and blogged about it here http://www.chemspider.com/blog/?p=83 However, as a first release it definitely has potential and I am looking forward to helping them

… and, whether or not it’s usable directly in other code we should be able to abstract much of the functionality into code-independent data files

Posted in chemistry | Leave a comment

Request for Open publication of crystallographic data in Elsevier's Tetrahedron

=========== Open letter to editors of Tetrahedron ==========
Professor L. Ghosez ,
Professor Lin Guo-Qiang ,
Professor T. Lectka ,
Professor S.F. Martin ,
Professor W.B. Motherwell ,
Professor R.J.K. Taylor ,
Professor K. Tomioka
Subj: Request for Open publication of crystallographic data in Tetrahedron
Dear editors,
I have recently been reviewing access to supplemental data in chemistry publications, in particular crystallographic data (“CIFs”). Many publishers (IUCr, RSC, ACS…) expose these on their websites as Open Data (for examples see: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=455). The data are acknowledged not to be copyrightable (see http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=447) where your colleague Jennifer Jones (copied) has confirmed:

Dear Peter Murray-Rust
Thanks for your email.  Data is not copyrighted.  If you are reusing the entire presentation of the data, then you have to seek permission, otherwise, you can use the data without seeking our permission.
Yours sincerely
Jennifer Jones
Rights Assistant
Global Rights Department
Elsevier Ltd
PO Box 800
Oxford OX5 1GB
UK
Tel: + 44 (1) 865 843830
Fax: +44 (1) 865 853333
email: j.jones@elsevier.com

Other Elsevier journals such as those publishing thermochemistry (see last blog post)  are now actively making the supplemental data Openly available on the journal website. I am therefore asking whether Tetrahedron (and perhaps other Elsevier chemistry journals) might consider publishing their data Openly in this way and would be grateful for your views.
(This is an Open letter (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=456) and I would like to publish your reply so please mark any confidential material as such).
Thank you for considering this
Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069
=========== Open letter to editors of Tetrahedron ==========

Posted in open issues | Leave a comment

cyberscience: Changing the business model for access to data

I have been reviewing the availability of Open Data for cyberscience – concentrating recently on crystallography and chemical spectra as examples. I’ll propose a new business model here, still very ill-formed and I welcome comments. It applies particularly to disciplines where the data are collected in a fragmented manner rather than being coordinated as in, for example, survey of the earth or sky. I call this fragmentation “hypopublication”.
However the Internet has the power to pull together this fragmentation if the following conditions are met:

  • the data are fully Open and exposed. There must be no cost, no impediment to access, no registration (even if free), no forms to fill in.
  • the data must conform to a published standard and the software to manage that standard must be Openly available (almost necessarily Open Source). The metadata should be Open.
  • the exposing sites must be robot-friendly (and in return the robots should be courteous).
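
The courtesy expected of robots can be sketched with Python's standard `urllib.robotparser` module. This is a minimal sketch under stated assumptions: the robots.txt content, the robot name, the URLs and the default delay are all hypothetical, and the file is inlined so that nothing is fetched from the network.

```python
import urllib.robotparser

# A hypothetical robots.txt, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

USER_AGENT = "example-data-robot"  # hypothetical robot name


def make_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser, url, default_delay=5.0):
    """Return the seconds to wait before fetching url, or None if it is off limits."""
    if not parser.can_fetch(USER_AGENT, url):
        return None
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default_delay


parser = make_parser(ROBOTS_TXT)
print(plan_fetch(parser, "http://example.org/data/structure1.cif"))     # 10.0
print(plan_fetch(parser, "http://example.org/private/structure2.cif"))  # None
```

A courteous robot would sleep for the returned delay between requests and skip anything for which `plan_fetch` returns `None`.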

Such a state nearly exists in modern crystallography. The situation for macromolecules is that authors are required to deposit data in a central repository (http://www.rcsb.org). For small molecules there is less Open Data but a significant amount is available because of the work put in by:

  • the International Union of Crystallography (IUCr), which for at least 30 years has pioneered the development of data standards and ontologies emerging in its current Crystallographic Information File specification.
  • a number of publishers who have Openly exposed CIF data files on their websites for every article which contains relevant crystallography. They include the IUCr itself, the Royal Society of Chemistry, the American Chemical Society, the Chemical Society of Japan, and the American Mineralogist. (There may be others – if so I apologize and ask them to come forward). The licences are occasionally a bit fuzzy but the spirit and intention is clear. The data are there as a scientific record and to be re-used.
  • The Crystallography Open Database – a volunteer activity which has aggregated approximately 50 K CIFs from donations.

The Internet now means that the data can be reliably aggregated, as in our CrystalEye knowledgebase. This also acts as an alerting system – as soon as a new piece of interesting crystallography is published, subscribers to our RSS feeds are notified.
The criticism is sometimes made that unless data is inspected by humans it cannot be certified as fit for purpose. This depends entirely on what the purpose is. It’s often better to have data of variable quality than no data at all. And it’s always better to have data of variable KNOWN quality rather than none, even if the quality is often known to be low. It’s a balance of precision and recall (Why 100% is never achievable). Joe Townsend here has shown in his PhD that if we lower the recall of crystallographic data (i.e. throw out everything that is known to have errors) we can get very high precision indeed without having to inspect the data.
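
The trade-off between precision and recall can be made concrete with a little arithmetic. The sketch below uses invented counts purely for illustration; they are not Joe Townsend's results, and the function name is hypothetical.

```python
def precision_recall(kept_good, kept_bad, discarded_good):
    """Precision and recall of a filter that keeps some entries and discards the rest.

    precision = sound entries kept / all entries kept
    recall    = sound entries kept / all sound entries
    """
    kept = kept_good + kept_bad
    all_good = kept_good + discarded_good
    precision = kept_good / kept if kept else 0.0
    recall = kept_good / all_good if all_good else 0.0
    return precision, recall


# Invented example: 1000 structures, of which 800 are sound.
# Lenient filter: keep everything.
lenient = precision_recall(kept_good=800, kept_bad=200, discarded_good=0)
# Strict filter: discard anything with a known error or warning,
# which also throws away some sound structures.
strict = precision_recall(kept_good=600, kept_bad=6, discarded_good=200)

print(f"lenient: precision {lenient[0]:.3f}, recall {lenient[1]:.3f}")
print(f"strict:  precision {strict[0]:.3f}, recall {strict[1]:.3f}")
```

With these made-up numbers the strict filter gives up a quarter of the sound structures (recall 0.75) and in return about 99% of what remains is sound, without anyone inspecting the data by hand.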
Our remaining problem is that not all publishers expose the data Openly. The rest of this post explores why they should think of doing so.
Before the Internet it was necessary to have central repositories to put data in, but now with all publishers online the data can just as easily be posted on their sites. Even if there is no intrinsic search mechanism on the publisher sites, researchers like Nick Day (here) can create tools for managing the data and metadata in CrystalEye. So why don’t all publishers expose their crystallography? I think it’s just a matter of priorities and hope this post will advance the case.
Data costs money. True, but the amount is falling. I don’t know how much it costs the publishers above to manage the exposure of the crystallography files – and I’m not asking – but it’s obviously not prohibitive. They’ve done it (I assume) because they think it’s an important part of the publication process – allowing science to be verified, providing a record, allowing new research to build on old. So they have – presumably – included the cost within the general cost of publication (which is covered mainly by subscriptions but for some of the articles also paid-by-author/funder Open Access).
The main cost of the process – the creation of communal metadata – is already past. This is probably the largest barrier to any group trying to emulate the idea. But it’s also happening in thermochemistry (ThermoML) where a number of journals:

Journal of Chemical & Engineering Data (ACS)
The Journal of Chemical Thermodynamics (Elsevier)
Fluid Phase Equilibria (Elsevier)
Thermochimica Acta (Elsevier)
International Journal of Thermophysics (Springer)

all require data to be published at source and made Openly available. Here’s a sample issue which lists the Open data:
==================================

ThermoML Data for The Journal of Chemical Thermodynamics, Vol. 39, No. 6 June 2007
Developed in cooperation between The Journal of Chemical Thermodynamics and the Thermodynamics Research Center (TRC)
The full Table of Contents for this issue is available from JCT. The numbers below correspond to the numbers in the full Table of Contents.


2.
Low pressure solubility and thermodynamics of solvation of oxygen, carbon dioxide, and carbon monoxide in fluorinated liquids
Pages 847-854
J. Deschamps, D.-H. Menz, A.A.H. Padua and M.F. Costa Gomes
ThermoML Data (To download: right-click on link and select “Save Link Target As” )

3.
High pressure phase behaviour of the binary mixture for the 2-hydroxyethyl methacrylate, 2-hydroxypropyl acrylate, and 2-hydroxypropyl methacrylate in supercritical carbon dioxide
Pages 855-861
Hun-Soo Byun and Min-Yong Choi
ThermoML Data (To download: right-click on link and select “Save Link Target As” )

===================================
You’ll see that the data are Open.
So couldn’t this be a model for all of science? As I have posted recently I’m going to write to the editors of Elsevier’s Tetrahedron suggesting that they make all their crystallographic data available Openly. They agree it’s not their copyright, so it’s just a question of how to do it – files on a website shouldn’t be a major expense.
And funders should encourage this. If you are urging authors and journals to publish Open full-text, please extend this to data. Yes, there are some technical difficulties in some cases such as metadata, complexity and size but they probably aren’t too scary. And in any case the community will help work out how to use them.

Posted in cyberscience, open issues | 1 Comment

Request for clarification of copyright and re-use on CIFs from Elsevier/CCDC

==== copy of letter to CCDC requesting clarification on copyright ====
To:data_request@ccdc.cam.ac.uk
Greetings
(Sorry to use a generic address but I am not sure who is the person to contact about permissions).
We have a systematic program of carrying out quantum mechanics calculations on organic crystal structures which uses the original CIFs as deposited by authors of peer-reviewed publications. In some cases the CIFs are openly accessible and openly re-usable from the publisher’s website (e.g. Acta Cryst., RSC). In other cases (e.g. Elsevier) the CIFs have been deposited at CCDC and are requestable without charge, and we have started to do this (see http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=452).
I would be grateful if you could clarify the copyright and re-use position of the retrieved files. Both your website and the actual file carry a notice which suggests that the files may be copyrighted and there are also apparently restrictions on re-use. To quote:

Conditions of Use of CIFs provided from the CCDC CIF archive Individual CIF data sets are provided freely by the CCDC on the understanding that they are used for bona fide research purposes only. They may contain copyright material of the CCDC or of third parties, and may not be copied or further disseminated in any form, whether machine-readable or not, except for the purpose of generating routine backup copies on your local computer system.
If you agree to the foregoing terms and conditions then please click on the “Accept” button below. If you do not accept the foregoing terms and conditions you should not click on the “Accept” button but should click on the “Do NOT accept” button below or the “back” button on your browser.

Elsevier (and other major publishers) have confirmed that these files are data and therefore not copyrightable.
You will appreciate that an adherence to formal wording of this licence could prevent proper scientific work being carried out. For example we routinely make all our raw data Openly available so that people can repeat our work (and have deposited 250,000 molecular structures and calculations in our Institutional Repository). Could you please confirm that the CIFs are not, in fact, copyrighted and that we have the right to re-use them in an Open manner and to redistribute them. We will provide complete provenance so that the authors’ identities (and where possible the article alongside which they were published) will be made clear.
Many thanks
Peter
NOTE: This letter is published to my blog: http://wwmm.ch.cam.ac.uk/blogs/murrayrust
(probably as http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=454) where we have been able to clarify licences on several publishers’ web sites. I would like to publish your reply in the same way (so please indicate if there is any material which should not be made public).
Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069

=============================================
Posted in data, open issues

Republican vision of Open Source Research

from Peter Suber’s blog:


Tommy Thompson, Republican presidential candidate and former Secretary of the Department of Health and Human Services, has announced his science platform: double the budget of the NIH (to $58 billion/year), cure breast cancer in 10 years, and this:

Create an open source research community on the Internet where research can be organized and discussions can be conducted with experts. This online community will be a centralized repository for research where all of the world’s people can contribute their time, money or expertise toward helping with this global fight.

PS: I can’t find anything in his press release or campaign site to explain what he means by this.

PMR: This seems amazing. I am illiterate in US politics but it’s gratifying to know that Open Source Research has a high enough profile to be taken seriously at a political level. It’s clear that we should keep pushing politicians on this issue. It’s good for science, good for humanity. If we are going to save the planet we need Open Science as well as cycling to work. People and organizations who hide science and hide data, whether through inertia or for private gain, are going to find it increasingly difficult to justify their actions to the community.
We should remember that the US managed the moon-shot and also built the National Cancer Institute to cure cancer. The physical world seems to be easier than the biological world but I am certain of one thing: we shall need all the shared information we can get. Publishers, information companies, pharma: think about different ways of doing things. The answer may already be out there.

Posted in data, open issues | 3 Comments

cyberscience: extracting crystallography from Elsevier's Tetrahedron via CCDC

Regular readers will know of our Crystaleye repository where Nick Day’s robots have, quite legally, extracted ca. 100,000 crystal structures from the Open AND closed literature. However, it is not yet comprehensive, as some publishers do not expose their data Openly but have a deposition arrangement with the Cambridge Crystallographic Data Centre (CCDC) [No formal relationship to us, although 100 metres away and built by the same architect]. So we are now looking to extract these structures into Crystaleye.
As I blogged recently (THANK YOU ELSEVIER!) Elsevier have no objection to the extraction of crystallographic data from their journals. The first journal I’m starting with is Tetrahedron [1]. Here I take you through the process from first the author’s point of view and then the reader’s/user’s. From the Guide for authors (and omitting large amounts of unexceptionable material) we have:

X-ray crystallographic data: Prior to submission of the manuscript, the author should deposit crystallographic data for organic and metalorganic structures with the Cambridge Crystallographic Data Centre. The data, without structure factors, should be sent by e-mail to deposit@ccdc.cam.ac.uk, as an ASCII file, preferably in CIF format. Hard copy data should be sent to CCDC, 12 Union Road, Cambridge CB2 1EZ. A checklist of data items for deposition can be obtained from the CCDC Home Page on the World Wide Web (http://www.ccdc.cam.ac.uk) or by e-mail to: fileserv@ccdc.cam.ac.uk, with the one-line message, send me checklist. The data will be acknowledged, within three working days, with one CCDC deposition number per structure deposited. These numbers should be included with the following standard text in the manuscript: Crystallographic data (excluding structure factors) for the structures in this paper have been deposited with the Cambridge Crystallographic Data Centre as supplementary publication nos. CCDC. Copies of the data can be obtained, free of charge, on application to CCDC, 12 Union Road, Cambridge CB2 1EZ, UK, (fax: +44-(0)1223-336033 or e-mail: deposit@ccdc.cam.ac.uk). Deposited data may be accessed by the journal and checked as part of the refereeing process. If data are revised prior to publication, a replacement file should be sent to CCDC.

PMR: Relatively simple – if I want the crystallographic data from a structure it can be retrieved from CCDC. Let’s see it from the reader’s point of view. Although Tetrahedron is closed, there is a Free-to-view issue which includes:

Calix[4]azacrowns: self-assembly and effect of chain length and O-alkylation on their metal ion-binding properties
Pages 62-70
Issam Oueslati, Pierre Thuéry, Oleksandr Shkurenko, Kinga Suwinska, Jack M. Harrowfield, Rym Abidi and Jacques Vicens
SummaryPlus | Full Text + Links | PDF (1255 K)

PMR: Now all of you should be able to click along with me since the Full Text is Free-to-read… and we find towards the end…

5.3. X-ray crystal data for 1 and 4
Crystal data and refinement details for 1·CH3CN·CH3OH. C53H71N3O7, M=862.13, monoclinic, space group C2/c, a=35.499(1), b=11.8598(2), c=25.6668(8) Å, β=15.288(1), V=9770.5(4) Å3, Z=8, Dc=1.172 g cm−3, μ=0.077 mm−1, F(000)=3728. Refinement of 601 parameters on 6923 independent reflections out of 42440 measured reflections (Rint=0.037) led to R=0.081, wR=0.192, and S=1.13. Crystal data and refinement details for 4·CH3CN·CHCl3. C54H70Cl3N3O6, M=963.48, orthorhombic, space group Pna21, a=12.6378(2), b=33.1849(13), c=12.6464(5) Å, V=5303.7(3) Å3, Z=4, Dc=1.207 g cm−3, μ=0.223 mm−1, F(000)=2056. Refinement of 608 parameters on 7285 independent reflections out of 25830 measured reflections (Rint=0.054) led to R=0.056, wR=0.157, and S=1.07. Crystallographic data for the structures of 1 and 4 have been deposited with the Cambridge Crystallographic Data Centre as supplementary publication nos. CCDC 621387 and CCDC 621388. Copies of data can be obtained, free of charge, on application to CCDC, 12 Union Road, Cambridge CB2 1EZ, UK.
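
The figures quoted for compound 4 can be checked for internal consistency with a few lines of arithmetic. The sketch below is not part of the paper or of Crystaleye; the function names are made up for illustration. It uses the standard triclinic cell-volume formula (which reduces to V = abc for an orthorhombic cell) and the relation Dc = Z·M / (N_A·V), taking the cell lengths, Z and M from the paragraph above.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (defined SI value)


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic angstroms from lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def calc_density(z, molar_mass, volume_a3):
    """Calculated density in g cm^-3 (1 cubic angstrom = 1e-24 cm^3)."""
    return z * molar_mass / (AVOGADRO * volume_a3 * 1e-24)


# Values quoted in the paper for 4.CH3CN.CHCl3 (orthorhombic, Pna21)
v4 = cell_volume(12.6378, 33.1849, 12.6464)
d4 = calc_density(4, 963.48, v4)
print(f"V = {v4:.1f} A^3, Dc = {d4:.3f} g cm^-3")
```

To the precision quoted, this reproduces the published V=5303.7 Å3 and Dc=1.207 g cm−3 for compound 4.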

PMR: (1 and 4 signify compounds 1 and 4 in the main text; most authors and publishers use a unique numbering scheme within the paper.) So I can apply to CCDC for the structures, which should be free. Will they be Open? Here’s what we have to do. On the Request Structure page:

Since 1994, under official deposition arrangements with a number of journals, the Cambridge Crystallographic Data Centre (CCDC) has provided copies of the supplementary data of individual published structures for bona fide research purposes. Data from before 1994 are currently only available from the distributed Cambridge Structural Database (CSD).
Supplementary data arriving at the CCDC electronically in CIF format whether as part of journal deposition arrangements or directly from individuals are held on trust in the CCDC Supplementary Data Archive on behalf of those journals and individuals. After publication, these data are converted into CSD entries by the addition of bibliographic and chemical text, chemical structural data, and the results of crystal structure validation.
In January 2002 CCDC provided a web form for data retrieval, which requires you to enter brief literature citation details and the CCDC Deposition Number (CCDCnnnnnn) which should appear in the paper.
This free service permits rapid access to supplementary CIF data for bona fide research purposes. The complete Cambridge Structural Database containing fully validated information may also be available within your institution or department.

PMR: and now the conditions…

Conditions of Use of CIFs provided from the CCDC CIF archive Individual CIF data sets are provided freely by the CCDC on the understanding that they are used for bona fide research purposes only. They may contain copyright material of the CCDC or of third parties, and may not be copied or further disseminated in any form, whether machine-readable or not, except for the purpose of generating routine backup copies on your local computer system.
If you agree to the foregoing terms and conditions then please click on the “Accept” button below. If you do not accept the foregoing terms and conditions you should not click on the “Accept” button but should click on the “Do NOT accept” button below or the “back” button on your browser.

PMR: This doesn’t look crystal clear. “They may contain copyright material of the CCDC or of third parties” is a very fuzzy statement. They may only be used for “bona fide research purposes”, which is an unclear phrase. “may not be copied or further disseminated in any form, whether machine-readable or not” is fairly clear: the user has very few rights, if any. Anyway, I “agree” to the conditions for once and find the form:

   
Your Name:
Your Email: (see the CCDC privacy policy)
Your Institution:
Deposition Number(s) for one paper:
(e.g. one: 217777
more than one: 217777 218383
range: 218383-218386
other: 1220/32 or wn6031)
Journal: [drop-down list of several hundred journal titles and other sources, running alphabetically from AAPS PharmSciTech to Zhongguo Xitu Xuebao and including Tetrahedron, Tetrahedron Lett. and Tetrahedron:Asymm.]
Year:
First page:
Volume: (omit if journal has no volume)
Author surname: (First or principal surname, e.g. Cox, Smith)
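
The form's examples imply a small grammar for the deposition field: space-separated tokens, each a single number, a numeric range, or some other identifier. A minimal Python sketch of how such a field might be expanded into individual identifiers follows. The function name is hypothetical, and how CCDC's server actually interprets the field is not documented here, so this is only an assumption based on the four examples shown on the form.

```python
import re


def expand_deposition_numbers(field):
    """Expand a space-separated deposition field into a list of identifier strings.

    Tokens of the form 'N-M' are treated as inclusive numeric ranges; anything
    else (a single number, or an 'other' style such as 1220/32 or wn6031) is
    passed through unchanged.
    """
    numbers = []
    for token in field.split():
        match = re.fullmatch(r"(\d+)-(\d+)", token)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if end < start:
                raise ValueError(f"descending range: {token}")
            numbers.extend(str(n) for n in range(start, end + 1))
        else:
            numbers.append(token)
    return numbers


print(expand_deposition_numbers("217777 218383"))
print(expand_deposition_numbers("218383-218386"))
print(expand_deposition_numbers("1220/32"))
```

For the paper discussed here the field would simply be "621387 621388".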

PMR: I have to fill in a form for each paper. This is rather tedious (the data are transmitted by email), but let’s continue with at least one. I send it off, giving the email address I want the CIF to be sent to. A minute later I get an email which looks like:

Thank you for using the Cambridge Crystallographic Data Centre
CIF Depository request form.
Your request returned 1 structure.
Tetrahedron (2007), 63, 62
Deposition Number(s) 621387
CIF file for 1 structure is attached to this message.
========================================================================
CCDC No     Acell    Bcell    Ccell  Space Gp.
621387   12.6378  33.1849  12.6464     Pna21
========================================================================
CCDC Depository
http://www.ccdc.cam.ac.uk/
LEGAL NOTICE
Unless expressly stated otherwise, information contained in this
message is confidential. If this message is not intended for you,
please inform postmaster@ccdc.cam.ac.uk and delete the message.
The Cambridge Crystallographic Data Centre is a company Limited
by Guarantee and a Registered Charity.
Registered in England No. 2155347 Registered Charity No. 800579
Registered office 12 Union Road, Cambridge CB2 1EZ.

#######################################################################
#
#                 Cambridge Crystallographic Data Centre
#                                CCDC
#
#######################################################################
#
#  This CIF contains data from an original supplementary publication
#  deposited with the CCDC, and may include chemical, crystal,
#  experimental, refinement, atomic coordinates,
#  anisotropic displacement parameters and molecular geometry data,
#  as required by the journal to which it was submitted.
#
#  This CIF is provided on the understanding that it is used for bona
#  fide research purposes only. It may contain copyright material
#  of the CCDC or of third parties, and may not be copied or further
#  disseminated in any form, whether machine-readable or not,
#  except for the purpose of generating routine backup copies
#  on your local computer system.
#
#  For further information on the CCDC, data deposition and
#  data retrieval see:
#                         www.ccdc.cam.ac.uk
#
#  Bona fide researchers may freely download Mercury and enCIFer
#  from this site to visualise CIF-encoded structures and
#  to carry out CIF format checking respectively.
#
#######################################################################
data_4.CH~3~CN.CHCl~3~
_database_code_depnum_ccdc_archive 'CCDC 621387'
_audit_creation_method           SHELXL
[...]
C30 0.023(2) 0.023(2) 0.022(2) 0.002(2) -0.002(2) 0.006(2)
[...]
C34 C36 C37 107.4(4) . . ?
[...]
C44 C45 C46 117.5(4) . . ?
[...]

PMR: and I now have one extra file for Crystaleye. But am I allowed to post it on our server? We’ll write to the CCDC and find out. But this post is quite long enough for today…
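
For readers who want to see what a robot would have to recognise in such a file, here is a minimal Python sketch that reads only the leading part of a CIF: the `#` comment lines (where the conditions of use sit), the `data_` block name, and simple one-line `_tag value` items. It is an illustration under stated assumptions, not Crystaleye's code and not a full CIF parser (no `loop_` blocks, no multi-line values); the function name and the shortened sample text are made up for the example.

```python
def read_cif_header(text):
    """Return (block_name, comments, items) from the leading part of a CIF.

    comments: text of each '#' line with the hash marks stripped
    items:    dict of simple one-line '_tag value' pairs, quotes removed
    """
    block_name = None
    comments = []
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line.lstrip("#").strip())
        elif line.lower().startswith("data_"):
            block_name = line[5:]
        elif line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                # Strip straight quotes, and curly ones introduced by web typography
                items[parts[0]] = parts[1].strip().strip("'\"\u2018\u2019")
    return block_name, comments, items


# Shortened sample based on the file shown above
sample = """\
#  This CIF is provided on the understanding that it is used for bona
#  fide research purposes only. It may contain copyright material
data_4.CH~3~CN.CHCl~3~
_database_code_depnum_ccdc_archive 'CCDC 621387'
_audit_creation_method           SHELXL
"""

name, comments, items = read_cif_header(sample)
print(name)
print(items["_database_code_depnum_ccdc_archive"])
print(any("copyright" in c for c in comments))
```

Keeping the comment lines alongside the data items matters here because the re-use notice travels inside the file itself, which is exactly the point at issue.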
[1] One of Robert Maxwell’s first journals. When it came out it was rather exciting. A specialist journal for carbon compounds (organic chemistry). And because carbon often has a tetrahedral environment, this was a very trendy name for the 1970s. I published in it.

Posted in data, open issues | 1 Comment