petermr's blog

Why the NIH bill does not require copyright violation

Posted on December 28, 2007 by pm286

[Note, by the way, that the Blue Obelisk deliberately did not include Open Access in its scope – we are not a universal free love and flowers cult but one that addresses why chemistry needs an overhaul in how its data and knowledge are communicated now and for posterity. We felt that Open Access was orthogonal to ODOSOS. All of us at times publish in closed access journals. Moreover it does not require monk-like adherence to all its principles all the time – but that’s another story.] I quote in full since the premises are important…

Rich Apodaca – A New Beginning or More of the Same?

21:02 27/12/2007, Rich Apodaca,

As discussed by Peter Suber, Peter Murray-Rust and others, President Bush signed H.R. 2764 into law yesterday. Among the many items in this bill is one that proponents argue could change the nature of the Open Access debate. Does this new law represent a fundamentally changed game, or just the next inning of the old one?

The text of the new law spells out what is now required:

SEC. 218. The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

IANAL, but the provision requiring the policy to be implemented “in a manner consistent with copyright law” offers publishers (and scientists) all the flexibility they need to continue business as usual.
The reason is simple. Transfer of copyright from the author of a scientific paper to the publisher is usually one of the first things to happen “upon acceptance” of a manuscript for publication. And the new law makes it perfectly clear that copyright law takes precedence over deposition into PubMed Central.
Most of the journals in question will be hostile to the idea of having their copyrighted material deposited into PubMed Central and so understandably won’t allow it to be done by the authors of papers or anyone else.
Take this hypothetical scenario for example: Professor Gross at California University gets his manuscript approved for publication in the Journal of Nanoscale Devices (JND). Professor Gross is fully aware both of HR 2764 and JND’s refusal to deposit manuscripts into PubMed Central – the reasons why Professor Gross would choose JND anyway are interesting, but not relevant here. Along with the acceptance letter, JND requests prompt return of a signed copyright transfer agreement. Professor Gross sends in the signed form and from that point on, all rights to his article belong to JND. As is their policy, JND refuses Professor Gross permission to deposit a copy of his paper into PubMed Central within 12 months after publication.
Unless I’m missing something, neither Professor Gross nor JND has violated any laws. The assumption made by proponents of the new law seems to be that to implement the new policy, the Director of NIH will forbid publication by grant recipients in journals that don’t allow deposition of articles into PubMed Central.
How many influential scientists do you know of who would tolerate the government telling them which journals they can and can’t publish in? The minute such a misguided policy is put in place, the national scientific outcry would more than overwhelm anything Open Access proponents could muster.

PMR: There are several of the common counterarguments here and I shan’t address all of them. As an axiom let me state that some of them are peculiar to the US and make little sense outside.

The primary confusion is that here the NIH is acting as a grant-giving organisation, not an instrument of government in general. There is no universal US law here, but a contractual agreement between a provider of funds and the recipient. The funder says IF you receive a grant from us THEN you must do X. There is no law requiring anyone in the US or elsewhere to apply for funding to NIH. There are many other funders who support medicine and health including Wellcome, HHMI, Cancer Research UK, etc. Each has its conditions. No one has to apply to any of them.
Almost all funders limit the scope of their funding and impose conditions on recipients. For example, a cancer funder will normally require that the work is related to cancer, a children’s charity that it relates to children, etc. There would be cases where national laws might override this (funding that was clearly racist would probably be challenged, but it is possible to have a gender-specific funder).
All research is likely to be a compromise between:

what the researcher would like to do
what the funder would like to be done
what is feasible and valuable

For example, a funder might require that no research involve living animals, and some go further and forbid the use of any animal tissue. The applicant can choose whether to work within these constraints or look elsewhere. In some cases (and I hope readers can add them) national funding agencies take strong lines on the permitted use of biotechnology in the work – and this differs from country to country.
In the current case the funder has a contractual requirement that the work be published openly after 12 months. I imagine that this requirement will appear in something like the US CFR:

The Code of Federal Regulations (CFR) is the codification of the general and permanent rules published in the Federal Register by the executive departments and agencies of the Federal Government. It is divided into 50 titles that represent broad areas subject to Federal regulation. Each volume of the CFR is updated once each calendar year and is issued on a quarterly basis.

These are regulations on how government is carried out. An application for a new drug has to conform to 21 CFR Part 11 (and probably many other parts). No one is required to develop new drugs, but if they do they have to conform. So I hypothesize that in the current case the regulation (which has the force of law) requires the NIH to require grantees to publish their work openly within a specified time frame.
Nothing is said about the manner of publication. The author might, for example, start their own journal specifically for this purpose. They might set up an Open Notebook wiki. (I skip problems of patient confidentiality, etc.). The only requirement would be to satisfy the funders that they had met the regulations. I would not be surprised if the words did not actually specify peer-review (can anyone comment?). If the grant consists of staged contributions then the grantee would have to satisfy the program manager that the work had been published as rapidly as is consistent with good science. I would be amazed if the regulations specified a limited set of journals that were the only ones that could be used, and even more if these were defined by a citation metric algorithm (“you can only publish in journals with IF > 10.0”). There is real scope here for novel types of publication.

Rich: Neither HR 2764 nor any form of government intervention will bring widespread Open Access into being. The only things that will change the status quo are: (1) the availability of tools for making it happen; and (2) the realization by individual investigators that continuing to give away their hard-earned copyright makes them far less competitive than their peers who don’t.

PMR: HR 2764 will have a major impact. Partly because there are many scientists who will be directly affected by it, but partly because it is symbolic. Other funders (e.g. European or national governments) will now be compared against the NIH. I can write to the UK EPSRC and ask them why they don’t do the same. (Of the 7 research councils in the UK, the EPSRC is almost alone in not requiring some form of Open publication). I know the current answer, but who knows – they may have already started to change. Europe has been debating whether European research must be made open.
An analogy with Open Source may be useful. Several funders require that all software created in a program should be released as Open Source. Many universities require that academics maximise the income they generate from their research. These two are often in conflict. My own approach is to release most software as Open Source. However in some cases I have taken industrial funding and the output of that is usually different. If I felt that this would be against fundamental principles I would turn the funding down. Simple.

Rich: Open Access proponents should forget about getting the Federal Government to fix the mess that modern scientific publication has become. Instead, they should focus on making Open Access-like options more attractive to scientists.

PMR: This is a purely US argument which is almost incomprehensible on this side of the Atlantic and probably almost everywhere else. No one likes paying taxes, but we accept that government tries to spend them wisely. The argument is epitomized in Rudy Baum’s “Socialized Science” and “More Socialized Science” articles.
The word socialize means:

1. socialise – take part in social activities; interact with others;
2. socialise – train for a social environment; “The children must be properly socialized”
3. socialise – prepare for social life; “Children have to be socialized in school”
4. socialise – make conform to socialist ideas and philosophies; “Health care should be socialized!”

Meaning 4 (presumably Rudy’s usage) is – I think – entirely unknown outside the US. When I used the apparent synonym “socialist” Rudy corrected me. I therefore have no idea what the word means other than that it seems to be pejorative. There is clearly a strong US-only political undercurrent which we outsiders should not try to swim in.
To finish: Open Access enthusiasts are working very hard to create attractive options. A major part of this (“the tools”) is new publishers and organs. It takes ca. 5 years for a new conventional journal to achieve serious impact factors, and a number of these have been and are being launched. I expect that, as with OUP and BMC Bioinformatics, many of the new ones will prosper.
What I really fear is the growth of “hybrid horrors”. This is where the publishers create something which isn’t really Open but is covered by such a mass of verbiage that it is almost impossible to work through. I spent weeks earlier this year trying to uncover publisher policies and in some cases failed. When I do find out what is happening it is heavily publisher-specific and often not even implemented as they say it is. So I expect to see a continued stream of “slightly-Open” offerings trumpeted as NIH-compliant. This requires heavy work to investigate and police – work which is entirely unproductive and usually unfunded.
The great advantage of the requirement to deposit in PubMed Central (rather than simply to expose on a publisher or other website) is that the act is clear. You can’t “half-deposit” in PubMed Central. They have the resources to decide whether any copyright statement allows the appropriate use of the information or is sufficiently restrictive that it does not meet the NIH rules.
At some stage the community will get tired of the continual drain on innovation caused by the current approach to publishing. Whether many publishers will be left when that happens is unclear.

Posted in open issues | Tagged copyright, nih | 2 Comments

Thank you President Bush

Posted on December 26, 2007 by pm286

From Peter Suber:

OA mandate at NIH now law

This morning President Bush signed the omnibus spending bill requiring the US National Institutes of Health (NIH) to mandate OA for NIH-funded research.
Here’s the language that just became law:

The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

PMR: We can now celebrate.
The hard work continues. But now all full text derived from NIH work will be available in PubMed Central. Other funders will follow suit (if they are not already ahead). So our journal-eating robot OSCAR will have huge amounts of text to mine.
The good news is that we believe that this text-mining will, in itself, uncover new science. How much we don’t know, but we hope it’s significant. And if so, that will be a further argument for freeing the fulltext of every science publication.

Posted in open issues | Tagged nih bill, open access | 7 Comments

Update on Open crystallography

Posted on December 22, 2007 by pm286

There’s now a growing movement towards publishing crystallography directly into the Open. Threads include:

The Crystallography Open Database which pioneered the idea of collecting crystallographic data and making them Openly available.
Nick Day’s CrystalEye – aggregation of published Open structures (from journals which don’t appropriate facts)
the eCrystals collection at Southampton, initially the repository for the National Crystallographic Service and now a JISC-sponsored project to federate crystallographic repositories.
Other collaborative groups including Reciprocal Net and STaRBURSTT
the Microsoft eChemistry Project and molecular repositories (see blog)
we are getting increasing queries about our SPECTRa project.

… so it was no great surprise when Jean Claude blogged:

X-Ray Crystallography Collaborator

20:41 20/12/2007, Jean-Claude Bradley, Useful Chemistry

We have another collaborator who is comfortable with working openly: Matthias Zeller from Youngstown State University.
With the fastest turnaround for any crystal structure analysis I’ve ever submitted, we now have the structure for the Ugi product UC-150D. For a nice picture of the crystals see here.

PMR: J-C also mailed us and asked how he (or we) could archive and disseminate the crystallography. So here’s a rough overview.
Crystallography is a microcosm of chemistry and we encounter many different challenges:

not all structures are Open (some not initially, some never). Managing the differential access is harder than it looks. It has to be owned by the Department or Institution. So you probably need access control, and probably an embargo system.
Institutional repositories are not generally oriented towards data. Some may, indeed, only accept “fulltext”. So there may be nowhere obvious to go.
The raw data (CIF) contains metadata, but not in a form where search engines can find it. That’s an important part of what SPECTRa does – it extracts metadata and repurposes it.
The CIF can, but almost universally does not, contain chemical metadata. So part of JUMBO is devoted to trying to extract chemistry out of atomic positions. This needs a fair amount of heuristic code.
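As an illustration of the metadata point, here is a minimal Java sketch (my own, not the SPECTRa or JUMBO code) that pulls simple single-line _tag value items out of CIF text into a map that could then be indexed. It assumes the common one-item-per-line layout and does not handle loop_ blocks, multi-line ;-delimited text fields or values on the following line; the sample CIF lines in main are made up. Because items are split on whitespace, two CIFs that differ only in layout give the same map.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CifMetadataSketch {

    /**
     * Collects single-line "_tag value" items. Assumption: only lines of that form
     * are read; loops, multi-line ;-delimited text fields and values on the
     * following line are not handled.
     */
    public static Map<String, String> extractItems(String cifText) {
        Map<String, String> items = new LinkedHashMap<>();
        for (String rawLine : cifText.split("\\R")) {
            String line = rawLine.trim();
            if (!line.startsWith("_")) {
                continue; // comment, data_ header, loop_ keyword or loop row
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // lone data name: loop column header or value on next line
            }
            // CIF data names are case-insensitive, so normalise them for indexing.
            items.put(parts[0].toLowerCase(Locale.ROOT), unquote(parts[1]));
        }
        return items;
    }

    /** Strips one pair of matching single or double quotes, e.g. 'P 21/c' -> P 21/c. */
    private static String unquote(String value) {
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Made-up CIF fragment for illustration only.
        String cif = String.join("\n",
            "data_example",
            "_symmetry_space_group_name_H-M   'P 21/c'",
            "_cell_length_a    10.234(2)",
            "loop_",
            "_atom_site_label",
            "_atom_site_fract_x",
            "C1 0.1234(3)");
        extractItems(cif).forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```

The uncertainty in parentheses is kept as part of the value string; a search index would probably want to split it off into a separate number.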

So, in conjunction with eChemistry and eCrystals and building on the momentum of SPECTRa, we are continuing to develop software for crystallographic repositories. There are several reasons why people want such repositories:

as a high-quality lab companion – somewhere to put your data and get it back later.
as somewhere to provide knowledge for data-driven science (e.g. CrystalEye)
as somewhere to save your data for publication and dissemination
as somewhere to archive your data for posterity (e.g. an IR)

These put different stresses on the software, so Jim and I are developing context-independent tools that can be used in any of them. I’m hacking the JUMBO software (CrystalTool) and Jim is hacking CrystalEye so it becomes a true repository.
This is our relaxation over the holiday.
???
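Extracting chemistry from atomic positions usually starts with deciding which atoms are bonded. The following Java sketch shows one common distance heuristic (bonded if the separation is less than the sum of two covalent radii plus a tolerance); it is not the CrystalTool algorithm, and the radii, the 0.45 Å tolerance and the 0.4 Å lower cut-off are illustrative assumptions. It also assumes Cartesian coordinates in ångströms, so fractional CIF coordinates would first need converting with the cell parameters, and it ignores symmetry-generated neighbours and disorder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BondPerceptionSketch {

    // Approximate single-bond covalent radii in angstroms (illustrative values only).
    private static final Map<String, Double> COVALENT_RADIUS =
        Map.of("H", 0.31, "C", 0.76, "N", 0.71, "O", 0.66);

    // Assumed slack added to the sum of the two radii.
    private static final double TOLERANCE = 0.45;

    // Assumed lower limit: closer pairs are treated as overlap or disorder, not bonds.
    private static final double MIN_DISTANCE = 0.4;

    /** An atom with Cartesian coordinates in angstroms. */
    public static final class Atom {
        final String element;
        final double x, y, z;

        public Atom(String element, double x, double y, double z) {
            this.element = element;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Returns index pairs {i, j} with i < j for atoms judged to be bonded. */
    public static List<int[]> perceiveBonds(List<Atom> atoms) {
        List<int[]> bonds = new ArrayList<>();
        for (int i = 0; i < atoms.size(); i++) {
            for (int j = i + 1; j < atoms.size(); j++) {
                Atom a = atoms.get(i);
                Atom b = atoms.get(j);
                double dx = a.x - b.x;
                double dy = a.y - b.y;
                double dz = a.z - b.z;
                double distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
                double cutoff = radius(a.element) + radius(b.element) + TOLERANCE;
                if (distance > MIN_DISTANCE && distance < cutoff) {
                    bonds.add(new int[] {i, j});
                }
            }
        }
        return bonds;
    }

    private static double radius(String element) {
        Double r = COVALENT_RADIUS.get(element);
        if (r == null) {
            throw new IllegalArgumentException("No radius tabulated for " + element);
        }
        return r;
    }

    public static void main(String[] args) {
        // Made-up fragment: a C-O bond and a C-H bond.
        List<Atom> atoms = List.of(
            new Atom("C", 0.0, 0.0, 0.0),
            new Atom("O", 1.43, 0.0, 0.0),
            new Atom("H", -0.36, 1.03, 0.0));
        for (int[] bond : perceiveBonds(atoms)) {
            System.out.println(bond[0] + "-" + bond[1]);
        }
    }
}
```

Bond orders, charges and hydrogen placement are where most of the heuristics would come in; this sketch stops at connectivity.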

Posted in data, open issues, programming for scientists | Tagged crystaleye, crystallography, repositories | Leave a comment

FoX marches on

Posted on December 22, 2007 by pm286

Toby White joined us – Jim Downing, Peter Corbett and me – in the pub yesterday to unwind and explore the challenges of tomorrow’s information. Toby has been one of the pillars of supporting CML – there was no requirement to do so but he and colleagues (mainly in Earth Sciences) saw the value and used it anyway. The added challenge is FORTRAN. FORTRAN is a great language – my first encounter was ca. 1970. It’s oriented towards rectangular data – of variable dimensionality. It is extremely good at scientific computing with large numbers of numbers and it understands – as much as most – how real numbers work.
But it’s not easy to interface with XML unless your data model is also rectangular. Historically molecular data was – atoms vertically, coordinates and other properties across. Bit of a problem if data are missing – hacks include magic numbers (e.g. 1.0e-bignumber, or zero-and-hope, or a row of stars (great fun when reading back in)).
So Toby has written FoX – a real labour of love. If you develop ANY FORTRAN code, please use FoX for the data i/o. It’s easy and it saves a huge amount of messy glueware. There’s now no technical reason why all comp.chem software shouldn’t emit XML/CML. It’s not just “another file format” – it’s a new way of thinking about information.
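FoX itself is Fortran, so the following is only a Java illustration of the point about missing values, not the FoX API. It writes CML atom elements and simply omits an attribute when a value is unknown, instead of inventing a magic number. The id, elementType, x3, y3 and z3 attribute names and the namespace are standard CML, but the helper methods and sample atoms are hypothetical, and ids are assumed not to need XML escaping.

```java
import java.util.Locale;

public class CmlAtomSketch {

    /** Writes one CML atom element; null coordinates are simply left out. */
    public static String atom(String id, String elementType, Double x3, Double y3, Double z3) {
        StringBuilder sb = new StringBuilder();
        sb.append("<atom id=\"").append(id)
          .append("\" elementType=\"").append(elementType).append('"');
        appendCoord(sb, "x3", x3);
        appendCoord(sb, "y3", y3);
        appendCoord(sb, "z3", z3);
        sb.append("/>");
        return sb.toString();
    }

    private static void appendCoord(StringBuilder sb, String name, Double value) {
        if (value != null) { // missing value: no attribute, no magic number, no row of stars
            sb.append(' ').append(name).append("=\"")
              .append(String.format(Locale.ROOT, "%.4f", value)).append('"');
        }
    }

    public static void main(String[] args) {
        System.out.println("<atomArray xmlns=\"http://www.xml-cml.org/schema\">");
        System.out.println("  " + atom("a1", "C", 0.0, 0.0, 0.0));
        System.out.println("  " + atom("a2", "O", 1.23, null, null)); // y and z unknown
        System.out.println("</atomArray>");
    }
}
```

A reader of the output never has to guess whether 0.0 or 1.0e-30 means “unknown”: the attribute is either there or it isn’t.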
Anyway

From: Toby White
To: FoX@lists.uszla.me.uk
Subject: [FoX] Release of version 3.1
This is to announce the release of version 3.1 of the FoX library.
(download from <http://source.uszla.me.uk/FoX/FoX-3.1.tgz>)
This new release features
* extended portability across compilers
(see <http://uszla.me.uk/space/software/FoX/compat/>)
* a “dummy library” capability
(see <http://www.uszla.me.uk/FoX/DoX/Compilation.html#dummy_library>)
* extended DOM functionality, including several more Level 3 functions,
and additional Fortran utility wrappers
(see <http://www.uszla.me.uk/FoX/DoX/FoX_dom.html#dataExtraction>)
Merry Christmas,

Toby

PMR: Enjoy!

Posted in programming for scientists | Tagged FORTRAN, FoX | Leave a comment

Mystery picture

Posted on December 21, 2007 by pm286

What’s this picture?

and why might I be interested in it?
(It’s not the whole picture, so I claim fair use – I don’t know who the copyright holder is. And the clipped space hides a fairly vital clue).
[UPDATE: 2007-12-23:
It’s a penguin, drawn by Ernest Shackleton. There’s also one by Robert Scott. They were discovered in a basement in the Scott Polar Research Institute, which is just next to the Chemistry lab in Cambridge. There was a TV van there two days ago…
http://ap.google.com/article/ALeqM5iKl5uJqCfIDn9RKK1LZK2JmKTxhwD8TM770G0
and
http://news.bbc.co.uk/1/hi/sci/tech/7154205.stm
and
http://www.telegraph.co.uk/news/main.jhtml?xml=/news/2007/12/21/npenguin121.xml
P.

Posted in fun | Tagged mystery, penguin | 3 Comments

the end of the beginning

Posted on December 21, 2007 by pm286

I got a series of euphoric messages from fellow OA activists rejoicing at the news that President Bush was “certain” to sign the House appropriations bill. I searched for the message in Peter Suber’s blog and found …

Congress sends revised spending bill, and OA mandate for NIH, to President

This evening the House of Representatives passed an omnibus spending bill containing language requiring the NIH to adopt an OA mandate. The Senate passed the bill on Tuesday.
Because it cuts spending to the levels President Bush requested, and gives him $70 billion for the war in Iraq and Afghanistan, he is expected to sign it.
The OA mandate for the NIH isn’t law yet, but it’s very, very close. Watch this space.

PMR: I am watching this space … and Alma Swan writes:

>The Appropriations Bill, with the language in about the NIH mandate, passed
>in the US Senate last night. It now *will* be signed off by President Bush.
>
>Heather deserves huge congratulations. This has been virtually a
>one-woman-led effort, and she has fought the publishers all the way, in
>every corridor and in every committee room. Now to try to emulate her in
>Brussels …

PMR: Absolutely total congratulations to Heather. I don’t know enough about whether presidential signatures are deterministic so will wait a few more days before breaking open any bottles.
And we should remember that the struggle continues.
“Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.”
and why should I choose this quotation?

Posted in open issues | Tagged nih bill, open access | 2 Comments

Java: labelled break considered harmful

Posted on December 20, 2007 by pm286

Readers of my last post may have thought that Eclipse makes refactoring easy. It does – up to a point. I had started to refactor an 800-line module with deeply nested loops – just a matter of extracting the inner loops as methods…
… NO!
When I tried this I got:
“Selection contains branch statement but corresponding branch target is not selected”
???
On closer examination I discovered that the code contained a construct like:
foo: plunge();
for (int i = 0; i < 1; i++) {
    boggle();
    if (bar) {
        break foo;
    }
}
[Added later: PUBLIC GROVEL. Jim has pointed out that I have misunderstood the break syntax, so the code above is WRONG. At least this shows that I never use labelled break. It should read:
plunge();
foo: for (int i = 0; i < 1; i++) {
    boggle();
    if (bar) {
        break foo;
    }
}
Strikethroughs indicate my earlier misconceptions.
What’s happening here? The code contains a labelled break. If the break foo is encountered, then control jumps ~~to the label foo. This can be almost anywhere in the module – and in this case it was often before the start of the loop.~~ out of the labelled loop.
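A small self-contained Java example (hypothetical, not taken from the module in question) makes the corrected reading concrete: break with a label leaves the loop carrying that label and continues with the statement after it.

```java
public class LabelledBreakDemo {

    /** Records which (i, j) pairs are visited before the labelled break fires. */
    public static String trace() {
        StringBuilder log = new StringBuilder();
        outer:
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                if (i == 1 && j == 1) {
                    break outer; // leaves BOTH loops, not just the inner one
                }
                log.append(i).append(j).append(' ');
            }
        }
        // Control resumes here, immediately after the labelled loop.
        log.append("after");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Here trace() returns "00 01 02 10 after": both loops are abandoned at i = 1, j = 1, and execution resumes with the statement after the labelled loop.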
Jumping to arbitrary parts of a module is considered harmful (Go To Statement Considered Harmful). Sun/Java announces:

2.2.6 No More Goto Statements
Java has no goto statement. Studies illustrated that goto is (mis)used more often than not simply “because it’s there”. Eliminating goto led to a simplification of the language–there are no rules about the effects of a goto into the middle of a for statement, for example. Studies on approximately 100,000 lines of C code determined that roughly 90 percent of the goto statements were used purely to obtain the effect of breaking out of nested loops. As mentioned above, multi-level break and continue remove most of the need for goto statements.

~~but surely the code below is a direct replacement for goto.~~
~~while (true) {~~
~~break foo;~~
~~}~~
~~continue is useful. break out of single level (unlabelled) is useful. break out of multiple loops might just be OK if it was always downwards and always to the point immediately after a loop.~~
~~But it isn’t.~~
~~so – and I am surprised that I can’t easily find it on Google:~~
~~“labelled break considered harmful”~~
However, since labelled breaks still make it very easy to write code that cannot be refactored easily, I hold that they should be used only when essential.
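One general workaround for Eclipse’s “Selection contains branch statement” complaint, offered as a sketch rather than what was done to the module in question, is to move the nested loops into a method of their own so that return does the job of the labelled break. The grid search below is a made-up example showing both forms:

```java
import java.util.Arrays;

public class LabelledBreakRefactor {

    /** Original style: a labelled break leaves both loops once the target is found. */
    public static int[] findWithLabel(int[][] grid, int target) {
        int[] found = null;
        search:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = new int[] {r, c};
                    break search;
                }
            }
        }
        return found;
    }

    /** Refactored: the loops live in their own method and 'return' replaces 'break search'. */
    public static int[] findWithReturn(int[][] grid, int target) {
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    return new int[] {r, c};
                }
            }
        }
        return null; // not found
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}};
        System.out.println(Arrays.toString(findWithLabel(grid, 3)));
        System.out.println(Arrays.toString(findWithReturn(grid, 3)));
    }
}
```

Both methods return [1, 0] for the sample grid; the second version can be extracted, moved and tested on its own because it no longer depends on a label outside the selection.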

Posted in programming for scientists | 1 Comment

Refactoring large modules using Eclipse

Posted on December 20, 2007 by pm286

I have recently had to consider refactoring a piece of Java which had got slightly out of hand – the module was 800 lines long and the if statements so deeply nested that they ran well off the right-hand edge of the page. I will NOT identify where it came from or criticize it – I have written much worse in my past (you can do really fun things with computed GOTOs in FORTRAN). But it was and is unmaintainable and we care about that in the Centre.
So I thought that I would sit down with Eclipse in front of the football and refactor it. Eclipse has a really neat Refactor > Extract Method command that turns a selected chunk of code into a method. For example:
public void add3DStereo() {
    // StereochemistryTool stereochemistryTool = new
    // StereochemistryTool(molecule);
    ConnectionTableTool ct = new ConnectionTableTool(molecule);
    List<CMLBond> cyclicBonds = ct.getCyclicBonds();
    List<CMLBond> doubleBonds = molecule.getDoubleBonds();
    for (CMLBond bond : doubleBonds) {
        if (!cyclicBonds.contains(bond)) {
            CMLBondStereo bondStereo3 = create3DBondStereo(bond);
            if (bondStereo3 != null) {
                bond.addBondStereo(bondStereo3);
            }
        }
    }
    List<CMLAtom> chiralAtoms = new StereochemistryTool(molecule).getChiralAtoms();
    for (CMLAtom chiralAtom : chiralAtoms) {
        CMLAtomParity atomParity3 = null;
        atomParity3 = calculateAtomParity(chiralAtom);
        if (atomParity3 != null) {
            chiralAtom.addAtomParity(atomParity3);
        }
    }
}
I now select the first for loop and turn it into a method; and repeat for the second and get:
public void add3DStereo() {
    // StereochemistryTool stereochemistryTool = new
    // StereochemistryTool(molecule);
    ConnectionTableTool ct = new ConnectionTableTool(molecule);
    List<CMLBond> cyclicBonds = ct.getCyclicBonds();
    List<CMLBond> doubleBonds = molecule.getDoubleBonds();
    addBondStereo(cyclicBonds, doubleBonds);
    List<CMLAtom> chiralAtoms = new StereochemistryTool(molecule).getChiralAtoms();
    addAtomParity(chiralAtoms);
}
/**
* @param chiralAtoms
*/
private void addAtomParity(List<CMLAtom> chiralAtoms) {
for (CMLAtom chiralAtom : chiralAtoms) {
CMLAtomParity atomParity3 = null;
atomParity3 = calculateAtomParity(chiralAtom);
if (atomParity3 != null) {
chiralAtom.addAtomParity(atomParity3);
}
}
}
/**
* @param cyclicBonds
* @param doubleBonds
*/
private void addBondStereo(List<CMLBond> cyclicBonds, List<CMLBond> doubleBonds) {
for (CMLBond bond : doubleBonds) {
if (!cyclicBonds.contains(bond)) {
CMLBondStereo bondStereo3 = create3DBondStereo(bond);
if (bondStereo3 != null) {
bond.addBondStereo(bondStereo3);
}
}
}
}
The whole thing took 30 seconds, including choosing the method names. Eclipse did all the parameters, Javadoc and return values – everything.
Try it – it will really fix up many sorts of grotty code…

Posted in programming for scientists | Tagged eclipse refactoring | 5 Comments

Mystery Picture

Posted on December 20, 2007 by pm286

Here is a photograph (untouched, not CGI). When I saw it I went wow! (I knew what it was). I’d be interested to know if anyone (a) KNOWS what it is of (b) can estimate the scale (c) has seen anything like it. If you do know, please post a comment saying so [but please DON’T give the answer]. I plan to release more information daily…
Besides the photo itself there is a serious question. How can you search the web for images like this?

and a close-up:

[UPDATE – more info: The photograph was taken yesterday by Dr. Judith Murray-Rust.]
[ANSWER: This is, indeed, crystalline water but the scale took us by surprise. The x-axis is ca. 20 cm. This artefact appeared in our bird bath and there appear to be 2 perfect, huge, hexagonal ice crystals (it is possible that they are both sixfold twins, I suppose). The faces are highly planar and specular (we have more pictures).
It is also remarkable that there are two artefacts separated by 10 cm (between centres) which are almost identical. What possible coupling could there be between them? That is the real mystery.]
Happy Holliday – as I might say to Gemma.

Posted in fun, semanticWeb | 7 Comments

Open Data: publishers are the problem

Posted on December 18, 2007 by pm286

The Chemspider site and blog have been making rapid and valuable progress towards Open Data. This is particularly laudable for a commercial site where Openness in chemistry is a long way from being a proven business model and is actively resisted by many. Here is a typical tale of frustration – I comment below:
Why We Can’t Publish Scraped CrystalEye Data Yet….And Science Commons Declare a Protocol for Implementing Open Access Data
Previously I blogged about our intention to scrape CrystalEye data and publish onto ChemSpider. The original comments regarding the data on CrystalEye were as follows:

pm286 Says:
October 26th, 2007 at 7:54 am (1) All data come from Free sources – i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages – where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout which contains meaningless whitespace. I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts. You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.

[PMR: Yesterday’s announcement of the CCZero licence could mean that we change from a meta-licence (“Open Data”) to an explicit CCZero licence. I will need to read the details. I don’t think it changes the arguments below.]

We have already done the work to scrape certain data from the site but have chosen to be extra careful with taking the declaration of Open Data made to all data sources. My primary worry was with the data scraped from the ACS journals. With this caution in mind I sent a letter to the copyright department at ACS as outlined here. In fact I made a couple of phone calls, sent the email about 2 more times and finally managed to talk to a nice gentleman from the ACS copyright department and brought my concerns to light. Since then we have exchanged multiple emails, spoken again on the phone and I have been told that a meeting of minds from both Washington and Ohio was being scheduled to discuss the situation. That’s 2 months after my original email.
Today I received the following email and I am excerpting from it..
“Thank you for your inquiry about the proposed use by ChemSpider of information in the CrystalEye database that has been published within certain ACS journal publications. In light of your query, we are examining the manner in which ACS published material is represented within that database as well as the nature of your proposed use, so that we can respond in an informed manner to your request.
<snip>
If you will be attending the ACS National Meeting in New Orleans, perhaps we could confer with you at that time to discuss our findings and advise you appropriately?
Communicators Name withheld ”
What I thought was a simple question and done with the intention that ChemSpider was safe turns out not to be so simple. It could take until March 2008 to get an answer! At this stage we will not be publishing any of the CrystalEye data without confirmation from each of the publishers that this is allowed. I asked the question previously “Who gets to declare data open or not?“ and even received the question “Why even offer the option of closed?” The primary reason is that we have turbulent times ahead of us around such issues of “openness” and until these are navigated I am working to keep ChemSpider “safe”. I am willing to participate, support and contribute to the evangelism of openness but am equally concerned with keeping ChemSpider alive for the close to 3000 users per day now accessing the service.
It was an interesting day to receive this email about a potential FIVE MONTH delay to a decision about Open Data especially now that Science Commons have released a Protocol for Implementing Open Access Data just yesterday. …
So, while protocols are exposed to the community by Science Commons the challenge of utilizing them now begins…I will be in communication with members of the Science Commons soon to determine how ChemSpider can fit into the model…

PMR: This is, unfortunately, completely typical. Earlier this year I wrote to Tetrahedron (an Elsevier journal) asking if they would consider posting CIFs (crystallographic data):

Request for Open publication of crystallographic data in Elsevier’s Tetrahedron

=========== Open letter to editors of Tetrahedron ==========
Professor L. Ghosez ,
Professor Lin Guo-Qiang ,
Professor T. Lectka ,
Professor S.F. Martin ,
Professor W.B. Motherwell ,
Professor R.J.K. Taylor ,
Professor K. Tomioka
Subj: Request for Open publication of crystallographic data in Tetrahedron
Dear editors,
I have recently been reviewing access to supplemental data in chemistry publications, in particular crystallographic data (”CIFs”). Many publishers (IUCr, RSC, ACS…) expose these on their websites as Open Data (for examples see: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=455). The data are acknowledged not to be copyrightable (see http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=447) where your colleague Jennifer Jones (copied) has confirmed:

Dear Peter Murray-Rust

Thanks for your email. Data is not copyrighted. If you are reusing the entire presentation of the data, then you have to seek permission, otherwise, you can use the data without seeking our permission.

Yours sincerely

Jennifer Jones

Rights Assistant

Global Rights Department

Elsevier Ltd

PO Box 800

Oxford OX5 1GB

UK

Tel: + 44 (1) 865 843830

Fax: +44 (1) 865 853333

email: j.jones@elsevier.com

Other Elsevier journals such as those publishing thermochemistry (see last blog post) are now actively making the supplemental data Openly available on the journal website. I am therefore asking whether Tetrahedron (and perhaps other Elsevier chemistry journals) might consider publishing their data Openly in this way and would be grateful for your views.
(This is an Open letter (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=456) and I would like to publish your reply so please mark any confidential material as such).
Thank you for considering this.

PMR: Five editors – I haven’t had the courtesy of a reply. This is not uncommon – I didn’t get replies on Open topics from Wiley or Springer (first time round) either. Either journals are not in the habit of replying – they consider ordinary scientists too low in the food chain to merit consideration (most likely) – or they regard anything Open as a pain and want to slow it by inaction (also most likely). They have their set way of doing things – God ordained in 1972 that the world belongs to the publishers and they don’t want to see it change.
Another typical example. I was invited to write an article for Serials Review on Open Data. I asked if I could write my article in HTML and embed my own copyright material, noted as such under an appropriate licence. The editorial office said that they would come back to me. It’s now past the closing date for submission. After ca. 6 weeks I got the reply:

Facts and data are not copyrightable but the expression of data is copyrightable. If you wish to use third-party data in a different format within your article, including full acknowledgement to the source of the data, then that would be acceptable. However, if you wish to retain the expression of the data, then you will need to include alternate diagrams within the article.

So I can use the data – IF I can get it. If I can only get a graph then I can’t unless I redraw it. Is redrawing a graph a useful activity for science – do I need to answer? The only value is that it adds some random errors to the data (or systematic ones) that would be fun to give as exercises in bad scientific practice for students. “Expression of the data” – i.e. the author’s graphs – are not re-usable.
So what’s the answer? Currently I use the “ask forgiveness, not ask permission” mode. And if the “owners” of the data (read “appropriators”) send the lawyers and ask for a take-down – make a huge public fuss. As the world did when Shelley Batts “stole” a graph from Wiley (Sued for 10 Data Points). And Wiley backed down. The publishers don’t like public fuss.
So a few months ago I would have advised Chemspider “go ahead”. But they ran foul of another publisher (I think it was the Royal Society of Chemistry). I never understood the details but Chemspider linked to publicly visible papers (not Open) and were asked to take the links out of the Chemspider database. This doesn’t even seem to make sense. I would have thought publishers would like people linking to their papers – maybe it was the metadata.
So I appreciate Chemspider’s wish to remain on the correct legal side of the publisher. But [the publishers’] actions destroy scientific data in the current century. Chemistry publishers [OA publishers and IUCr excepted] are actively and passively resisting the re-use of data. They copyright factual data, hide it, require take-downs, refuse to reply to reasonable letters – everything. They are simply in the way between the creator of the data and the consumer.
As I have blogged, we now have an exciting project sponsored by Microsoft on eChemistry. We are going to fill repositories with data. And we are going to get that data (“not copyrightable” – see above) from any source we reasonably can. It will be available to the whole world. It will probably be stamped CCZero. CrystalEye will be in there. We shall, of course, include the source (provenance) and the metadata, as we really care about them. So people will know where it came from.
Why can’t the ACS reply “Yes” to Chemspider by return? Does it really make sense for chemistry publishers to be universally seen as Luddites? Because the world will sweep these restrictive practices away, and the business will have moved from the publishers to somewhere in the twenty-first century (the one we are in).

Posted in chemistry, open issues | 3 Comments