Category Archives: open issues

Why the NIH bill does not require copyright violation


Rich Apodaca is a founder member of the BlueObelisk - which advocates ODOSOS - Open Data, Open Source and Open Standards (mainly in chemistry). Rich has made major contributions in this area and adds valuable insights on his Depth-First blog. So I was interested that he feels that the NIH bill is misdirected and won't work because it requires authors to publish as Open Access.


[Note, by the way, that the Blue Obelisk deliberately did not include Open Access in its scope - we are not a universal free love and flowers cult but one that addresses why chemistry needs an overhaul in how its data and knowledge are communicated now and for posterity. We felt that Open Access was orthogonal to ODOSOS. All of us at times publish in closed access journals. Moreover it does not require monk-like adherence to all its principles all the time - but that's another story.] I quote in full since the premises are important...


Rich Apodaca - A New Beginning or More of the Same?

21:02 27/12/2007, Rich Apodaca,

As discussed by Peter Suber, Peter Murray-Rust and others, President Bush signed H.R. 2764 into law yesterday. Among the many items in this bill is one that proponents argue could change the nature of the Open Access debate. Does this new law represent a fundamentally changed game, or just the next inning of the old one?

The text of the new law spells out what is now required:

SEC. 218. The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

IANAL, but the provision requiring the policy to be implemented "in a manner consistent with copyright law" offers publishers (and scientists) all the flexibility they need to continue business as usual.

The reason is simple. Transfer of copyright from the author of a scientific paper to the publisher is usually one of the first things to happen "upon acceptance" of a manuscript for publication. And the new law makes it perfectly clear that copyright law takes precedence over deposition into PubMed Central.

Most of the journals in question will be hostile to the idea of having their copyrighted material deposited into PubMed Central and so understandably won't allow it to be done by the authors of papers or anyone else.

Take this hypothetical scenario for example: Professor Gross at California University gets his manuscript approved for publication in the Journal of Nanoscale Devices (JND). Professor Gross is fully aware both of HR 2764 and JND's refusal to deposit manuscripts into PubMed Central - the reasons why Professor Gross would choose JND anyway are interesting, but not relevant here. Along with the acceptance letter, JND requests prompt return of a signed copyright transfer agreement. Professor Gross sends in the signed form and from that point on, all rights to his article belong to JND. As is their policy, JND refuses Professor Gross permission to deposit a copy of his paper into PubMed Central within 12 months after publication.

Unless I'm missing something, neither Professor Gross nor JND have violated any laws. The assumption made by proponents of the new law seems to be that to implement the new policy, the Director of NIH will forbid publication by grant recipients in journals that don't allow deposition of articles into PubMed Central.

How many influential scientists do you know of who would tolerate the government telling them which journals they can and can't publish in? The minute such a misguided policy is put in place, the national scientific outcry would more than overwhelm anything Open Access proponents could muster.

PMR: There are several of the common counterarguments here and I shan't address all of them. As an axiom let me state that some of them are peculiar to the US and make little sense outside.

The primary confusion is that here the NIH is acting as a grant-giving organisation, not an instrument of government in general. There is no universal US law here, but a contractual agreement between a provider of funds and the recipient. The funder says IF you receive a grant from us THEN you must do X. There is no law requiring anyone in the US or elsewhere to apply for funding to NIH. There are many other funders who support medicine and health including Wellcome, HHMI, Cancer Research UK, etc. Each has its conditions. No one has to apply to any of them.

Almost all funders limit the scope of their funding and impose conditions on recipients. For example a Cancer funder will normally require that the work is related to cancer, a children's charity to children, etc. There would be cases where national laws might override this (it is likely that funding which is clearly racist would be challenged but it is possible to have a gender specific funder).

All research is likely to be a compromise between:

  • what the researcher would like to do
  • what the funder would like to be done
  • what is feasible and valuable

For example a funder might require that no research involved living animals and some will go further and forbid the use of any animal tissue. The applicant has a choice as to whether they wish to work with constraints or look elsewhere. In some cases (and I hope readers can add them) national funding agencies take strong lines on the permitted use of biotechnology in the work - and this differs from country to country.

In the current case the funder has a contractual requirement that the work be published openly after 12 months. I imagine that this requirement will occur in something like the US CFR:

The Code of Federal Regulations (CFR) is the codification of the general and permanent rules published in the Federal Register by the executive departments and agencies of the Federal Government. It is divided into 50 titles that represent broad areas subject to Federal regulation. Each volume of the CFR is updated once each calendar year and is issued on a quarterly basis.

These are regulations on how government is carried out. An application for a new drug has to conform to 21CFR11 (and probably many more). No one is required to develop new drugs but if they do they have to conform. So I hypothesize that in the current case the regulation (which has the force of law) requires the NIH to require grantees to publish their work openly in a specified time frame.

Nothing is said about the manner of publication. The author might, for example, start their own journal specifically for this purpose. They might set up an Open Notebook wiki. (I skip problems of patient confidentiality, etc.). The only requirement would be to satisfy the funders that they had met the regulations. I would not be surprised if the words did not actually specify peer-review (can anyone comment?). If the grant consists of staged contributions then the grantee would have to satisfy the program manager that the work had been published as rapidly as is consistent with good science. I would be amazed if the regulations specified a limited set of journals that were the only ones that could be used, and even more if these were defined by a citation metric algorithm ("you can only publish in journals with IF > 10.0"). There is real scope here for novel types of publication.

Rich: Neither HR 2764 nor any form of government intervention will bring widespread Open Access into being. The only things that will change the status quo are: (1) the availability of tools for making it happen; and (2) the realization by individual investigators that continuing to give away their hard-earned copyright makes them far less competitive than their peers who don't.

PMR: HR 2764 will have a major impact. Partly because there are many scientists who will be directly affected by it, but partly because it is symbolic. Other funders (e.g. European or national governments) will now be compared against the NIH. I can write to the UK EPSRC and ask them why they don't do the same. (Of the 7 research councils in the UK, the EPSRC is almost alone in not requiring some form of Open publication). I know the current answer, but who knows - they may have already started to change. Europe has been debating whether European research must be made open.

An analogy with Open Source may be useful. Several funders require that all software created in a program should be released as Open Source. Many universities require that academics maximise the income they generate from their research. These two are often in conflict. My own approach is to release most software as Open Source. However in some cases I have taken industrial funding and the output of that is usually different. If I felt that this would be against fundamental principles I would turn the funding down. Simple.

Open Access proponents should forget about getting the Federal Government to fix the mess that modern scientific publication has become. Instead, they should focus on making Open Access-like options more attractive to scientists.

PMR: This is a purely US argument which is almost incomprehensible on this side of the Atlantic and probably almost everywhere else. No one likes paying taxes, but we accept that government tries to spend them wisely. It [the argument] is epitomized in Rudy Baum's "Socialized Science" and "More Socialized Science" articles.

The word socialize means:

1. socialise - take part in social activities; interact with others;
2. socialise - train for a social environment; "The children must be properly socialized"
3. socialise - prepare for social life; "Children have to be socialized in school"
4. socialise - make conform to socialist ideas and philosophies; "Health care should be socialized!"

Meaning 4 (presumably Rudy's usage) is - I think - entirely unknown outside the US. When I used the apparent synonym "socialist" Rudy corrected me. I therefore have no idea what the word means other than that it seems to be pejorative. There is clearly a strong US-only political undercurrent which we outsiders should not try to swim in.

To finish: Open Access enthusiasts are working very hard to create attractive options. A major part of this ("the tools") is new publishers and organs. It takes ca. 5 years for a new conventional journal to achieve serious impact factors and a number of these have been, and are being, launched. I expect that, like OUP and BMC Bioinformatics, we shall see many of the new ones prosper.

What I really fear is the growth of "hybrid horrors". This is where the publishers create something which isn't really Open but is covered by such a mass of verbiage that it is almost impossible to work through. I've spent weeks earlier this year trying to uncover publisher policies and in some cases failing. When I do find out what is happening it is heavily publisher-specific and often not even implemented as they say it is. So I expect to see a continued stream of "slightly-Open" offerings trumpeted as NIH-compliant. This requires heavy work to investigate and police - work which is entirely unproductive and usually unfunded.

The great advantage of the requirement to deposit in PubMed (rather than simply to expose on a publisher or other website) is that the act is clear. You can't "half-deposit" in PubMed. They have the resources to decide whether any copyright statement allows the appropriate use of the information or is sufficiently restrictive that it does not meet the NIH rules.

At some stage the community will get tired of the continual drain on innovation set by the current approach to publishing. Whether many publishers will be left when that happens is unclear.

Thank you President Bush

From Peter Suber:

OA mandate at NIH now law

This morning President Bush signed the omnibus spending bill requiring the US National Institutes of Health (NIH) to mandate OA for NIH-funded research.  

Here's the language that just became law:

The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

PMR: We can now celebrate.

The hard work continues. But now all fulltext derived from NIH work will be available on PubMed. Other funders will follow suit (if they are not ahead). So our journal-eating-robot OSCAR will have huge amounts of text to mine.

The good news is that we believe that this text-mining will, in itself, uncover new science. How much we don't know, but we hope it's significant. And if so, that will be a further argument for freeing the fulltext of every science publication.

Update on Open crystallography

There's now a growing movement towards publishing crystallography directly into the Open. Several threads include:

... so it was no great surprise when Jean Claude blogged:

X-Ray Crystallography Collaborator

20:41 20/12/2007, Jean-Claude Bradley, Useful Chemistry

We have another collaborator who is comfortable with working openly: Matthias Zeller from Youngstown State University.

With the fastest turnaround for any crystal structure analysis I've ever submitted, we now have the structure for the Ugi product UC-150D. For a nice picture of the crystals see here.

PMR: J-C also mailed us and asked how we/he could archive and disseminate the crystallography. So here's a rough overview.

Crystallography is a microcosm of chemistry and we encounter many different challenges:

  • not all structures are Open (some not initially, some never). Managing the differential access is harder than it looks. It has to be owned by the Department or Institution. So you probably need access control, and probably an embargo system.
  • Institutional repositories are not generally oriented towards data. Some may, indeed, only accept "fulltext". So there may be nowhere obvious to go.
  • The raw data (CIF) contains metadata, but not in a form where search engines can find it. That's an important part of what SPECTRa does - extracts metadata and repurposes it.
  • The CIF can, but almost universally does not, contain chemical metadata. So part of JUMBO is devoted to trying to extract chemistry out of atomic positions.  Needs a fair amount of heuristic code.
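
To make the metadata point concrete, here is a minimal sketch (in Python) of pulling a few simple data items out of a CIF so that they could be indexed. This is not how SPECTRa or JUMBO do it: the function name, the chosen tags and the sample values are all assumptions made up for illustration, and the sketch only handles single-line `_tag value` items, ignoring `loop_` blocks and multi-line text fields.

```python
# Hypothetical helper, not part of SPECTRa or JUMBO.
def extract_cif_metadata(cif_text, wanted_tags):
    """Collect simple single-line `_tag value` items from CIF text."""
    metadata = {}
    for raw_line in cif_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("_"):
            continue  # not a data name (e.g. data_ header, loop_ rows)
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # bare tag: a loop_ column header or a multi-line value
        tag, value = parts
        if tag in wanted_tags:
            # Drop the surrounding quotes CIF uses for values with spaces.
            metadata[tag] = value.strip().strip("'\"")
    return metadata


# Invented sample data, for illustration only.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C6 H6'
_symmetry_space_group_name_H-M 'P b c a'
_cell_length_a 7.355(2)
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1234
"""

WANTED = {
    "_chemical_formula_sum",
    "_symmetry_space_group_name_H-M",
    "_cell_length_a",
}

result = extract_cif_metadata(SAMPLE_CIF, WANTED)
for tag, value in sorted(result.items()):
    print(tag, "=", value)
```

A real tool would use a proper CIF parser and would still face the harder problem in the last bullet: the chemistry itself is usually not stated in the file and has to be inferred from the atomic positions.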

So in conjunction with eChemistry and eCrystals and in the momentum of SPECTRa we are continuing to develop software for crystallographic repositories. There are several reasons why people want such repositories:

  • as a high-quality lab companion - somewhere to put your data and get it back later.
  • as somewhere to provide knowledge for data-driven science (e.g. CrystalEye)
  • as somewhere to save your data for publication and dissemination
  • as somewhere to archive your data for posterity (e.g. an IR)

These put different stresses on the software, so Jim and I are developing context-independent tools that can be used in any. I'm hacking the JUMBO software (CrystalTool) and he is hacking CrystalEye so it becomes a true repository.

This is our relaxation over the holiday.


the end of the beginning

I got a series of euphoric messages from fellow OA activists rejoicing at the news that President Bush was "certain" to sign the House appropriations bill. I searched for the message in Peter Suber's blog and found ...

Congress sends revised spending bill, and OA mandate for NIH, to President

This evening the House of Representatives passed an omnibus spending bill containing language requiring the NIH to adopt an OA mandate.  The Senate passed the bill on Tuesday.

Because it cuts spending to the levels President Bush requested, and gives him $70 billion for the war in Iraq and Afghanistan, he is expected to sign it.

The OA mandate for the NIH isn't law yet, but it's very, very close. Watch this space.

PMR: I am watching this space ... and Alma Swan writes:

>The Appropriations Bill, with the language in about the NIH mandate, passed
>in the US Senate last night. It now *will* be signed off by President Bush.

>Heather deserves huge congratulations. This has been virtually a
>one-woman-led effort, and she has fought the publishers all the way, in
>every corridor and in every committee room. Now to try to emulate her in
>Brussels ...

PMR: Absolutely total congratulations to Heather. I don't know enough about whether presidential signatures are deterministic so will wait a few more days before breaking open any bottles.

And we should remember that the struggle continues.

"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."

and why should I choose this quotation?

Open Data: publishers are the problem

The Chemspider site and blog have been making rapid and valuable progress towards Open Data. This is particularly laudable for a commercial site where Openness in chemistry is a long way from being a proven business model and is actively resisted by many. Here is a typical tale of frustration - I comment below.

Why We Can’t Publish Scraped CrystalEye Data Yet….And Science Commons Declare a Protocol for Implementing Open Access Data

Previously I blogged about our intention to scrape CrystalEye data and publish onto ChemSpider. The original comments regarding the data on CrystalEye were as follows:

  1. pm286 Says:
    October 26th, 2007 at 7:54 am (1) All data come from Free sources - i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages - where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout which contains meaningless whitespace. I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts. You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.

[PMR: Yesterday's announcement of the CCZero licence could mean that we change from a meta-licence ("Open Data") to an explicit CCZero licence. I will need to read the details. I don't think it changes the arguments below.]

We have already done the work to scrape certain data from the site but have chosen to be extra careful with taking the declaration of Open Data made to all data sources. My primary worry was with the data scraped from the ACS journals. With this caution in mind I sent a letter to the copyright department at ACS as outlined here. In fact I made a couple of phone calls, sent the email about 2 more times and finally managed to talk to a nice gentleman from the ACS copyright department and brought my concerns to light. Since then we have exchanged multiple emails, spoken again on the phone and I have been told that a meeting of minds from both Washington and Ohio was being scheduled to discuss the situation. That’s 2 months after my original email.

Today I received the following email and I am excerpting from it..

“Thank you for your inquiry about the proposed use by ChemSpider of information in the CrystalEye database that has been published within certain ACS journal publications. In light of your query, we are examining the manner in which ACS published material is represented within that database as well as the nature of your proposed use, so that we can respond in an informed manner to your request.


If you will be attending the ACS National Meeting in New Orleans, perhaps we could confer with you at that time to discuss our findings and advise you appropriately?

Communicators Name withheld ”

What I thought was a simple question and done with the intention that ChemSpider was safe turns out not to be so simple. It could take until March 2008 to get an answer! At this stage we will not be publishing any of the CrystalEye data without confirmation from each of the publishers that this is allowed. I asked the question previously “Who gets to declare data open or not?“ and even received the question “Why even offer the option of closed?” The primary reason is that we have turbulent times ahead of us around such issues of “openness” and until these are navigated I am working to keep ChemSpider “safe “. I am willing to participate, support and contribute to the evangelism of openness but am equally concerned with keeping ChemSpider alive for the close to 3000 users per day now accessing the service.

It was an interesting day to receive this email about a potential FIVE MONTH delay to a decision about Open Data especially now that Science Commons have released a Protocol for Implementing Open Access Data just yesterday. ...

So, while protocols are exposed to the community by Science Commons the challenge of utilizing them now begins…I will be in communication with members of the Science Commons soon to determine how ChemSpider can fit into the model…

PMR: This is, unfortunately, completely typical. Earlier this year I wrote to Tetrahedron (an Elsevier journal) asking if they would consider posting CIFs (crystallographic data):

Request for Open publication of crystallographic data in Elsevier’s Tetrahedron

=========== Open letter to editors of Tetrahedron ==========

Professor L. Ghosez ,
Professor Lin Guo-Qiang ,
Professor T. Lectka ,
Professor S.F. Martin ,
Professor W.B. Motherwell ,
Professor R.J.K. Taylor ,
Professor K. Tomioka

Subj: Request for Open publication of crystallographic data in Tetrahedron
Dear editors,
I have recently been reviewing access to supplemental data in chemistry publications, in particular crystallographic data (”CIFs”). Many publishers (IUCr, RSC, ACS…) expose these on their websites as Open Data (for examples see: The data are acknowledged not to be copyrightable (see where your colleague Jennifer Jones (copied) has confirmed:

Dear Peter Murray-Rust
Thanks for your email. Data is not copyrighted. If you are reusing the entire presentation of the data, then you have to seek permission, otherwise, you can use the data without seeking our permission.
Yours sincerely
Jennifer Jones
Rights Assistant
Global Rights Department
Elsevier Ltd
PO Box 800
Oxford OX5 1GB
Tel: + 44 (1) 865 843830
Fax: +44 (1) 865 853333

Other Elsevier journals such as those publishing thermochemistry (see last blog post) are now actively making the supplemental data Openly available on the journal website. I am therefore asking whether Tetrahedron (and perhaps other Elsevier chemistry journals) might consider publishing their data Openly in this way and would be grateful for your views.

(This is an Open letter ( and I would like to publish your reply so please mark any confidential material as such).

Thank you for considering this

PMR: Five editors - I haven't had the courtesy of a reply. This is not uncommon - I didn't get replies on Open topics from Wiley, Springer (first time round) either. Either journals are not in the habit of replying - they consider ordinary scientists too low in the foodchain to merit consideration (most likely) - or they regard anything Open as a pain and want to slow it by inaction (also most likely). They have their set way of doing things - God ordained in 1972 that the world belongs to the publishers and they don't want to see it change.

Another typical example. I was invited to write an article for Serials Review on Open Data. I asked if I could write my article in HTML and embed my own copyright material, noted as such under appropriate licence. The editorial office said that they would come back to me. It's now past the closing date of the submission. After ca. 6 weeks I got the reply:

Facts and data are not copyrightable but the expression of data is copyrightable. If you wish to use third-party data in a different format within your article, including full acknowledgement to the source of the data, then that would be acceptable. However, if you wish to retain the expression of the data, then you will need to include alternate diagrams within the article.

So I can use the data - IF I can get it. If I can only get a graph then I can't unless I redraw it. Is redrawing a graph a useful activity for science - do I need to answer? The only value is that it adds some random errors to the data (or systematic ones) that would be fun to give as exercises in bad scientific practice for students. "Expression of the data" - i.e. the author's graphs - are not re-usable.

So what's the answer? Currently I use the "ask forgiveness, not ask permission" mode. And if the "owners" of the data (read "appropriators") send the lawyers and ask for a take-down - make a huge public fuss. As the world did when Shelly Batts "stole" a graph from Wiley (Sued for 10 Data Points). And Wiley backed down. The publishers don't like public fuss.

So a few months ago I would have advised Chemspider "go ahead". But they ran foul of another publisher (I think it was the Royal Society of Chemistry). I never understood the details but Chemspider linked to publicly visible papers (not Open) and were asked to take the links out of the Chemspider database. This doesn't even seem to make sense. I would have thought publishers would like people linking to their papers - maybe it was the metadata.

So I appreciate Chemspider's wish to remain on the correct legal side of the publisher. But [the publishers'] actions destroy scientific data in the current century. Chemistry publishers [OA publishers and IUCr excepted] are actively and passively resisting the re-use of data. They copyright factual data, hide it, require take-downs, refuse to reply to reasonable letters - everything. They are simply in the way between the creator of the data and the consumer.

As I have blogged we now have an exciting project sponsored by Microsoft on eChemistry. We are going to fill repositories with data. And we are going to get that data ("not copyrightable" - see above) from any source we reasonably can. It will be available to the whole world. It will probably be stamped CCZero. CrystalEye will be in there. We shall, of course, include the source (provenance) as we really care about it and metadata. So people will know where it came from.

Why can't the ACS reply "Yes" to Chemspider by return? Does it really make sense for chemistry publishers to be universally seen as Luddites? Because the world will sweep these restrictive practices away, and the business will have moved from the publishers to somewhere in the twenty-first century (the one we are in).

Open Access Data, Open Data Commons PDDL and CCZero

This is great news. We now have a widely agreed protocol for Open Data, channeled through Science Commons but with great input from several sources including Talis and the Open Knowledge Foundation. Here is the OKFN report (I also got a mail from Paul Miller of Talis without a clear link to a webpage).


This means that the vast majority of scientists can simply add CCZero to their data. I shall do this from now on. Although I am sure that there will be edge cases, they shouldn't apply to ANYTHING in chemistry.

Good news for open data: Protocol for Implementing Open Access Data, Open Data Commons PDDL and CCZero

15:21 17/12/2007, Jonathan Gray, external, news, okf, open access, open data, open geodata, open knowledge definition, Open Knowledge Foundation Weblog

Last night Science Commons announced the release of the Protocol for Implementing Open Access Data:

The Protocol is a method for ensuring that scientific databases can be legally integrated with one another. The Protocol is built on the public domain status of data in many countries (including the United States) and provides legal certainty to both data deposit and data use. The protocol is not a license or legal tool in itself, but instead a methodology for a) creating such legal tools and b) marking data already in the public domain for machine-assisted discovery.

As well as working closely with the Open Knowledge Foundation, Talis and Jordan Hatcher, Science Commons have spent the last year consulting widely with international geospatial and biodiversity scientific communities. They’ve also made sure that the protocol is conformant with the Open Knowledge Definition:

We are also pleased to announce that the Open Knowledge Foundation has certified the Protocol as conforming to the Open Knowledge Definition. We think it’s important to avoid legal fragmentation at the early stages, and that one way to avoid that fragmentation is to work with the existing thought leaders like the OKF.

Also, Jordan Hatcher has just released a draft of the Public Domain Dedication & Licence (PDDL) and an accompanying document on open data community norms. This is also conformant with the Open Knowledge Definition:

The current draft PDDL is compliant with the newly released Science Commons draft protocol for the “Open Access Data Mark” and with the Open Knowledge Foundation’s Open Definition.

Furthermore Creative Commons have recently made public a new protocol called CCZero which will be released in January. CCZero will allow people:

(a) ASSERT that a work has no legal restrictions attached to it, OR
(b) WAIVE any rights associated with a work so it has no legal restrictions attached to it,
(c) “SIGN” the assertion or waiver.

All of this is fantastic news for open data!

Deepak Singh: Educating people about data ownership

I never got to watch the Bubble 2.0 video (I only heard it on net@nite). Before I could get to see it, it got taken down. Wired talks about the reasons behind the takedown. As a content producer who shares content online and as a scientist who has published papers and a not-so-casual observer of the entire content ownership debate, I am often torn by examples like this one.

What is important for the author? Is it monetary compensation? If content, scientific, media or otherwise is your primary source of income, you can understand why people get a little antsy when someone uses the content without permission. I know too many people, journalists, musicians, etc for whom their creativity is the sole source of income and they are all well meaning, even if they don’t always understand the environment that they operate in.

However, a lot of these issues date back to a world free of Creative Commons, which I believe is celebrating a 5th birthday this weekend. In today’s climate we have choice, so to some extent content owners need to make that choice and then live with their consequences. You can choose to publish your papers in a PLoS journal under a CC license, or you can choose to publish in a closed journal. Obviously, I belong to the open science camp, but I also believe that people have the choice of making decisions. They then must also live with the consequences of those decisions.

What we need is education. When Larry Lessig spoke at the University of Washington recently (I have the full recording if anyone is interested), I asked him a question on this very issue. How many people who upload pictures to flickr really understand the licensing options available to them? How many people understand the pros/cons and implications? Most scientists I know don’t even know what Creative Commons is, Science Commons even less so. On the flip side, do the majority of people wanting to use pictures, etc understand what they can do with media, the proper ways of attribution, etc? I doubt it. Even I am not always sure.

We have a plethora of resources available to us for sharing data, media and information. Scientists have the PLoS and BMC journals. You have resources to share data, documents, pictures, videos, screencasts, etc etc. It is up to us to decide where we put our information and how it is managed. It is also important for everyone to understand and respect those choices. The dialog on what is the best approach to sharing data and the advantages of open data can be discussed as we go along.

PMR: We have to liberate scientific images unless there is a good reason why not. There will continue to be problematic areas when re-use is mis-use. For example CC-BY would allow derivative works including - say - altering the gray scale or the pixels in an image. (I hope no-one would edit in an incorrect scale bar!) And it's important to keep the caption with the image - until we get better metadata packaging. But, in general all scientific images should be stamped CC-BY or SC. Scientific images are different from people's photographs. They are part of the scientific record. And they should NOT belong to the publisher.


Last spring I visited Illinois (UIUC) and presented the SPECTRa tools. Scott Wilson who runs the crystallographic facility and many of the LIS community were keen to see how it could be used for capturing their crystallography. Yesterday I met Sarah Shreeve at the DCC conference and she told me that they had now budgeted to install a SPECTRa system. This is great - Jim Downing and I will be discussing the technical details - but we'll be hoping to have some more news RSN.

If anyone else at DCC is interested in SPECTRa for ingesting crystallography, spectroscopy or compchem, catch me at coffee - I'm around till Saturday.

Microsoft eChemistry Project and molecular repositories

Some of you may have picked up from - e.g. the Open Grid Forum - that Microsoft (Tony Hey, Lee Dirks, Savas Parastatidis) have been collaborating with Carl Lagoze (Cornell) and Herbert van de Sompel (LANL) on bringing together Chemistry and OAI-ORE - the next generation of interoperable repository software. We are delighted that Microsoft has now agreed to fund this project and when Carl, Lee, Simon Coles (Soton) and I had lunch yesterday Lee said I could publicly blog this. (There are contractual details to be settled on various sites).

In brief - Tony Hey was the architect of the UK eScience program and then moved to Microsoft Redmond where he has been developing approaches to Open Science (not sure if this is the correct term but it gives the idea) - for example it includes Open Access and permits/encourages Open Source in the project. Carl and Herbert developed the OAI-PMH protocol for repositories which allows exposure of metadata for harvesters. They have now developed ORE - Object Re-use and Exchange - which sees the future as composed of a large number of interoperating repositories rather than monolithic databases (I am on the advisory board of ORE).
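
To give a flavour of what "exposure of metadata for harvesters" means in practice, here is a minimal Python sketch of how a harvester might build OAI-PMH request URLs. The verb `ListRecords`, the `metadataPrefix` value `oai_dc` and the `resumptionToken` argument come from the OAI-PMH specification; the endpoint URL and the function names are hypothetical and are not taken from any of the project's repositories.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: selective (incremental) harvest
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next chunk of a long list.

    In OAI-PMH the resumptionToken is an exclusive argument, so it is
    sent with the verb only and without metadataPrefix, set or from.
    """
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2007-12-01"))
    print(resume_url(BASE_URL, "abc"))
```

A harvester would fetch the first URL, parse the XML response, and keep requesting `resume_url(...)` for as long as the repository returns a non-empty resumption token. The sketch only builds the URLs and does not touch the network.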

There are 7-8 partners in the program - MS, PubChem, Cornell, LANL, Lee Giles (PSU), Soton, Indiana and Cambridge. This is a really exciting development as we shall be able to create a number of well-populated molecular repositories with heterogeneous content (everything from crystallography to Wikipedia chemicals for example). One that we are currently developing is an RDF/CML-based repository of common chemicals - perhaps 5000 - which could serve as an amanuensis for the bench chemist or undergraduate needing reference material. CrystalEye will be in there as well and we shall also be "scraping" (ugly word) any material we can legally access. In this way we can hope to see the concept of the World Wide Molecular Matrix start to emerge. Chemistry eTheses can also be reposited - we are starting to hear of universities that have mandated open theses.

Chemical substructure searching across repositories will be an exciting challenge but we have a number of ideas.

We shall have openings here so if you are interested let us know.

More later, but to reiterate our thanks to Tony and colleagues.

Scope for SCOAP

From Peter Suber: SCOAP3 FAQ for US libraries : CERN's SCOAP3 project has created an FAQ for U.S. Libraries. Excerpt:


What is SCOAP3 and what does it have to do with me?
SCOAP3 is the Sponsoring Consortium for Open Access Publishing in Particle Physics (see [this] for more info). It is a mechanism for a field of science (in this case Particle Physics) to pay for its own publishing costs, rather than make the readers of its journals pay via subscriptions. In the SCOAP3 model, everyone involved in producing the literature of particle physics (universities, labs, and funding agencies) pays into a consortium (SCOAP3) which then pays publishers so that all articles in the field are Open Access.
No particle physics journal will have a subscription cost, and everyone can read any article published.
You can redirect the money that you save on subscriptions to SCOAP3 to pay for Open Access for the entire literature of Particle Physics.
As a physics/science library you will be realizing the savings from the lack of subscription costs for the Particle Physics journals, so it is only natural that you would be a contributor to SCOAP3. Clearly the cost of Open Access will be similar to the cost of subscriptions, because there won't be any new money in the system. Without your redirected money, it won't work....

PMR: This is a great model and should work well for any large, coherent, well-managed and funded community. In reality there are probably only a few fields where it works - they need to be collaborative, global and probably specialist.


"Clearly the cost of Open Access will be similar to the cost of subscriptions, because there won't be any new money in the system. Without your redirected money, it won't work...."


PMR: I don't agree. We don't know what the cost of publishing actually is, but it's clear that it varies widely and there is much misinformation. The fact that many Open Access society journals are author-doesn't-pay shows that in certain cases the costs can be accommodated in "marginal costs" or other subsidies. It can be argued that commercial publishers are more cost-efficient than non-profits because they are commercial. But they have many other costs - copyright police, marketing, and perhaps production to layout standards which the community does not require. And there is the shareholder profit.

A major part of the current pricing problem is that price and cost are not seen to be related.


So would the following be a more accurate statement?



"In the case of SCOAP the cost of Open Access is not zero, but we shall be open about the expenditure. We expect to avoid some of the costs and profits of commercial and society publishers and would hope to be able to lower costs. Since price (of author submission) is now directly related to costs we must recover them from the funders because there won't be any new money in the system. Without your redirected money, it won't work...."