AHM2007: Best paper (Jon Blower) – Virtual globes, hurricanes and penguins

Jon Blower was awarded the best paper at AHM2007. This is an outstanding example of eScience where SIMPLE technology is brought to bear on multiple datasets, each of which by itself does not carry a message, but the combination does (http://www.resc.rdg.ac.uk/publications/Blower_et_al_Virtual_Globes_final.pdf)

Virtual globe technology holds many exciting possibilities for environmental science. These easy-to-use, intuitive systems provide means for simultaneously visualizing four-dimensional environmental data from many different sources, enabling the generation of new hypotheses and driving greater understanding of the Earth system. Through the use of simple markup languages, scientists can publish and consume data in interoperable formats without the need for technical assistance. In this paper we give, with examples from our own work, a number of scientific uses for virtual globes, demonstrating their particular advantages. We explain how we have used Web Services to connect virtual globes with diverse data sources and enable more sophisticated usage such as data analysis and collaborative visualization. We also discuss the current limitations of the technology, with particular regard to the visualization of subsurface data and vertical sections.

Jon showed some stunning slides and animations, which had the theme of combining datasets.
He showed Keyhole Markup Language (KML), which supports simple geographic features – points, lines, polygons, etc. It is successful because it’s NOT trying to do too much. It enables the mashups between the datasets – the common frame of reference. And it, together with the software, is all Open (unlike the Google Earth mashup approach).
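To give a flavour of that simplicity, here is a minimal KML sketch for a single point feature (the placemark name, description and coordinates are invented for illustration; coordinates go longitude,latitude,altitude):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Hypothetical buoy</name>
    <description>Illustrative sea-surface temperature reading</description>
    <Point>
      <!-- longitude, latitude, altitude -->
      <coordinates>-89.4,28.9,0</coordinates>
    </Point>
  </Placemark>
</kml>
```

Anyone can write a file like this by hand and drop it onto a virtual globe – which is exactly why it works as a common frame of reference between datasets.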
Hurricane Katrina – satellite meteorology mashed with hurricane intensity showed unexpected sea cooling, which was critical to understanding the effect of hurricanes in mixing hot and cold sea water.
A mashup of penguin tracks (through radio transmitters) with satellite chlorophyll showed that the penguins circulated round areas of high chlorophyll – presumably in the ocean (?).
The message is that we need open data, open standards and code, simple, universal technology for visualisation.
Critical to fund the data exploration area.
So did the recent Hurricane Felix cause the sea to cool? Apparently much less than Katrina from his movie. But this is real eScience commenting on today’s events of world importance.

Posted in ahm2007, open issues, XML | Leave a comment

PRISM: Nature distances itself

I have been concerned about the membership of PRISM and, specifically, UK and other European publishers who might be associated, perhaps incorrectly, with the initiative. I hadn’t got around to writing to Nature Publishing Group (who have been an enthusiastic and valuable sponsor and collaborator in our work). Now Timo Hannay has volunteered Nature’s non-involvement with PRISM. This is a long and valuable post and I shall return to some of the other issues later:

PRISM: Publishers’ and Researchers’ Intensifying Sense of Mistrust

For anyone who’s interested here is Nature Publishing Group’s (NPG’s) take on PRISM: Although Nature America is a member of the AAP, we are not involved in PRISM and we have not been consulted about it. NPG has supported self-archiving in various ways (from submitting manuscripts to PubMed Central on behalf of our authors to establishing Nature Precedings), and our policies are already compliant with the proposed NIH mandate.

Those are facts. What follows is just my personal opinion.
PRISM has understandably provoked a great deal of anger among those scientists who care about how the fruits of research are communicated. (In this sense, PRISM has achieved the exact opposite of dog-whistle politics: the only people to sit up and take notice have been those who were outraged by it. Nice work, guys.) My main emotion, however, is closer to bewilderment. Do PRISM’s proponents (whoever they are) really think that their approach will do anyone, including themselves, any good? It’s tempting to suggest that they are out of touch (e.g., with the ways in which technology is changing science and scientific communication), but it’s equally possible that I’m out of touch (e.g., with Beltway politics), so I guess all I can conclude is that they inhabit a different universe to the one I’m in. Time, perhaps, to move on and get back to work.

Except that PRISM — and the reaction to it — is having one particularly insidious consequence.

The things that I find most ill advised about PRISM are the needless belligerence of the message, the crude them-and-us stance, and the distortion of complex issues into unrecognisable caricatures. I wouldn’t mind so much if the issues themselves were inconsequential, but they’re not. Questions about how scientific communication should be funded, and what roles government should or should not play, are central to scientific progress. If we can’t discuss these in a well-informed, grown-up way then science itself will suffer.
It therefore troubled me that the initial counterattacks on PRISM were themselves often lacking in nuance and discrimination. Given the high emotion generated, this was understandable, but that’s not the same as saying it was correct or helpful. The most general error has been to lump all publishers together in declaring them “evil”, “afraid”, “money-grabbing”, and so on. True, PRISM seems to have come out of the AAP, which is a publishing industry body, but right from the beginning (when I also didn’t have a clue what was going on) it was fairly clear to anyone who cared to make the distinction that PRISM was not the same as the AAP.
To treat the industry as one amorphous lump is a continuation of the kind of misunderstanding that leads people to group together “Nature, Science and Cell” when making comments about scientific publishing. This is a pet hate of mine. If you’re wondering where to send your red-hot molecular biology paper then it’s OK to talk about those three journals in the same breath. But if you’re talking about publishing then you’d better think again: there are hardly three more different organisations on the face of the earth than NPG, the AAAS and Elsevier (the three publishers in question).

[… second half to be discussed later …]

PMR: This is very valuable. I shan’t debate it – except to note that a major part of the problem – and confusion – is the complete lack of coherent communication from any, some, or all of the publishing community. Timo’s is only the second or third substantial contribution from any “main” publisher. All others have remained silent. Given that the prime business of publishers is communication it is enormously difficult to get any coherent response. I shall return to this later, but many of the emotions that arise on this list are because publishers simply ignore the issues.
This blog tries to be fair. Yes, I was upset by PRISM, but I hope I kept fairly cool. But in defence of the blogosphere the PRISM message “open access [equals] junk science” – is a simple factually incorrect insult. If PRISM had conducted a debate – of any sort – the emotions could have been avoided. For example – a hypothetical dialogue:

  • PRISM: Open access leads to worse science.
  • OA-advocate: Please give me evidence…
  • PRISM: A study by X showed that there were proportionally more retractions in OA papers than TA
  • OA: But Y refuted this…
  • PRISM: But Z showed that Y’s data were too limited

This is the sort of debate I would hope to see the scholarly community indulging in. But PRISM – and almost all publishers (except Nature, OUP, RUP and ColdSpring) are not communicating.

Posted in open issues | Leave a comment

DBPedia2: major opportunity for semantic web (including chemistry)

I have blogged about the exciting potential of DBPedia before (dbpedia – structured information from Wikipedia => dbchem: http://wwmm.ch.cam.ac.uk/blogs//?p=316). It is a semistructured RDF triple collection created automatically from Wikipedia. The really exciting thing is that huge numbers of WPedians have contributed to DBPedia without even knowing it. Simply by evolving simple community metadata (tagging and infoboxes) the WPedians have created a top-class semantic resource. A WP category of, say, “1997 deaths” gets translated to a triple something like:
:Diana :deathDate "1997"^^xsd:date
which says that the resource with label “Diana” has a “deathDate” property with value “1997”, which is of type date.
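To show why this triple structure is so powerful, here is a toy Python sketch (the data and the `query` helper are invented for illustration – this is not the real DBpedia interface) treating a handful of such facts as a queryable store, the way a SPARQL graph pattern would:

```python
# RDF-style facts as (subject, predicate, object) tuples.
triples = [
    ("Diana", "deathDate", "1997"),
    ("Diana", "type", "Person"),
    ("Everest", "elevation", "8848"),
]

def query(store, subject=None, predicate=None, obj=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [t for t in store
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Everything known about Diana:
print(query(triples, subject="Diana"))
# All recorded death dates, regardless of subject:
print(query(triples, predicate="deathDate"))
```

Because every fact has the same three-part shape, data from completely different infoboxes can be merged into one store and queried uniformly.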
Now the OKFN has blogged

DBpedia recently released the new version of their dataset. The project aims to extract structured information from Wikipedia so that this can be queried like a database. On their blog they say:

The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.

As well as improving the quality of the data, the new release includes coordinates for geographical locations and a new classificatory schema based on Wordnet synonym sets. It is also extensively linked with many other open datasets, including: “Geonames, Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP Bibliography and Project Gutenberg datasets”.
This is probably one of the largest open data projects currently out there – and it looks like they have done an excellent job at integrating structured data from Wikipedia with data from other sources. (For more on this see the W3C SWEO Linking Open Data project – which exists precisely in order to link more or less open datasets together.)

PMR: DBPedia1 was mindblowing, but – not surprisingly – suffered from inconsistency and incompleteness. For example there were several RDF predicates for deathDate: “death_date”, “deathdate”, etc. This is entirely forgivable for a first try. As DBPedia awareness spreads through WPedians they will converge on how infoboxes are created to give maximum semantic value. It only needs one or two evangelists in a discipline – e.g. in chemistry – to work this out, show the value, and then popularise it. The main body of WPedians will then adopt these methods and rapidly create a coherent semantic hyper-object.
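A hypothetical sketch of the kind of cleanup this convergence implies: map the variant predicate names onto one canonical predicate so that a single query finds everything (the mapping table and helper below are invented for illustration):

```python
# Variant predicate spellings observed in the wild, mapped to one
# canonical name (illustrative, not DBpedia's actual vocabulary).
CANONICAL = {
    "death_date": "deathDate",
    "deathdate": "deathDate",
    "deathDate": "deathDate",
}

def normalise(triple):
    """Rewrite a (subject, predicate, object) triple onto the
    canonical predicate, leaving unknown predicates untouched."""
    s, p, o = triple
    return (s, CANONICAL.get(p, p), o)

raw = [("Diana", "death_date", "1997"), ("Mozart", "deathdate", "1791")]
clean = [normalise(t) for t in raw]
# Both facts now share the predicate "deathDate", so one query
# over the cleaned store retrieves them all.
```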
The exciting thing is that this is zero-cost.
This will revolutionise reference chemistry. We have recently shown – and will be demoing at AHM2007 – how we can extract semantic chemistry from eTheses. That means that any student writing a thesis can increasingly link – painlessly – to WPedia as their lightweight ontological resource. Authors will know they are using the terms correctly – readers will know what the terms mean – and much more.
So I predict that within a few years DBPedia will become the semantic resource for chemistry. Every entry in WPedia enhances it – you never go backwards. We’ll be able to combine fundamental information for compounds such as colour, melting point, density, etc. There will be enough semantic data that a machine could rediscover the periodic table.
And that’s just the start. So, I’ll be browsing DBPedia in the blank spaces at AHM2007.

Posted in ahm2007, semanticWeb | 6 Comments

UK eScience All Hands 2007

I’m at the UK eScience All Hands Meeting – the sixth – and I think I have been to all of them. The meeting is closely, but not completely, coupled to the UK’s pioneering investment in eScience (the roughly equivalent US term is cyberinfrastructure). I’m listening to the keynote by Malcolm Atkinson:

  • research using eScience
  • research enabling eScience
  • eInfrastructure supporting research and innovation

He highlights Computational Thinking (Jeannette Wing), which will be my reading during (perish the thought) any boring talks: http://www.cs.cmu.edu/~wing/publications/Wing06.pdf
Cameron Neylon has blogged:


If it hasn’t been obvious from what has gone previously I am fairly new to the whole e-science world. I am definitely not in any form a computer scientist. I’m not a computer-phobe either but my skills are pretty limited. It’s therefore a little daunting to be going for the first time to an e-science meeting. This is the usual story of not really knowing the people from this community and not necessarily having a clear idea of what people within the field or community think the priorities are.

The programme is available online and my first response on looking at it in detail was that I don’t even understand what most of the session titles mean. “OMII-UK” is a fairly impenetrable workshop title for which the first talk is “Portalization Process for the Access Grid”. Now to be fair these are somewhat more specialised workshops and many of the plenary session names make more sense. This is normal when you go to an out-of-your-field conference but it will be interesting to see how much of the programme makes sense.
PMR: Don’t panic. There will be a lot of technology that is not familiar. Not all is relevant to you. The people are often more important than the technology.
One of the issues with e-science programmes is the process of bringing the ‘outside’ scientist into the fold. Systems such as our lab e-notebook require an extra effort to use, certainly at the beginning, and during the development process there are often very few tangible benefits. Researchers are always time-poor people so they want to see benefits. In theory we are here to demonstrate and promote our e-notebook system but I suspect this may be a case of preaching to the converted. It will be interesting to see a) whether we get much interest b) whether the comments we get are more on the technical implementation or the practical side of actually using it to record experiments.

One of the great things about starting this blog has been the way it has facilitated discussion with others interested in open notebook science and open science in general. I am less sure it has brought in scientists who are interested in the work in our notebook. My feeling is that this meeting may be a bit similar. On the other hand it may get us some good ideas on solving some of the problems of visualising the notebook that I want to discuss in a future post.
So if you are at the meeting and want to see the notebook please drop by to the BBSRC booth on Wednesday afternoon and do say hello if you see a shortish balding bearded guy who is looking lost or confused.
PMR: There is a tension between the needs of “scientists” and the desires and directions of “computer scientists”. Sometimes they overlap – frequently they don’t. A great deal of the technological development takes place because it is needed, but others because it pushes the boundaries of computer science. That’s not a bad thing, unless it dominates. I am continually refreshing my judgement about what it gets right and what it doesn’t. Some disciplines need heavyweight technology, but others like chemistry probably don’t. But using existing lightweight technology is not sexy, and doesn’t engage many computer scientists.
I’m tagging this as ahm2007. I could only find 2 tags in Technorati. Compare to www2007 where there were hundreds of posts. So any bloggers might congregate round this tag.
Posted in ahm2007 | 3 Comments

statement: why I wrote to Cambridge UP and Oxford UP

I received two emails today – independently – from press organizations / topical publications along the lines of
“I am writing an article about AAP/PRISM and would like to know why you oppose it and wrote to CUP”. As I am away – at UK eScience AllHands – and not always in phone contact I have prepared a simple statement for the press and others from which anyone can quote.
=================
The most definitive criticism of PRISM is to be found on Peter Suber’s blog where the arguments are very carefully laid out. http://www.earlham.edu/~peters/fos/2007_08_19_fosblogarchive.html#365179758119288416
I subscribe to everything Peter has said – he takes great care both to be accurate and comprehensive. I would strongly suggest you read his comments – if you haven’t already done so. Most of what has been written since is either comments from others, or collations of these comments. There is clearly great concern in the community about PRISM, and there has been essentially no traffic defending it. (If there had been Peter Suber and others would have reported it – we try hard to be objective). Certainly I have seen no attempt to challenge Peter Suber’s many points.
My particular concern is that it is unclear exactly who PRISM are. They are an initiative of the AAP but, I suspect, not synonymous with it nor with identical membership. (You may remember that last year ca. 66 members of the AAP signed a letter opposing the US government’s S.2695 initiative [on Open Access to federally funded research] and it is possible that there is overlap between those signatories and PRISM – but this is speculation.) In engaging in debate – which has so far been unilateral – it is important to find spokespeople for all points of view. There was a suspicion that not all members of the AAP backed PRISM and indeed some started to make public pronouncements distancing themselves from it. I therefore thought it would be useful to seek enlightenment by writing to those AAP members with whom I had some connection and might legitimately be given a hearing. I chose Cambridge University Press – being a member of staff in Cambridge – and also OUP, where I graduated and am therefore a member.
I have yet to hear from Cambridge, but got an almost immediate reply from OUP distancing themselves. You can read this on:
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=546
In this post I also expand on my reasons for suggesting that members of AAP may, by default, become associated with support of PRISM and they may wish to consider if this is what they want.
In summary, therefore, this is a first step to find out who is and who is not a member of, or supporter of, PRISM so that we are better able to directly bring the challenges that Peter Suber has set out to their direct view.
Today we gather that PRISM has responded by saying that they will publish a membership list some time in the future. In the interim it will still be useful to try to find out the positions of particular members.

Posted in open issues, Uncategorized | Leave a comment

Copyrighted Data: replies

So far I have had replies from Antony Williams of ChemSpider and Steve Bachrach. These are thoughtful and I’ll comment here:

  1. Antony Williams Says:
    September 10th, 2007 at 3:50 am
    I have commented in a recent blog post about your comments regarding copyrighted data extracted from Wiley articles and resold as a spectral collection: http://www.chemspider.com/blog/?p=126
    Bottom line… I don’t think there is an issue… all data are probably licensed/purchased from appropriate sources. Now, the issue is whether it is appropriate for an organization/individual to extract and compile literature data and commercialize it. MANY organizations do this so I think it is appropriate.

PMR: I’ll reply to the main body later, but the argument that “MANY organizations do something, so it’s appropriate” is invalid. Many organizations take my Open Access articles and resell them, some with their own copyright. This is immoral and illegal, and increasing numbers make it worse, not better.
Much of what Antony writes deserves comment:

ChemGate, eMolecules and Discussions About Copyright

Posted by: Antony Williams in Uncategorized. Copyright © 2007 Antony Williams
Peter Murray-Rust has initiated a very LIVELY conversation about Wiley’s Chemgate rollout in collaboration with eMolecules. …
In that blog Peter posed an interesting question – “Now, Wiley publish a lot of chemistry. And they accompany this with supplemental info – data which they copyright. Do you think if I ask them nicely they will let me aggregate this non-copyrightable data in the same way as we have done for CrystalEye? Please, Wiley, let me know. ”
This is a very interesting opening for a discussion that has concerned me for a while. Specifically, many companies are aggregators of published data which is then commercialized. Most groups who build databases extract from the literature – whether these are databases of PhysChem properties, assigned NMR shifts, toxicology data, etc., these are commonly extracted from the literature. This has been going on for decades. If the data are copyrighted within the article then is this permissible? The data on NMRShiftDB may exist within articles and be online… permissible? The data on QSAR World are from the literature… permissible? If there are copyright issues with extracting single measured data points (e.g. solubility, pKa, logP values) from the literature or even NMR assignments then the number of potential lawsuits is enormous. I don’t believe it’s possible.

PMR: This is a fair summary of the C20 situation. Many organizations extract and aggregate data. Some simply list it, some comment on it, some select or create the “best” (critical) values. All this is highly worthy. And it’s also reasonable that they should be rewarded for their effort.
But they shouldn’t use this to monopolize the use of the data – later.

What would happen now if someone chose to post a structure and a “value” extracted from a paper to the submission system on ChemSpider? Illegal?
Peter then posted a little more asking a direct question about copyrighting. He commented:
“This is an offer to Wiley (or eMolecules) to explain why they feel they are legally and morally allowed to copyright data and resell it. This blog is developing a tradition of offering publishers a chance to put their view in a highly public forum, so I would be grateful of a reply. ”
also “I realise that not all entries in ChemGate come from Wiley journals – some are private contributions, and presumably some are abstracted from competitor journals. But I would be amazed if there were not entries corresponding to Wiley journals.”
I am not aware of the details regarding the spectral collections on ChemGate in terms of copyright so I did a search online and found that the original ChemGate was probably served up by Specinfo technologies and listed the databases on there under their own branding. I believe this is a revenue sharing model with the generators of the data – there is lots of detail online regarding the providers of the data so I won’t bore you with it – however, I don’t believe any of it is extracted from Wiley publications which is what I believe Peter is pointing to. Is there any evidence that any of the data are extracted from Wiley publications? MORE comments below after the list..
[potential suppliers of info snipped…]
I believe that Wolfgang Robien’s spectral collection has likely been added to the list above (but it might actually already be a subset of the Wiley collection and licensed from Robien directly). Peter has already commented recently about NMR data and predictions directly in relation to NMRShiftDB discussions. This discussion is different since I believe this is about a spectral curve collection OR spectra reconstructed from assigned data.
My best judgment is that the data on ChemGate are likely all appropriately copyrighted. Why do I say this? If you visit ChemGate online at http://chemgate.emolecules.com/ you will see the list of NMR nuclei is: 1H, 13C, 11B, 15N, 17O, 19F, 29Si, 31P. If you visit the Modgraph site then you will see the following statement:

PMR: I have no problem about the database creator and owner owning the collection. I do have problems about copyrighting each individual spectrum since I believe that spectra are data – and therefore not copyrightable. IOW if I use a spectrum from one of these collections the copyright issue is not whether I have the right to reproduce (and potentially resell) the spectrum, but whether I am allowed to extract a certain percentage from the collection.

“The main NMRPredict program is supplied with 131,569 C13 records abstracted from the literature by Professor Robien and co-workers at the University of Vienna over the past 25 years. Three optional additional databases available are:
[numbers deleted]
If you compare the nuclei in the list above, and the list on ChemGate, as well as look at the bolded statements, then I judge that ChemGate is serving up the Wiley copyrighted collections (licensed or purchased from their collaborators) rather than serving up any copyrighted data from their articles. So, I think Peter can relax about that.

PMR: It may well be that this is currently true – though I’d like to have definitive statements from the parties involved. But there is no guarantee that this won’t happen in the future and every likelihood that it will. The journals – which now increasingly (and rightly) require authors to publish spectra – should not immediately feed them back into the publishers’ databases where they are resold to the authors.

However, based on the comments made on ModGraph’s site and bolded above there may be an issue about copyrights. “In the next few years a dramatic expansion of the databases behind NMRPredict can be expected. The journals selected will cover mainly heterocyclic and medicinal chemistry in order to give reliable predictions for candidates for drug discovery (Lipinski’s “Rule of Five”).” It is unlikely that assigned spectra will be extracted only from Wiley publications despite their special relationship :-) .
I DO NOT believe that there is an issue with extracting assigned spectral data and associated structures from any publications. If there is an issue with this I am very interested in having a publisher declaring that here since I know there are parties reading this blog that do exactly that for their business!

PMR: There are grey areas here. If I sit down (or contract out to wage slaves) to transcribe data manually (with quill pen and parchment) and then type it up again then I think no publisher has an issue. If I read the spectrum electronically and put it in a database then I am liable to be pursued by the publisher for breach of rights. So we have the absurd situation that the only way we can get data is to transcribe it by hand. And each user has to do this. What a laughing stock chemistry is to the IT community.

I will be contacting Wiley this week with the request to index the structures and links back to ChemGate here on ChemSpider. Having been involved with creating spectral databases over the years I believe that the pricing for access to over 700,000 spectra is actually very good. Academic prices are likely lower than those listed online.

PMR: 35,000 USD per year is prohibitive. Even if we could afford it we wouldn’t be able to do anything useful other than look at the spectra because the rights would forbid us to redistribute enhanced data.

Question for all… we are presently accepting both spectra and, shortly, structures and associated data onto ChemSpider. Question for the readers – what is the preferred Creative Commons license you would like to see attributed to the user uploaded data?

PMR: The only one that makes sense is CC-BY. Attribute the author and protect their rights. Allow anyone to re-use the data for whatever purpose including commercial. Unfortunately this doesn’t prevent the aggregation and claims to ownership. So, for example, if I publish 100,000 data and an organization integrates it into its own collection it can claim ownership to the collection. That may make it very difficult for the original author to make sure their rights are fully protected.
I thank you for this genuine offer to explore this. I would suggest you contact John Wilbanks at Science Commons to see if they have suggestions.

Peter commented on his post “I had offered eMolecules 250 000 MOPAC calculations as Open Data.” Peter, we’d welcome the opportunity to host your data for everyone.

PMR: Thanks for the offer. However it requires the attachment of Creative Commons licenses to each entry, and therefore to that data for any molecule in your collection. A molecule could carry both CC data and non-CC data and the system would have to be able to cope.
More generally it is difficult to guarantee the continued Openness of the content – I don’t doubt your sincerity but if the company gets sold in the future I cannot necessarily trust that the licences will continue to be honoured (I have 3 examples from last week – not to do with Chemspider – where my licence on an article wasn’t honoured). The general feeling in chemistry seems to be that licences matter to publishers and aggregators but not to authors.
Having said that, the main current barrier is that the molecules are all in DSpace and it isn’t trivial to get them out. I am hoping that we get an opportunity to put them in a more data-friendly repository soon.
=======================
Now to answer Steve Bachrach in a recent comment:
  1. Steven Bachrach Says:
    September 9th, 2007 at 11:13 pm
    Peter,
    I think you need to be a bit careful here. Copyright does serve a purpose, protecting the intellectual property of the creator. What I think we want to see are those rights protected when the author both wishes to protect his/her rights and when creative work has been produced.
    Now data cannot be copyrighted. However, the presentation of data can be copyrighted. So for example, the 3-d coordinates of a molecule are in the public domain. However, the image I create of this molecule – its orientation, projection, color scheme, labeling, etc. – results from a creative act and is protected by copyright, as far as I know. (And of course I am not a lawyer; these are just my opinions based on discussions with many publishers and people in the industry.) So, in that same vein, the drawing of a reaction scheme is also copyrightable, but not the reactants, products, catalysts, conditions. So, it is my belief that copyright law actually does protect the images in figures S1 and S2 above. For spectra, the absorbance at a frequency is data, as is a collection of absorptions across a frequency range. However, how you present that data is your creative act, and so can be protected by copyright. The fact that a machine creates that curve is really irrelevant. Someone had to select the scales, the width of the lines, the font (even if these were all default values – the person using the spectrometer made that choice!)
PMR: I am sad to see this view. It may be correct, it might even be held in court. But I’d like us to fight against it. It says, effectively, that anything that is not numbers is copyrightable. A molecular structure could be held to be copyright because of the fonts that have been used, etc. I take a different view – that a spectrum is a universally agreed representation of the data, and that what units are used for what scales, what the maximum values were, etc. are simply different transformations of the data – not creative acts. If we start copyrighting the labelling scheme in a molecule, then we might as well go home.
The copyright here does NOT protect the IPR of the creator. It protects the business of the publisher who has mandated the creator to hand over their work as payment for publication. The work presented was NOT created by the publisher, it was simply owned by the publisher.
  1. Collections of data have also been granted copyright protection, for better or worse we can argue. But current US law, and I believe also in the EU, protects collections that are not obvious. So for example, a directory of phone numbers of London residents is not protected. However curated spectral data, where choices have been made to check for accuracy, metadata added, etc., are protected. Without these protections, publishers, so they argue, would have little incentive to create new databases.
  2. PMR: I am familiar with publisher arguments. My arguments are that they are (a) suspect and (b) detrimental to chemistry.
  3. I was greatly disappointed when the Supreme Court extended the copyright duration to 70 years. There is great value in the public domain, and the decay of this resource is deplorable. At the same time, copyright does allow the creator of new materials to be compensated for her work, and that is worth protecting too.
  4. Steven Bachrach
PMR: I’m afraid we have to differ. The creator is NOT compensated for their work. Steve is effectively saying that, for better or worse, we need to look to the publishing community to protect and manage the discipline’s IPR. I take the opposite view.
But I am sad that I have to work so hard to make these points and so far am getting relatively little support.
I’m off to Berlin-5 next week and will look to see if I can get some more general support for liberating scholarly IP.
 
Posted in open issues | 2 Comments

Webcast: the power of the eThesis

I am very grateful to Caltech, especially Eric van der Velde, for organising and recording my presentation on eTheses at Caltech last month. See The power of the Scientific eThesis, a combined audio, video and slideshow. Caltech have done a very good job of stitching it together. Many of the “slides” were in scrolling HTML so the slide count is artificially high – each scroll generates a new “slide”. Total time is about 67 minutes.
The themes include:

  • homage to Caltech: Jack Dunitz, Linus Pauling, Verner Schomaker and Ken Trueblood.
  • data-driven science in crystallography – examples from 1973 to present day.
  • semantic web and chemistry, including DBPedia
  • Open Access
  • eTheses
  • crystalEye

and questions at the end.
Since my presentations are drawn from many thousands of slides, this recording gives an accurate impression of a typical talk, where I do not know in advance exactly which components I shall touch on. In a few places my machine ran slowly, so there are minor hiatuses.

Posted in chemistry, etd2007, open issues, theses, XML | 3 Comments

An Open Letter to the British Library: charges for Open Access and restricted dissemination of Out-of-copyright material

An open letter to the British Library Board about lack of Open Access and restrictions on out-of-copyright works
Dear Andy Stephens BSc (Head of Corporate Planning and Secretariat, The British Library) andy.stephens@bl.uk
I am a chemist at the University of Cambridge with a major research interest in eScience – the UK term for the combination of scientific research and scholarship with the new opportunities and power of the Grid and cyberinfrastructure. The UK, through the DTI and others, has spent many hundreds of millions of pounds in this area, and I am funded by EPSRC, DTI, companies such as Unilever and also by JISC (the Joint Information Systems Committee). Earlier this year, for example, I was invited to a joint international JISC/NSF meeting on “data-driven science” to discuss the future of cyberinfrastructure over the next 8 years. At that meeting a central theme was the universal availability of digital information without physical, semantic, financial or legal barriers.
In chemistry much of the primary scientific information first appears in peer-reviewed publications in a form unsatisfactory for eScholarship. The primary barriers are business/legal in that publishers assert ownership over the content and the piecewise requirement to negotiate licences and other permissions. In practice, therefore, the material is limited to non-copyrightable material (data or out-of-copyright) or to Open Access publications where the licence asserts the right of the user (human or robot) to use the material without permission.
I was therefore dismayed to find recently that the British Library not only fails to recognise and promote access to this uncopyrightable or Open material, but also adds additional financial and legal restrictions on access (see http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=571 and http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=543 and several other posts). In summary the BL (a) charges for Open Access articles and (b) puts draconian restrictions on access to and redissemination of material long out of copyright. I do not know how widespread this is, but I surmise that there is a blanket restriction on effective access that applies to all content. If true, this means that the British Library, which is involved in digital research (http://www.bl.uk/about/strategic/digiresenv.html) and understands the potential, is, paradoxically, a major barrier to eScience.
eScience is at the stage where it needs large amounts of semantic high-quality electronic content with zero-access barriers. Traditional operation of copyright and permissions stands in the way and I would ask the BL to take a national and international lead in tackling this problem rather than amplifying it.
[NOTE: Peter Suber, the leading Open Access commentator and expert has said publicly that he was not able to understand the BL’s position on payment and access permissions for Open Access journals – and presumably for out-of-copyright material as well]

Posted in open issues | Leave a comment

The British Unlibrary

I am now gobsmacked. Earlier I recounted (OUP wants me to pay for my own Open Access article) how I was expected to pay for access to my [*] own Open Access paper both through the actual publisher and an aggregator. (These organizations have admitted that this was inappropriate and are changing their technology to support free Open Access as opposed to charging for it). However, arising from concerns about access to out-of-copyright material through the British Library, I raised this here and Peter Suber commented (More access barriers at the BL document delivery service) that the British Library charge for Open Access material.

PS Comment. See the BL response when the PLoS Director of Publishing, Mark Patterson, asked why the BL was charging for copies of PLoS articles, which are all OA. At first I thought the BL was saying, in effect, that it doesn’t have the resources to see whether an article is under an open license (or in the public domain). But it’s more complicated than that, and the more I re-read it, the less I understand it. In the case of PLoS articles, the BL charges a copyright fee set by UKCLA and passes the fee on to UKCLA, keeping nothing for itself. But it doesn’t explain why UKCLA believes that PLoS articles should carry copyright fees.

PMR: If Peter Suber can’t understand it, then no-one can. So any future answer will have to be in simpler language.
Well, perhaps that was a mistake and they have put it right. So I thought I would see what they have done with my article. This is told in pictures. I search for electronic journals and find:
britlib0.PNG
“Most of the collections are only available from the St. Pancras Reading Rooms, owing to licensing restrictions, and cannot be accessed offsite. Some collections are freely available on the World Wide Web.”
What on earth is happening here? To view an electronic journal you have to visit the British Library in London? I am lost for words… how can this be part of the UK’s effort in eScience where we have developed tools that can access knowledge across the world’s continents. Perhaps it’s an error, so I search for the OUP issue and find:

So if I want to use an electronic journal I have to travel to the British Library at St Pancras??
Maybe they have another service…
Yes: British Library Direct – search and order journal articles online
and here is my paper:
britlib.PNG
They are both Open Access. I go for the first, which is in an Open Access journal, i.e. all articles are OA, so any software switch can simply be put on the journal. I find:
britlib1.PNG
So I have to pay 20.65 GBP (×1.175 for VAT), about 24 GBP. We have paid the publisher to make this article freely available to the whole world. This is so that readers do not have to ask permission, do not have to pay, and can re-use the material for any non-commercial purpose, etc. So firstly the British Library appears to be breaking the terms of our licence – they are charging for something we authors have paid to make free.
I am going to write to the BL to ask what is going on. And if I don’t get an answer I can understand I’ll take it to a knowledgeable Member of Parliament. There is a lot of interest in Open Access to funded information among some politicians.
In simple terms this is destroying eScience. eScience is only possible with zero barriers to access. ZERO. This is worse than the cases I have had before because this is the National Library of the UK. I have, on occasions, praised the BL. But here it is saying that it is more important to put barriers in place than to enable freedom of access.
This culture permeates the library community in Britain. They are terrified of breaking copyright. Even when there isn’t any. I can’t get Open theses because they can’t reach (dead) copyright holders. I can’t get papers written pre 1900 because we can’t get definitive copyright clearance.
I know the BL is a government-funded institution and can’t break laws, but it really should be pushing for the cobwebs to be swept away, not locking the door and letting more grow.
=================================================================
[*] This is a multi-author paper but I am using the singular as I have not discussed this issue with the other authors.

Posted in open issues | Leave a comment

Copyfraud

I have just discovered (through Klaus Graf and Peter Suber) the word that I need to describe what Wiley, eMolecules and Ingenta are doing:
COPYFRAUD
Read the excellent paper

Falsely claiming copyright to a work in the public domain. Jason Mazzone, Copyfraud, Brooklyn Law School, Legal Studies Paper No. 40, August 21, 2005. (Peter Suber: Thanks to Klaus Graf.)

Abstract: Copyright in a work now lasts for seventy years after the death of the author. Critics contend that this period is too prolonged, it stifles creativity, and it undermines the existence of a robust public domain. Whatever the merits of this critique of copyright law, it overlooks a more pervasive and serious problem: copyfraud. Copyfraud refers to falsely claiming a copyright to a public domain work. Copyfraud is everywhere. False copyright notices appear on modern reprints of Shakespeare’s plays, Beethoven piano scores, greeting card versions of Monet’s water lilies, and even the U.S. Constitution. Archives claim blanket copyright to everything in their collections. Vendors of microfilmed versions of historical newspapers assert copyright ownership. These false copyright claims, which are often accompanied by threatened litigation for reproducing a work without the “owner’s” permission, result in users seeking licenses and paying fees to reproduce works that are free for everyone to use. Copyfraud also refers to interference with fair uses of copyrighted works. By leveraging the vague fair use standards contained in the Copyright Act and attendant case law, and by threatening litigation, publishers deter legitimate reproduction of copyrighted works, improperly insisting on licenses and payment of fees. Publishers wrongly contend that nobody may reproduce for any reason any portion of a copyrighted work, without the publisher’s prior approval. These circumstances have produced fraud on an untold scale, with millions of works in the public domain deemed copyrighted, and countless dollars paid out every year in licensing fees to make copies that could be made for free. Copyfraud stifles valid forms of reproduction and undermines free speech. Copyfraud also weakens legitimate intellectual property rights. 
Congress should amend the Copyright Act to allow private parties to bring civil causes of action for false copyright claims, and to specify as a statutory matter that copying less than five percent of a single copyrighted work is presumptively fair use. In addition, Congress should enhance more generally protection for the public domain, with the creation of a national registry listing public domain works, a symbol to designate those works, and a federal agency charged with securing and promoting the public domain. Failing a congressional response, there may also exist remedies under state law and through the efforts of private parties.

(Peter Suber (2005): At a conference last year, I proposed civil damages for infringing the public’s right to use the public domain, and I’m very glad to see a law professor take up the idea in all seriousness.)

Copyfraud is very simple to understand. You take a document in the public domain and put your copyright on it. It’s a simple, virtually safe way of making money you aren’t entitled to. Mazzone describes how it is perpetrated on classic works of literature, art, music, film, etc. He doesn’t touch on scientific facts, but I see no reason why they don’t fall into the same category.
[Technically the Ingenta action of adding their copyright to our abstract is different in that we are already the copyright holders. But I think Copyfraud describes it sufficiently.]
The problem, as he makes clear, is that the balance is asymmetric. Copyright holders have powerful lawyers, and Wiley can pursue someone for reproducing a scientific graph with 10 data points. The public domain, however, has no such force. I would claim that a graph of scientific data is in the public domain, as are pictures of molecules and their spectra – certainly not the property of the publisher. Yet the public domain – as Mazzone notes – has no symbol corresponding to the © symbol, no legal protection.
Where authors understand the problem and want to protect the freedom of data they can use a licence – CC, or OKFN. But it shouldn’t be necessary. Data should be free.

Posted in data, open issues | 4 Comments