Open Data is necessary but not sufficient

Dictated and Scraped into Arcturus

John Wilbanks is Director of Science Commons and a co-author of the Panton Principles. He has responded to my concerns about access to climate change data with the observation that Open Data is not the major problem, or solution. His response is below and I’ll comment at the bottom. I agree with what he says, but I will argue that there is still a role for Open Knowledge in this issue.

We’ve spent a lot of time on climate change and open science at Creative Commons. I have a personal interest, as my father is a climate change researcher and was an author on the most recent IPCC report. He and I co-wrote a paper on open innovation in sustainable development earlier this year which was OA, and the references for that paper are a good start for the non-data side of the problem. It’s at http://www.mdpi.com/2071-1050/2/4/993/

 

In most cases in climate change science, impacts, and adaptive responses, the hurdles for open science are not intellectual property rights but scientific practices related to confidentiality and protecting one’s own data and models – a different challenge. The current evaluation of the IPCC being done by the InterAcademy Council at the request of the UN is beginning to take a look at how such conventional scientific practices can become a threat to the perceived integrity of science. IP is a footnote in the debate, unlike in OA or in free software or in free culture. Our successes in these spaces have sadly conditioned us to look at “free” legal tools as our hammers, and see the world as a bunch of nails. It’s a great irony actually.


In the case of climate change mitigation, of course, the open science issues are similar to those in other areas of traditional manufactured technology – accentuated by the fact that the main drivers of increases in global GHG emissions are now in the larger developing countries, while the industrialized countries still control a lot of the intellectual property for addressing that problem….

 

In many ways the “open” debate about data fails to capture the reality of these issues. Making data open, even fully compliant with the Science Commons protocol, is actually far from enough. I hope that we can make these debates nuanced enough that we don’t push “open” as the end game, because I can comply with the protocol, or with Panton, and still have my data be worthless from a scientific perspective. An extreme example would be that I publish PDFs of my data under PDDL, and claim the mantle of “open”. If we as a community push “open” as the goal, and not “useful” as the goal, then we enable that outcome. 

 

Open climate science, at least as it regards data, is almost never an intellectual property problem. It’s a culture problem, it’s a technology problem (formats, ontologies, standards), and it’s a language problem. It’s a political problem, it’s an incentive problem. Getting rid of the IP is no more than table stakes. And if we don’t deal with the inventions – the technologies that both create climate problems and that promise to mitigate them in adaptation – then we won’t be changing the world the way we want. That’s a big part of why our science work has shifted to focusing significantly on patent licensing and materials transfer…

I completely agree that this is a culture problem. It was the culture of priesthood that hit me – unexpectedly and repeatedly – at the RI meeting. And I do not argue that IP issues are the primary problem. But I wouldn’t call Open Data simply an IP problem. Lack of Open Data is symptomatic of a deeper malaise. And Open Data is catalytic – if people are accustomed to making their data Open they are more likely to make their processes Open. A group that produces Open Data has to think about openness every time they release a data set, every time they publish a paper.

Perhaps an analogy would be laboratory practice. Running a safe and clean laboratory does not in itself make a good scientist. But it emphasizes certain fundamental principles and attitudes such as consideration for co-workers, having procedures in place, adopting discipline.

I’d describe Open Data as a necessary but nowhere near sufficient condition. But it’s also a visible and valuable touchstone. I’ll address this in the next post.

 


The Open Geospatial Consortium

Typed and Scraped into Arcturus

When I started to blog and mail about Climate Change/Research I knew I was blundering into areas that I knew little about, and that I would discover a great deal of previous and current activity. I have received a wonderful response from Lance McKee of the Open Geospatial Consortium (OGC) [on the OKF open-science mailing list]:

I call your attention to one activity of the Open Geospatial Consortium (OGC): the GEOSS Architecture Implementation Pilot 3 (AIP-3) data sharing activity: http://sites.google.com/a/aip3.ogcnetwork.net/home/home/aip-3-kickoff/data-sharing-guidelines .

There are many in the OGC (http://www.opengeospatial.org) who share your concerns about climate data. OGC runs a consensus process in which government and private sector organizations collaborate to develop open interfaces and encodings that enable, among other things, sharing of geospatial data, including climate data. I think the OGC is likely to play an important role in the opening up of climate science.

I invite you to look through a presentation in which I gathered my learnings and musings about the importance, feasibility and inevitability of persistent and open publishing of scientific geospatial data: http://portal.opengeospatial.org/files/?artifact_id=37254 .

 

The presentation is well worth reading, including 17 (sic) reasons why data should be open.

It is very valuable to see that the OGC has done so much. I will read what emerges over the next days. It may be that the OKF has a role – it may be that it should be primarily supportive of others.

I have an open mind.


Open Climate Data: I cannot find the Spectrum of Carbon Dioxide without violating Copyright

Typed and Scraped into Arcturus

Here’s an excellent example of the issues in Open Data. A simple, important question from David Jones (who is involved in climate research infrastructure). It’s in response to my last post on Open Data in Climate Research and it’s an excellent tutorial on the issues. And the result is very depressing.

David Jones says:

June 15, 2010 at 12:42 pm

Perhaps you could kick off with what data you would like to see open.

Data that I would like, that you might have a professional opinion on, is a reference library for the IR spectra of the Kyoto Protocol gases (CO2 and other greenhouse gases). I had a look, but I couldn’t find an open archive of IR spectra. Do you know if one exists?

To remind readers – infrared absorption is the reason that greenhouse gases heat up the planet: they absorb infrared radiation and turn it into heat, which is trapped in the atmosphere. CO2 is an important greenhouse gas so its infrared absorption is a key piece of data. Ideally we need physical and chemical properties for all atmospheric components.
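To make the mechanism concrete, here is a minimal sketch of the Beer-Lambert law, which is what turns a measured absorption spectrum into a statement about how opaque the atmosphere is at a given wavelength. All the numbers are purely illustrative orders of magnitude I have assumed, not measured values – which is precisely why an Open reference library matters.

```python
import math

# Beer-Lambert law: the fraction of radiation transmitted through an
# absorbing gas is I/I0 = exp(-sigma * N * L), where sigma is the absorption
# cross-section (cm^2/molecule), N the number density (molecules/cm^3) and
# L the path length (cm). All numbers below are illustrative, not measured.

def transmittance(sigma_cm2, n_per_cm3, path_cm):
    """Fraction of incident infrared intensity surviving the path."""
    return math.exp(-sigma_cm2 * n_per_cm3 * path_cm)

sigma = 1e-18   # assumed cross-section near CO2's 15 micrometre band
n_co2 = 1e16    # assumed CO2 number density in air (roughly 400 ppm)
path = 1e5      # a 1 km path through the atmosphere, in cm

print(transmittance(sigma, n_co2, path))  # ~0: the band is effectively opaque
```

Without an Open library of measured cross-sections, nobody can check such a calculation independently.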

I can probably find the spectrum in an undergraduate textbook. If I copied it I would be sued by the publisher and burn in copyright hell. Yes, it’s factual data, and yes it’s important for the future of the planet and yes the publisher simply copied it from the author but copyright is the supreme god and we must worship it. So simply copying known public information from copyright-holders is a legal no-no.

I go to the web, Google for “collections of infrared spectra” and get:

http://www.wiley-vch.de/stmdata/pdf/Infrared_Price_List.pdf

(I have copied this without permission. Wiley is an aggressive publisher who pursued a graduate student, Shelley Batts, for reproducing a single graph from one of “their” papers in a critical blog post. They said it was a mistake and everything was now OK. It’s OK in that copyright still rests completely with Wiley.)

Anyway, we digress. This shows that a SINGLE BOOK of spectra for a SINGLE USER can cost 3000 Euros (that’s about 4000 USD). That shows the scale of the problem we face in chemistry. Now I agree that these spectra were won with the sweat-of-the-brow and so on, but in these days of automatic machines it does not cost even 2 USD to publish a copy of a spectrum. This is an example of monopoly, scarcity control and inflated prices. (It may well be that Hummel does something laudable with the money – I have no idea.)

The message is not only that the data are not Open but that they are enormously expensive.

Let’s try another: http://www.spectraonline.com/

First read the conditions (I have highlighted parts):

Use of Site. Thermo Fisher authorizes you to view, print and download the materials at this Web site (“Site”) only for your personal, non-commercial use, provided that you retain all copyright and other proprietary notices contained in the original materials on any copies of the materials downloaded or printed from the Site. You may not modify the materials at this Site in any way or reproduce or publicly display, perform, or distribute or otherwise use them for any public or commercial purpose. For purposes of these Terms, any use of these materials on any other Web site or networked computer environment for any purpose is prohibited. The materials at this Site are copyrighted and any unauthorized use of any materials at this Site may violate copyright, trademark, and other laws. You agree that you will not disclose, republish, reproduce, or distribute any of the information displayed on or comprising this Site (the “Content”) or make any use of the Content that would allow a third party to have access to the Content. If you breach any of these Terms, your authorization to use this Site automatically terminates and you must immediately destroy any downloaded or printed materials.

Not exactly cuddly. Where’s the data? They say:

The Spectra Online database is a collection of public domain and other data generously contributed from various sources. Please note that Thermo Electron Corporation does not control the reliability or quality of contributions to the Spectra Online database and therefore makes no guarantees or warranties on the usefulness or correctness of the information or data contained therein. Below are links to descriptions of current Spectra Online data collections:
Acorn NMR NUTS DB Searchable Archive
American Academy of Forensic Sciences (AAFS) MSDC Database
Agilent MS of VOC’s Library
Boeing Aerospace FT-IR of Lubricants
Caltech Mineral Spectroscopy Server
CCRC Database – GC-EIMS of Partially Methylated Alditol Acetates
EPA Vapor Phase FTIR Library
EPA-AECD Gas Phase FTIR Database of HAPs
FBI FT-IR Fibers Library (Spectrochimica Acta)
David Hopkins NIR Collection
InPhotonics Raman Forensics Library
IUCr CPD Quantitative Phase XRD Round Robin Test Set
Jobin Yvon Raman Spectra of Polymers
LabSphere FT-IR and NIR Spectral Reflectance of Materials
McCreery Raman Library
NIST Chemistry WebBook
Notre Dame Organics Workbook Spectra
Edward Orton FTIR of Solid Phase Synthesis Resins
OMLC – PhotchemCAD Spectra
Pacific Lutheran University – NMR Spectra for Solomons and Fryhle Organic Chemistry, 7th Ed.
Pacific Lutheran University – FTNMR FID Archive
PhotoMetrics Inc. FT-IR Library
RMIT Applied Chemistry MS Library
SPECARB Raman Spectra of Carbohydrates
David Sullivan FT-IR Collection (University of Texas)
TIAFT User Contributed Collection of EI Mass Spectra
UCL Raman Spectroscopic Library of Natural and Synthetic Pigments
Univ. of Northern Colorado – Protein Infrared Database
University of S.C-Aiken UV-Vis of Dyes
USDA Instrumentation Research Lab NIR Library
U.S.G.S. Spectral Library of Minerals
University of the West Indies, Mona JCAMP Archive
Widener University – Dr. Van Bramer’s Spectral Archive

So what we have here is theft from the public domain. A variety of public sources have donated data to Thermo which has stamped them all with such a restrictive contract that I cannot even show you one spectrum. It is extraordinarily easy to steal from the public domain. Just wrap it in frightening legal stuff.

Now you could argue that actually I can take data from this site as it was originally public domain. But not all of it is. And if I am a robot I have no way of deciding which. I read the terrifying legal conditions and my system stackdumps.
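To show what such a robot can and cannot do, here is a toy licence-checker. It is entirely hypothetical code of my own: it recognises only an explicit rel="license" link in a page’s HTML (the machine-readable convention used by Creative Commons and PDDL-style licences) and gives up on anything expressed solely as legal prose.

```python
import re
import urllib.request

# A toy licence-checking "robot". It can act on a machine-readable
# rel="license" declaration; when the terms exist only as legal prose,
# as on the site quoted above, it returns None and must give up.

TAG = re.compile(r'<(?:a|link)\b[^>]*>', re.I)

def machine_readable_license(url):
    """Return the declared licence URL, or None if none is declared."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    for tag in TAG.findall(html):
        if re.search(r'rel=["\']license["\']', tag, re.I):
            href = re.search(r'href=["\']([^"\']+)', tag)
            if href:
                return href.group(1)
    return None  # no machine-readable licence: the robot "stackdumps"
```

A robot can build an Open collection from pages that answer with PDDL or CC0; a page that answers None contributes nothing, however public-domain its contents originally were.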

This pollution and theft is endemic. We have to Open the Data.

Let’s try a US government site – NIST. It has an excellent set of chemical data – probably the best in the world. Here’s a spectrum from its excellent WebBook (http://webbook.nist.gov/cgi/cbook.cgi?ID=C124389&Units=SI&Type=IR-SPEC&Index=1#IR-SPEC ):

[Image: the NIST WebBook infrared spectrum of carbon dioxide]


 

And NIST is a US Government organization so all its works are ipso facto in the Public Domain, right? And so we can publish an Open Collection of Spectra by copying from NIST?

NO: http://www.nist.gov/data/PublicLaw.htm says

Standard Reference Databases are copyrighted by the U.S. Secretary of Commerce on behalf of the United States of America.  All rights reserved.  No part of our database may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior permission.

Source: Public Law 90-396, July 11, 1968, The Standard Reference Data Act

Purpose: To provide for the collection, compilation, critical evaluation, publication and sale of standard reference data

Excerpt:

Section 6 – …the Secretary may secure copyright and renewal thereof on behalf of the United States as author or proprietor in all or any part of any standard reference data which he prepares or makes available under this Act, and may authorize the reproduction and publication thereof by others.

[The US Government has made an exception for NIST so it can collect money.] So I shall probably go to Guantanamo for publishing the spectrum. I’ll take the risk, but I clearly cannot copy the whole lot.

So, very simply, although some 20 million chemical compounds are known, there are no Open collections of infrared spectra.

As a responsible member of the Open Knowledge Foundation I am not prepared to appropriate material that has been “copyrighted” by others. So my conclusion is:

IT IS NOT POSSIBLE TO FIND AN INFRARED SPECTRUM OF CARBON DIOXIDE – A CRITICAL GREENHOUSE GAS – WITHOUT POTENTIALLY VIOLATING COPYRIGHT.

I hope this statement is wrong.


 


Open Data in Climate Research?

Dictated and Scraped into Arcturus

Yesterday evening I went to a discussion at the Royal Institution. I’ll first give the abstract of the occasion and then my motivation and conclusions. Please read what I write very carefully, because I am not commenting on the primary science – I am commenting on how the science and its conclusions are, or are not, communicated.

The Climate Files: The battle for the truth about global warming

 

In November 2009 it emerged that thousands of documents and emails had been stolen from one of the top climate science centres in the world [PMR: The Climatic Research Unit (CRU) at the University of East Anglia (UEA), UK]. The emails appeared to reveal that scientists had twisted research in order to strengthen the case for global warming. With the UN’s climate summit in Copenhagen just days away, the hack could not have happened at a worse time for climate researchers or at a better time for those who reject the scientific consensus on global warming. Yet although the emails sparked a media frenzy, the fact is that just about everything you have heard and read about the University of East Anglia emails is wrong. They are not, as some have claimed, the smoking gun for some great global warming hoax. They do not reveal a sinister conspiracy by scientists to fabricate global warming data.

To coincide with the launch of his new book, The Climate Files, the veteran environment journalist Fred Pearce discusses how the emails raise deeply disturbing questions about the way climate science is conducted, about researchers’ preparedness to block access to climate data and downplay flaws in their research.

This will then be followed by a panel involving Dr Myles Allen (University of Oxford) and Dr Adam Corner (Cardiff University).

Fred Pearce was the main speaker and described in detail his analysis of the emails which had been exposed from UEA. I would agree from his analysis that there is no “smoking gun” and that many of the emails were unfortunate rather than malicious. He was then answered by Drs. Allen and Corner, and there was clearly some disagreement between them and him. The discussion was then opened to the audience (which included scientists, journalists and many others) and a lively and valuable debate took place.

I should make it clear that I am making no comment at the moment as to whether global warming is a reality and if so how important it is. And I am deliberately taking the position of an agnostic because I want to find for myself what the evidence is and how compelling it is. For that, it is important that the information is Open and so it is as a “data libertarian” (a useful phrase which I heard last night) that I attended the meeting.

As a result of the presentations and the discussions within the panel it seemed to me that there was a serious lack of Openness in the Climate Research community. It is important not to judge from just one meeting but given the enormous public reporting and discussion I was disappointed to find that there were still parochial and entrenched attitudes about ownership and use of data.

My superficial analysis is that the CR community has retreated into defensive mode and has not changed its communication methods or interaction with the community. This is perhaps understandable given the hostility and publicity of much of the media coverage and further comment (and UEA has put a ban on staff speaking on the issue). Such bans can backfire, as it is then easier to believe there is something to hide. It may be difficult, but it seems essential to radically overhaul the governance and communication.

On more than one occasion the panel asserted that climate data should only be analysed by experts and that releasing it more generally would lead to serious misinterpretations. It was also clear that on occasions data had been requested and refused. The reason appeared to be that these requests were not from established climate “experts”. This had led to the Freedom of Information Act (FOI) being used to request scientific data from the unit. This had reached such a degree of polarisation that of over 100 requests only 10 had resulted in information being released by the University. I had no idea that this “FOI battle” had been going on for several years and that nothing had been done to try to solve the problem. This in itself should have been a signal that change was necessary – however inconvenient.

We should remember that climate research is not an obscure area of science but something on which governments make major and lasting decisions. It surprised me that there was not an innate culture of making the data and research generally available. The CRU is effectively a publicly funded body (as far as I know there is minimal industrial funding) and I believe there is a natural moral, ethical and political imperative to make the results widely available. The FOI requests should have been seen as a symptom of the problem of not making data available rather than, as it appears, being regarded as irritation from outsiders. Whatever the rights and wrongs, it was a situation with a high probability of ending in public disaster (as it did).

I was sufficiently concerned that I spoke at the end and although I do not have my exact words I said something like the following:

“I am a Chemist and a data libertarian. I am not an expert in climate change but I believe that I could understand and contribute to some parts of climate research (e.g. data analysis and computational science) and I do not accept the need for a priesthood. In my advocacy for publishing Open Data I encounter many fields where scientists and publishers are actively working to make data openly available. The pioneers of genome research and structural biology fought their culture (which included major commercial interests) to ensure that the results of the work were universally available. I see other areas where scientific papers cannot now be published unless the scientists also make their data available at time of publication. Climate research appears to have generated a priesthood which controls the release of information. For a science with global implications this is not acceptable.”

This will not be my last blog post on this issue. I was sparked into action when I heard a talk in Cambridge by Nigel Lawson (Margaret Thatcher’s Chancellor of the Exchequer). Lawson argued (using proof by political assertion) that climate change research was a conspiracy. He has now set up a foundation to challenge the mainstream view (The Global Warming Policy Foundation). However I realized while listening to him that I did not have compelling incontrovertible Scientific Data and arguments that I could use to challenge his views. This is an untenable position for a scientist and so I believe I must educate myself and my fellow scientists about which pieces of information are genuine.

To do this we have to develop a culture of Openness and a number of us discussed the problem at the Open Knowledge Foundation’s OKCon earlier this year. Although much has been written and continues to be written on climate research there is no Open repository of information.

The OKF’s goal is to create or expose Open resources. We are currently thinking about how to do this for climate research. We have to be extremely careful that we do not “take sides” and that our role is strictly limited to identification of Open resources.


Data-intensive Science: The JISC I2S2 project.

Typed into Arcturus

I’m in Bath for a JISC meeting – the I2S2 meeting. All JISC meetings have acronyms – I2S2 stands for Infrastructure for Integration in Structural Sciences and involves a number of experimentalists in finding the structure of materials (More on the I2S2 project at its website: http://www.ukoln.ac.uk/projects/I2S2/) . For example Martin Dove (from Earth Sciences in Cambridge) is looking at how atoms in silicates move, and how this changes the structure of minerals. Since much of the Earth’s crust is made of silicates this is of importance in understanding tectonic movements, exploration for minerals, etc.
Here’s an example of the multidisciplinary nature of science – to find out what happens hundreds of kilometres (10^5 meters) deep in the earth we have to understand how atoms behave at the picometer scale (10^-12 meters). So there is a factor of 10^17 – seventeen powers of ten – and it’s remarkable how often the very small and the very large interact.
Martin collaborates with the Rutherford Appleton Laboratory near Harwell, run by STFC. Martin uses neutrons to determine how the atoms move and needs a special “facility” (ISIS) to do this. Here (http://www.isis.stfc.ac.uk/instruments/instruments2105.html ) are some of the many projects at ISIS, which include ways of improving mobile phones, medical diagnostics and much more. Science underpins our modern life, and however we are to escape from our present plight we must see science at the centre. It’s something that the rest of the world admires in the UK.
ISIS produces DATA. And that’s what the I2S2 project is about. The data are expensive to produce (neutrons are not cheap) and they are complex. STFC also has a large resource in developing new approaches to information, and Brian Matthews from STFC is therefore also on the project.
This is “large science”. But I2S2 also covers “long tail” science – where lots of science is done by individuals. Simon Coles runs the National Crystallographic Service in Southampton where hundreds of researchers submit their samples and his group “solves the structure” and returns the data. Here the data are likely to be in hundreds of separate packages.
What’s characteristic of these projects is that the data often drive the science. So managing the data is critical. And we’ve just been talking about problems of scale. If we get 10 times more data then the problem becomes intrinsically more difficult – it’s not just “buying another disc”. New bugs arise and integration issues become critical.
So I2S2 is looking to see whether there can be a unified approach to managing data. This requires an information model, because only when we understand the model can we create the software and glueware to automate the process. This is not easy even when “most of the experiments are similar”. It needs expert understanding of the domain and a vocabulary (more technically an ontology) for the data and the processes. Moreover it’s not a static process – we often keep refining the processes in transforming and managing data.
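As a toy illustration of what an information model means here – this is my own sketch, not I2S2’s actual model – even a handful of agreed fields is what makes automation possible, because software can only act on what the vocabulary names:

```python
from dataclasses import dataclass, field

# A deliberately tiny information model for datasets in a research cycle.
# Every field name below is an assumption for illustration only.

@dataclass
class Dataset:
    identifier: str                  # persistent ID, needed for reuse/citation
    instrument: str                  # where the data came from
    processing_level: str            # "raw" | "processed" | "interpreted"
    units: dict = field(default_factory=dict)          # variable -> unit
    derived_from: list = field(default_factory=list)   # provenance chain

raw = Dataset("ex/001-raw", "ISIS (hypothetical run)", "raw",
              {"counts": "photons"})
processed = Dataset("ex/001-proc", "ISIS (hypothetical run)", "processed",
                    {"intensity": "arbitrary units"},
                    derived_from=[raw.identifier])
```

Once two groups agree on even this much, glueware can validate, transform and chain their data without human intervention; without the agreement, every exchange is manual.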
And the result of experiment A is often the input for project B. So the process is often shown as cyclic – the research cycle. A key concept is “data reuse” – in this area ideas often build on existing data (which is why I and others keep banging on about publishing data). Here’s a (relatively simple!) diagram for the research cycle in I2S2:

[Diagram: the I2S2 research cycle]

Note the cycle round the outside. Start at the NE corner. Not everyone maps their research in precisely these terms but most do something fairly similar. The data-intensive part is mostly at the bottom. Data are not simple – usually the “raw” data need processing before being interpreted. For example an experiment may collect data as photons (flashes of radiation) and these need integrating locally. Or they need transformation between different mathematical domains (“Fourier transform”). Or they are raw numbers from computer simulations. It’s critical that any transformation is openly inspectable so that the rest of the world does not suspect the authors of “manipulating their data to fit the theories”. That’s one reason why it’s so important to agree on the data transformation process and that anyone (not just scientists) can agree it has been done responsibly.
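As a deliberately trivial concrete example of such a transformation – synthetic data, nothing from ISIS – here is a raw time series being Fourier-transformed into a spectrum. Because the whole step is a few open lines of code, anyone can rerun and inspect it:

```python
import numpy as np

# Raw "counts" sampled in time, transformed into a frequency spectrum.
# The data are synthetic: a 50 Hz signal buried in noise.

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)        # sample times (s)
raw = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.normal(size=t.size)

spectrum = np.abs(np.fft.rfft(raw))                     # magnitude spectrum
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])          # frequency axis (Hz)

peak = freqs[spectrum[1:].argmax() + 1]                 # skip the DC term
print(peak)                                             # recovers 50.0 Hz
```

An openly published transformation like this leaves no room for suspicion: a reader can change a parameter and see for themselves what it does.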
This is a microcosm of science – data are everywhere – and all of those projects will be thinking and acting as to how their data can be reliably and automatically processed. Because automation both gives reproducibility and saves costs.
So when scientists say they need resources for processing data, trust us – they do.


Reclaiming our Scholarship: tribute to Vitek Tracz and BMC.

Typed (because I am at the BL!) into Arcturus

It was a wonderful occasion at BMC’s 10th birthday last night in the Gherkin (London’s iconic modern building). The party was on the very top – a circular pad with stunning views of London (and its cloudscape – see next blog).

Several things have come of age. BMC is the vision of Vitek Tracz, a remarkable, and very engaging, entrepreneur whom I first met in the late ’90s at Current Science in Cleveland Street. Vitek has an incredible knack of spotting new markets and I have followed him through his successful development of new ventures in scholarly publishing.

I could not possibly do justice to Vitek, but luckily Richard Poynder has published a very long and insightful interview (http://poynder.blogspot.com/2006/05/interview-with-vitek-tracz.html). Note that this is 4 years old, that Vitek does not stay still, and that many of the concerns of 2006 have passed. Note also that Vitek has sold BMC to Springer (he joked last year that he sold it for considerably less than it was worth).

What I have only just realised is how important Vitek has been in making connections and making things happen. For example he was the force behind persuading Ian Gibson to chair the Select Committee that interrogated the publishers in the last decade. And what he revealed (apparently for the first time) last night was that he and David Lipman (http://en.wikipedia.org/wiki/David_J._Lipman) had jointly cooked up the complementarity of PubMedCentral and BiomedCentral.

It has always been clear that scientific scholarly publishing (with a turnover somewhere of the order of 10 billion USD/year) needs a strong business basis. It can’t be run for free. And it can’t be run on the basis of Green Open Access. It needs champions who make things happen in the real world through money, contracts, products, etc. (I know some of us try to change the world through the power of ideas, and sometimes, very rarely, it works, but most change is made sustainable through real-life institutions with paid staff.)

Without Vitek and BMC we would not have Open Access. That’s a strong statement, but I think it’s justified. Any transition in academic practice (and I am constantly frustrated by how conservative academia is) is painful and costs money. (It’s getting worse, particularly with the horrible metrics and bureaucracy everywhere.) What BMC has shown is that OA can be sustainable. And that sustainability is in the hardest part – the start of the ramp-up in the transition. Because at this stage academia (and its funders) have to pay twice – still for the reader-pays subscriptions (OA has not lessened the need for paying those) and now for author/funder-pays OA.

And there is a less formal, but incredibly tangible, realisation that OA publishers are “on our side” whereas, increasingly, closed access publishers are not. We should not be sentimental – BMC is a business and has to make money and I have heard academics complaining about the cost of OA fees when there is no Wellcome funding. But that is the essence of the market – we’ll see competition and we’ll see realistic prices.

So today, as I sit in the public wifi gallery of the BL, I am feeling good about the future. Which takes some effort in this country!

 

 


Reclaiming Our Scholarship (thanks to BMC): Motive, Means and Resource

Dictated into Arcturus

Today I, and the other authors of the Panton Principles, have been invited by Biomed Central to hand out the prizes which they have sponsored (with Microsoft) for the best papers that make data Open. As part of this they have asked us to give short presentations (no more than 5 minutes) and I thought I would try to get some of my points down in a blog post before the event.

I have talked to a lot of people about how to make data Open and have a first-pass analysis of the things that have come together. Following the traditional analysis of a murder (motive, means, opportunity) I have called them motivation, means and resources.

  • Motivation. Unless scientists and other people and organizations are motivated to make data open then it is unlikely to happen. The motivations are often complex and can probably be divided into carrots and sticks. The carrots include altruism (which can often be beneficial to the altruist), formal recognition and a payback from other contributors (in terms of useful data). In many cases however this has features of a prisoner’s dilemma. On the stick side there is an increasing requirement as part of the funding and employment of scientists that they make their data available. Sometimes this is merely an irritation, in that the scientists are already convinced that they should contribute data but need a prod, but sometimes it is a major, intrusive and unwelcome imposition on their current practice. I shall deal with this in future posts.
  • Means. Unless the tools and protocols are in place for sharing data the barriers are formidable. This is often referred to as “data formats” but is usually more complex and involves ontologies, semantics, and data selection. Unless scientists are given clear guidelines and tools that are agreed by the community it is hard for them to share data.
  • Resources. Although data storage is very cheap, it is not zero-cost, and unless there are clear places and methods for making data available and at least partially persistent it is again unreasonable to expect scientists to deposit them. There is probably a large amount of freely available storage provided by academia and other institutions, but very often this is heterogeneous and hotchpotch. The simple assumption that all data can be put in institutional repositories is very unlikely to succeed in most cases.

How are we going to provide all of these three components? We already have many examples where data sharing is accepted, so we can draw some generalisations from them. There normally needs to be a well motivated, independent, trustable authority (often domain-specific) that oversees the process. Examples of this are the bioinformatics centres (NCBI, EBI, PDB, etc.) and similar resources in high energy physics. But in many sciences, where data are heterogeneous and the disciplines are not coordinated by national or international bodies, there is a major problem before data can be captured and shared.

What role can publishers play? As long as the data are small and extremely common it is possible for publishers to mount data on their websites at low cost. A good example of this is in crystallography, where CIF (data) files are routinely required as a prerequisite of publication and where they appear alongside the full text. But it is uncommon to see Excel spreadsheets or molecular structures routinely attached to publications. This is not surprising; publishing data costs effort and therefore money, and there is no immediate return for the publisher in doing so.

What about institutional repositories? My concern here is that they have been set up primarily to address the needs of managing full-text manuscripts (or even only abstracts) as a result of a variety of political pressures. The most common reason is to manage the research assessment exercises that are now routine in certain countries. Others include advertising the institution (rather than individuals). Again this is seen as a chore by many of the academics. It also means that repository managers take a manuscript-like approach to information rather than a holistic approach to the capture of scholarship. Moreover a typical university will deal with thousands of different types of scholarship and it is unreasonable to expect a repository manager to have knowledge of more than an extremely small fraction. For that reason a better solution will probably involve domain-specific repositories, and I shall address this in future posts.

Let me finish by congratulating the winners of the Open Data prizes, and those with honourable mentions. We will be discussing today how these awards can be developed so that next year, and hopefully in future years, this can contribute to the culture of open data publication and make people aware of the three components that need to be addressed. At least it will help with the motivation!


Reclaiming our Scholarship: What should we do when Elsevier crashes?

Dictated into Arcturus

This is the first in a series of posts under the heading “Reclaiming our Scholarship”. I think we’re at a critical time in scholarly publishing when academia has the opportunity to reclaim the vast digital wealth that it is giving away to commercial publishers – a giveaway from which it has suffered enormous detrimental consequences. I’m not alone in this and will start by highlighting a recent interview by Richard Poynder (http://poynder.blogspot.com/2010/06/reed-elsevier-need-for-progressive.html ).

Richard is a well known commentator on open issues and takes a very measured and valuable approach. (I have just spoken at length with him as part of an interview that he has done with Jean-Claude Bradley).

Here he interviews a senior analyst:

In two recent equity research reports on Reed Elsevier, Claudio Aspesi — an analyst based at the sell-side research firm Sanford Bernstein — argues that the company is “in denial on the magnitude of the issue potentially affecting scientific publishing”, and suggests that it is time to “pursue a progressive break-up of the company”. I [RP] emailed Aspesi to find out more.

I’m not going to give you snippets from this because I want you to read it completely. Essentially Aspesi argues that there is no new money in the system for scholarly publishing and there may well be much less. It is an inexorable consequence that Elsevier will start to crash. I am convinced by these arguments though it does not surprise me that the senior management at Elsevier appears to be in denial.

Assuming that there is a cataclysm in scholarly publishing – and there are so many reasons why this should happen – the question is how academia can take advantage. It has been spectacularly unable to react to the opportunities that the Internet and other new technology give, so why do I have any hope that it will do better this time? Probably only because I’m an incurable optimist and people sometimes learn from their mistakes.

Anyway I shall try to offer some simple homespun ideas about how academia – and in this term I include its funders – can change scholarly publishing to its own advantage and return to a situation where publishing is done on behalf of the community, for the community.


12 suggestions for how Librarians can build the future

Typed and scraped

Bethan Ruddock has already commented on this blog and is typical of the relatively few who are prepared to debate the future of libraries in public. It is always very encouraging to see young people speaking their mind – it takes courage. There have been a number on this blog – and in my ambit – Broniba (Jennifer Daniel), Sara Wingate-Gray (the travelling poetry library) and others I met at City University who want to change the world and not accept it as it is. Here’s Bethan – she urges librarians to get out and talk to others. I have resurrected some suggestions for what could be done together.

A newly minted librarian, Bethan joined Mimas in 2008 when she started working with Copac to incorporate specialist libraries. …

Bethan is actively involved in the SLA (Special Libraries Association), and in 2010 has been selected by the SLA committee as a Rising Star of the association, an award that recognises newer members of the association who have made significant contributions.

 

bethan ruddock says:

May 27, 2010 at 6:49 pm

Peter, I agree with you. I think many librarians – myself included – are often wary about taking their professional engagement and opinions outside the profession. This may be due, in some part, to a certain self-deprecation about the profession: there’s a movement in libraries and information provision which says that the user doesn’t care about what we do or where the information comes from, they just care about getting the information they need. For a lot of users I think this is essentially correct. But we shouldn’t take this to mean that we shouldn’t engage with those users who are interested, and we certainly shouldn’t assume that it means that no-one outside the profession cares about what we do. When we face issues – as we frequently do – that affect other groups, we should use our expertise to be advocates. As you suggest, there are a number of methods by which we can do this.

There is no point in us complaining of powerlessness and saying that no-one listens to librarians/information professionals unless we are actually talking to them.

(I’m hoping that my profession will forgive me for these sweeping generalisations, especially as I have included myself among the number that needs to improve. For a number of outstanding examples of what librarians should be doing, see the Library Journal’s Movers and Shakers (http://stage.libraryjournal.com/MS2010), especially the advocates.)

Yes – librarians are not powerless. Librarians are human beings who outside the library are indistinguishable from the rest of us.

When I talked at Internet Librarian last year (http://www.ustream.tv/recorded/2362193 ) I thought that rather than simply try to motivate change I would give some concrete suggestions. So I came up with 12 ways that librarians could help to change the world, probably without going to jail or getting sacked, and maybe even advancing their own position. I thought them up on a train journey – I’m not sure whether they will stand the test of time and I might comment later. Here you get a snippet.

The library can be at a gateway of power and creativity in the electronic world. Be excited about the possibilities. Steve Coast has changed the modern world of maps – by himself and with only 250,000 other unpaid volunteers. Anyone could have done that. The library should be a great place to bounce ideas around.

For some librarians like Sara it’s fun. I introduced Sara to Rufus because of Open Shakespeare and she’s got caught up with the Open Knowledge Foundation and put huge effort into it. Anyone can do that.

So the zeroth suggestion is: have fun. Think excitedly. Nothing is impossible. Here they are, from Nov 2009. The order is random.

Actions that every librar(y|ian) can do

  • Citizen Librarian

    Engage volunteers to help with the library. There are zillions of things that the Internet generation could do. If they can catalogue Galaxies they can catalogue other information. Relax control and increase your community

  • Post ALL ACADEMIC OUTPUT publicly – IGNORE COPYRIGHT

    The academy creates information. That belongs to the academy and its members, not to third parties. So theses, manuscripts, etc. are under our control. If the whole world decided to do this the third parties would be powerless. Start with the non-controversial stuff. Then move to fuzzy areas and ask for forgiveness not permission. You might start to get academics involved.

     

  • Text-mine everything

    Show how important the content that we have is. It’s ours. Text-mine the theses. And data-mine them. Start to expose what we have, not hide it away. People will help

  • put 2nd year students in charge of developing educational technology and resources

    Since the library in 3 years’ time will be on a student’s device and not in a building, get them involved. Ask them how to run their information. Because if you don’t, Google and Prentice-Hall will do it and cut libraries out

  • Actively participate in obtaining science grants

    Academics survive through grants. If you can help an academic get a grant, then they’ll support you to do it again. Much grant-writing (I go through it regularly) is not discipline-dependent but benefits from good style and knowing the intricacies of the funder. No reason why you can’t do this – I’d certainly appreciate it.

  • Actively participate in the scientific publication process

    Same motivation. Help a scientist get more papers out and you will be appreciated.

  • Close the science library building and move to departments

    There is no need for science libraries – they may be nice quiet places to work but there’s nothing special in their design or management. Human librarians should wear white coats and sit next to scientists and become authors on their papers.

  • Hand over all purchasing to a national rottweiler Purchasing officer

    There is no point in librarians trying to negotiate with the trained salesforce of publishers. They are trained to win. Hand it over nationally. I believe the Brazilians do.

  • Set up a new type of University Press

    One of the biggest missed opportunities of the century. The Universities could have set up new ways of academic publishing. The costs are now much lower. It may not be too late. In five years it will be, when Google or its successor runs academic information systems

  • develop their own metrics system (ARGGH!)

    I hate metrics, but if we are going to have them, they could easily be run by the library. All the information flows through the library – why should ISI do this? You’d get open metrics of more believable quality.

  • Publicly campaign for openness

    You can do this. Think of something every day that should be Open. Then think about how you can make it so. Join the Open Knowledge Foundation. There’s nothing in the Panton Principles that you couldn’t have done. Nothing in OKF’s Bibliographica. Or CKAN or …

    Engage with mySociety. Encourage web democracy…

  • Make the library an addictive game

    Some of the great innovations rely on game-based addiction. Nothing wrong with that. Make it fun to use the library – whatever it morphs into. Make it gently competitive. Make it rewarding in the Internet sense.

     

And get out of the library and come and talk to us. If you have read this blog, you can work out where I’ll be at lunchtime.

 


DRM’ed ILLs: FOI request to Russell group Universities

Scraped and typed into Arcturus

This is my FOI request to a number of Russell Group Universities about the introduction by the British Library of DRM for InterLibraryLoans. There are about 20 RG universities and I have already sent an FOI to Cambridge. As always I would value comments and possibly extra questions, though they must be on DRM and preferably ILL.

NOTE: This series of blog posts will start to simmer down after this one. Until the replies come in….

 

Dear University of XXX

I am writing under FOI to request documentary evidence relating to the British Library’s introduction of Digital Rights Management (DRM) technology in its delivery of InterLibraryLoans (ILLs) for journals and journal articles in electronic form (e-ILL). The scope of the request is strictly limited to journal articles (“articles”) and not e-Books or other media (e.g. films). In all replies I request specific documentary evidence rather than general principles. Reference material is appended.

DRM raises major issues for libraries as evidenced by the CEO of the British Library, Lynne Brindley (Brindley2006) and the American Library Association [ALA2008]. The ALA state:

providers can prevent the public from use that is non-infringing under copyright law as well as enforce restrictions that extend far beyond those specific rights enumerated in the Copyright Act (or other laws). Thus, DRM changes the fundamental relationship between the creators, publishers, and users, to the detriment of creators, users, and the institutions that serve them.

DRM also appears to break accessibility law and guidelines [RNIB2010].

e-ILLs are generally requested by academics through their University Library and provided by the BL. They are governed by “part 4 of The Copyright (Librarians and Archivists) (Copying of Copyright Material) Regulations 1989 – http://www.opsi.gov.uk/si/si1989/Uksi_19891212_en_1.htm and schedule 2 – http://www.opsi.gov.uk/si/si1989/Uksi_19891212_en_3.htm ” (Copyright1989). The academic has to sign a contract that they will adhere to these conditions. It is my contention that this act, together with the professional standards of academics and the easy identifiability of anyone breaking it, is sufficient to control the unlawful copying of copyright material. It is the introduction of DRM in addition to this process that is the essence of this request (i.e. e-ILL vs e-ILL-DRM).

The introduction of e-ILL-DRM by the BL started about 5 years ago and was specifically introduced by them (i.e. not simply passing through a content provider’s own DRM). The technology was Adobe Digital Editions (ADE). I can find no public record of discussion or governance of this process either in Universities or at the BL. I have sent an FOI request to the BL to ask for their account of any consultation prior to the introduction of DRM, and also any consultation after the event (including problems raised by the Universities and academics). I formally request the following information, which should be supported by documents such as minutes of Library Committees or higher governing bodies, internal memos or public pronouncements:

  1. Did the BL consult your University before introducing e-ILL-DRM?
  2. Has your University formally considered the negative effects of DRM on e-ILLs as shown by the ALA’s concerns (“first sale”, “pay-per-use”, “time limits” and “fair use”)? Please document which of these were considered. Does the University share these concerns and has it made its concern public?
  3. When the BL introduced DRM on ILLs, did your University object or attempt to negotiate less stringent control? Please document such representations.
  4. Do you consider that the use of ILLs restricted by DRM conforms to your Accessibility policy or to legal requirements on accessibility? Please document.
  5. [BL2009] speaks of “The decision to add FileOpen [another DRM] … was driven by customer demand”. Please indicate whether you believe yourself to be a customer of the BL and whether you have “demanded DRM”.
  6. Have you at any stage raised concerns about the use of DRM in educational and research material? Has it been discussed at your governing bodies?

I look forward to your reply.

 

 

References

 

[ALA2008] http://www.ala.org/ala/issuesadvocacy/copyright/digitalrights/index.cfm

DRM: A Brief Introduction

The purpose of DRM technology is to control access to, track and limit uses of digital works. These controls are normally imbedded in the work and accompany it when it is distributed to the consumer. DRM systems are intended to operate after a user has obtained access to the work. It is in this “downstream” control over consumer use of legitimately acquired works that DRM presents serious issues for libraries and users.

DRM technology is new and evolving. Different schemes are being proposed, developed in the laboratory, and experimented with in the marketplace. In general, these technologies are intended to be flexible and to provide a wide range of options to the content provider, but not the user or licensee. Importantly, DRM technology can have profound effects on a wide range of policy issues, including intellectual property, privacy, access to government information, and security. As a consequence, it will be very important for Congress to carefully consider the impacts on many different constituencies, including libraries.

Key concerns for libraries

The principal policy issues for libraries and schools are not derived from DRM technology, itself, but from the business models the content industry chooses to enforce. DRM has uses far beyond simply enforcing traditional and long-standing protections extant in current law. By embedding controls within the product, providers can prevent the public from use that is non-infringing under copyright law as well as enforce restrictions that extend far beyond those specific rights enumerated in the Copyright Act (or other laws). Thus, DRM changes the fundamental relationship between the creators, publishers, and users, to the detriment of creators, users, and the institutions that serve them. DRM, if not carefully balanced, limits the ability of libraries and schools to serve the information needs of their users and their communities in several ways by:

Eliminating the “First sale” doctrine by limiting the secondary transfer of works to others. First sale has been for centuries a bedrock principle governing the balance of rights between consumers and sellers of information products. It is first sale that allows people to share a favorite book or CD with a friend and that creates secondary markets for works. It is first sale that allows libraries to loan lawfully acquired works to the public.

Enforcing a “Pay-per-use” model of information dissemination that, if it becomes the dominant or even sole mode of access, will be contrary to the public purposes of copyright law. It should not be the business of government to favor or enforce any particular business model in the information marketplace, particularly one that raises major issues of equity and potentially severe economic consequences for public institutions.

Enforcing time limits or other limitations of use that prevent preservation and archiving. Many market models of DRM distribution systems envision content that essentially disappears after a specific period of time or number of uses. DRM technologies can also prevent copying content into new formats. Such controls will prevent libraries, historical archives, museums, research institutions, and other cultural institutions from preserving and providing long-term access to the knowledge products of our society. From the days of the Great Library of Alexandria, society has turned to such institutions to preserve its cultural heritage and provide access to it. There is no evidence that alternative organizations currently exist or will form to play that role in the digital pay-per-use world.

Eliminating “fair use” and other exceptions in Copyright Law that underpin education, criticism, and scholarship. DRM technology can prevent normal uses of works protected by copyright law, such as printing or excising portions for quotation. For libraries and schools to serve their educational, research, and information roles, the public must be able to use works in the full range of ways envisioned by the Copyright Act in its limitations and exceptions.

[BL2009] http://www.bl.uk/news/2009/pressrelease20091126.html

Responding to customer demand, the British Library, supplying over 1.6m articles every year to researchers all over the world, has added FileOpen to its choice of delivery options via its Document Supply Service. FileOpen’s DRM technology will improve accessibility and extend the reach of the British Library’s vast resources.

The British Library’s Document Supply services have been at the heart of the national and international research community for 50 years, enabling users to exploit a wealth of information for the benefit of research. In the digital age increasing customer demand for electronic delivery, both remotely and immediately, has seen the number of Document Supply users requesting content delivered electronically rise to over 70%.

In an effort to provide customers with greater flexibility when receiving electronic documents, the British Library has teamed up with FileOpen Systems, a leading provider of a Digital Rights Management platform to the information industries. The FileOpen option offers users an alternative to the Library’s existing Adobe Digital Editions platform, enabling them to access copyrighted material in their Adobe PDF Readers without the need to download a new viewing application.

“The decision to add FileOpen to our Document Supply delivery options was driven by customer demand, they wanted a choice of electronic delivery options,” says Barry Smith, Head of Sales and Marketing at the British Library. “Customer feedback from the testing phase was very positive, and we are pleased to announce that we are now recommending FileOpen as our preferred electronic delivery option to all customers.”

“We’re delighted to be working with an organisation with such worldwide prestige as the British Library, and to be providing their customers with a robust, user-friendly solution for secure document access,” said Elizabeth Murphy, Vice President of Sales and Marketing at FileOpen Systems. “FileOpen Systems is the leading provider of DRM for Scientific, Technical, and Medical (STM) information, and continues to innovate to meet the high standards of that market.”

The FileOpen secure delivery option is now available to all Document Supply customers but is currently unavailable via British Library Direct or British Library Direct Plus.

 

 

[Brindley2006]

Quoted in http://www.pcpro.co.uk/news/94639/british-library-shouts-out-against-unfair-drm:

“As one of the world’s great research libraries we are equally mindful of the threat that an overly restrictive, or insufficiently clear, IP framework would pose to future creativity and innovation”

And http://www.pcpro.co.uk/news/88316/uks-top-librarian-calls-for-digital-protection

“Lynne Brindley, chief executive of the British Library, warned that DRM is already having an impact on the traditional exceptions to copyright law that have existed for libraries”.


 

[RNIB2010]: http://www.rnib.org.uk/livingwithsightloss/readingwriting/ebooks/Pages/ebook_formats.aspx

“Adobe’s DRM requires you to download your ebook to ebook reading software called Adobe Digital Editions. Adobe Digital Editions (version 1.7.1) currently has no real keyboard access and does not work with access technology. So if you purchase books that are secured with Adobe DRM, you may not be able to read them using Adobe Digital Editions.” [Note PMR: Adobe Digital Editions is the DRM employed by the BL from 2005 onwards.]
