petermr's blog

A Scientist and the Web


Archive for February, 2011

Open Writing and Scholarly HTML

Monday, February 14th, 2011

I have been struggling to put my thoughts in order about an underprivileged being – the scholarly author. This post is slightly ahead of a well-formed idea but it’s prompted by Peter Suber’s call for a new term for (?collaborative) authoring (http://www.earlham.edu/~peters/fos/newsletter/02-02-11.htm#contest ). Peter is striving for a term describing author-side openness and writes:

I want this new term for several reasons.  For example, some OA resources which originally lacked author-side openness add it later, and I (and they) need a succinct way to describe what they’ve done.  Sometimes I speak about OA to audiences that know wikis better than they know scholarly journals, and I’d like to say that OA articles have reader-side openness but generally don’t have author-side openness.  Likewise, I want a short and non-pejorative way to say that Wikipedia is not the poster-child of the OA movement. 

I have been increasingly concerned about the restrictions and constraints on author freedom. In the Internet age authors should feel relieved of their shackles; instead scholarly authors are burdened with unnecessary constraints. Part, but only part, is of their own making – chasing the chimera of publishing prestige rather than following a more natural course of saying what they want to say. (“You can publish there – it’s not got an impact factor yet”).

A typical example today. My co-author – acting as amanuensis – grizzled that she had to reset the references in Harvard format. This is about as rewarding as whitewashing coal. It has only one purpose – to save the publisher work by transferring that work to the author. There are, of course, much better ways of doing references/citations – and in Open Bibliography we are hoping to develop some of them. But scholarly publishing is – bizarrely – one of the least innovative activities in the information age.

So we need to give back freedom to authors. I think I’m taking a slightly different tack from Peter but the motivations are broadly the same. When NCSA made HTML – and more importantly HTTP – accessible in 1993 it changed the world. The message was that ordinary people could publish. I could set up a server. A little arcane, but until then I had assumed that servers were only possible for those who bought expensive tools from vendors. NCSA httpd was an instrument of liberation. Anyone – anyone – who had access to port 80 (and that wasn’t difficult in those days) could set up a server.

And tell the world anything.

HTML could be authored in any simple text editor. No special tools. And you didn’t even have to get it right. Broken HTML was rendered as well as possible. For those who didn’t experience 1993/4, Wordsworth captured the spirit in


FRENCH REVOLUTION


AS IT APPEARED TO ENTHUSIASTS AT ITS COMMENCEMENT.


     Bliss was it in that dawn to be alive,

But to be young was very heaven!


Not in Utopia, subterranean fields,

Or some secreted island, Heaven knows where!

But in the very world, which is the world

Of all of us,–the place where in the end

We find our happiness, or not at all!


Of course it couldn’t last, but there was the sense of overthrowing the established order – that everything was possible.

HTML is truly an agent of revolution and liberation.

By contrast the publishing industry, with its tools such as double-column PDF, has trapped us in a digital neo-colonialism which I am struggling to understand. Academia, with its much greater wealth and potential power, is increasingly cowed by the shackles of metrics and adopts dysfunctional and asymmetric relationships. It shows no signs of wishing to break out and control its own destiny. The British Empire flourished by “divide and conquer” and by rewarding the heads of the controlled states. There is no need to divide academia – it’s already divided.

And this is to the great disadvantage of the author. Authors should be exploring the potential of the new media instead of being constrained to the vision of the Victorian printing press. Changing the role of the author is a revolutionary act, made more difficult because authors do not realise how shackled they are.

That’s one of the things we tackled in #beyondthepdf, where some of us were developing the next generation of authoring tools. A primary motivation is to remove the dependence on vendor-controlled formats and tools, and to base everything on Open Source tools. That’s why I’m inviting Peter Sefton and Martin Fenner for a publishing hackfest in March. The fundamental medium is still HTML, but now enhanced to carry semantic payloads – payloads under our control.
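
To make “semantic payloads” concrete, here is a minimal sketch in Python (the vocabulary URI and property names are hypothetical illustrations, not part of any agreed Scholarly HTML specification) of how an ordinary HTML fragment can carry machine-readable scholarly data alongside the human-readable text:

```python
# Minimal sketch of the "semantic payload" idea: an ordinary HTML fragment
# that also carries machine-readable data via RDFa-style attributes.
# The vocabulary URI and property names below are hypothetical.
from lxml import html

fragment = """
<div xmlns:ex="http://example.org/scholarly#" typeof="ex:Citation">
  <span property="ex:author">Murray-Rust, P.</span>,
  <span property="ex:title">Open Writing and Scholarly HTML</span>
  (<span property="ex:year">2011</span>)
</div>
"""

# A human reads the rendered text; a machine extracts the payload.
doc = html.fromstring(fragment)
for span in doc.xpath("//span[@property]"):
    print(span.get("property"), "=", span.text_content())
```

The point is that nothing here needs a vendor tool: any text editor to write it, any HTML parser to read it.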

Can a handful of people change the world? We have to believe so and there are now an increasing number of examples where individuals have done exactly that.

So, Peter, I don’t have a good term for you. I’m using a general term of “Open Scholarship” but that includes much more. I thought of “Open Authoring” but OA clashes with Open Access. (BTW I always felt Access was a noun!). My best so far is “Open Writing” which doesn’t do justice to the non-textual aspects. The theme of our hackweek will be “Scholarly HTML”.

HTH


Licensing Data

Wednesday, February 9th, 2011

[From Alex Ball]

The Digital Curation Centre (DCC) has published the second in its series of
How-to Guides: ‘How to License Research Data’ by Alex Ball of the DCC, in
association with JISC Legal. The guide explains why licensing data is
important, what licensing options are available to researchers, and how to go
about attaching a licence to a dataset.

The DCC’s How-to Guides offer practical introductions for those who need more
than the high-level basic awareness given in DCC briefing papers, but less than
the in-depth coverage given in the Curation Reference Manual. This guide is
aimed at principal investigators, researchers, and those who provide access to
research data through a data centre, repository or archive.

‘How to License Research Data’ is available for online reading or download
from http://www.dcc.ac.uk/resources/how-guides/license-research-data

This is a valuable medium-level overview of the different legal aspects of publishing data. Data, of course, is now found in many disciplines, not just science. At the very fine-grained level data is extraordinarily complex, but at a high level licensing it can be very simple.

If you are a scientist and haven’t thought about data licences, then consider the value of making your data available to others. That’s anathema to many traditional scientists – and an attitude that will survive for some time. There are moral, ethical, political, social and utilitarian reasons why you should consider making your data Open. [There are cases where you cannot open data – human privacy, breeding grounds of rare species, etc. And often the decision involves other people. But at least consider it.]

I am not a fan of licences. They are complex, their legal logic does not map onto any formal system of mathematics, and they are subject to wide variation by country, date and general fuzz. That’s why, wherever possible, you should adopt the Panton Principles (www.pantonprinciples.org/) and formally dedicate your data to the Public Domain (PDDL or CC0). The complexity of combining even two licences is far greater than that of analysing a terabyte of multivariate data for patterns. Multiple licences make data recombination very hard.
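
If you do decide to dedicate data to the Public Domain, the mechanics are trivially simple. Here is a minimal sketch (the file names and metadata layout are hypothetical, not a prescribed format) of attaching a CC0 dedication to a dataset in machine-readable form:

```python
import json

# Hypothetical sketch: record a public-domain dedication alongside a dataset
# so that humans and software can see at once that the data are open.
# File names and the metadata layout are illustrative only.
metadata = {
    "dataset": "melting-points-2011.csv",  # hypothetical data file
    "license": {
        "name": "CC0 1.0 Universal",
        "url": "http://creativecommons.org/publicdomain/zero/1.0/",
    },
}

with open("melting-points-2011.meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

One unambiguous statement, and no licence-combination algebra for anyone downstream.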

AND NEVER USE NON-COMMERCIAL LICENCES.

The only people this hinders and hurts are people like you.

Scholarly HTML Hackfest Cambridge UK March

Wednesday, February 9th, 2011

The momentum of “Beyond the PDF” continues and we are planning a hackfest in Cambridge in March to build scholarly publishing tools. I floated the idea yesterday http://groups.google.com/group/beyond-the-pdf/browse_thread/thread/af2e6a4d43c361f8 (one of the many discussion threads you can read). The details are coalescing into the following:

Martin, Peter and others (including me) formed a “Writing” group as part of BtPDF and came up with a (complete) design for scholarly authoring (on the diagram BTPDF is the code name for the system)

(Copyright PMR, CC-BY; used without explicit permission of the author).

We believe that we have enough open source tools, volunteers and service providers to create a compelling prototype. So from about 11 to 20 March there will be intense activity in Cambridge putting it together. The rough timetable is:

  • 9/10/11 Mar PT arrives
  • 12-13 Hackfest in Chemistry dept
  • 14-18? Martin arrives. Free-form hacking with PT, MF and members of the PMR group. Visitors welcome in reasonable numbers
  • 19 (Sat) informal hack day. Depends on numbers, probably not in chemistry. Maybe in the Panton (which we think will have wifi). Maybe in the Open Knowledge Foundation. We can’t use the Centre as it’s science week and there are zillions of budding young scientists in the Dept.
  • 20 Integration Hackfest in UCC (if it makes sense). Pub at lunch, evening …

Attendees from last hackfest included Ben O’Steen, Rufus and random OKF’ers, Chris Gutteridge and Dave Flanders.

Anyone is welcome but let us know beforehand for safety/security (it’s a chemistry dept). As we get possible attendees we’ll publish a list. This is not a sleep-over hackfest (it’s a chemistry dept) but there are lots of good pubs.

It is a hackfest, not a tutorial. So be prepared to get your fingers dirty. You don’t have to be a coder – we’ll welcome:

  • Coders
  • Integrators (e.g. people who can work plugins, repos, OAI-PMH, RSS, etc.)
  • Content providers (Open of course)
  • Documentation
  • Packaging and distribution
  • Evangelism
  • Funders

And more

The last hackfest was 2 days and the achievements were sensational. This one will be even more so. We’ll provide geek food. The only problem is burn-out.

We are calling this “Scholarly HTML”. It will bring power back to the authors. Too much of our scholarly communication is controlled by digital neo-colonialists. HTML was and is an agent of revolution and democracy. This hackfest is in that tradition.


Panton, Panton, Panton

Monday, February 7th, 2011

Richard Poynder has just blogged about our Panton discussion, which has been released as audio and, we hope, a transcript RSN. But the main reason for this post is that I have learnt something new and unsettling about our adoption of the name “Panton” for our outputs – we now have Panton Principles, Panton Discussions and Panton Papers. I’ll quote and comment…

Last August I sat down in a pub (The Panton Arms) in Cambridge to discuss Open Data with Peter Murray-Rust, Jordan Hatcher and others. The event was the first of what Murray-Rust has dubbed the Panton Discussions. Murray-Rust is a Reader in Molecular Informatics at Cambridge University. The Panton Arms regularly plays host to members of the University Chemistry Department, so it was an obvious place to meet. This special relationship between Cambridge chemists and The Panton Arms is doubtless a consequence of the pub being just down the road from the Chemistry Department.

And perhaps Richard is unaware that Panton Street (between the Chemistry Dept and the pub) is also the home of the Open Knowledge Foundation. Indeed you are quite likely to bump into Rufus and his collaborators in the street. Just last week we had a long intense discussion outside the OKF with the wind whistling through our clothes.

However, it could be that The Panton Arms is an appropriate location for discussing things like copyright, Open Data, Open Access, Creative Commons and the Public Domain for another reason.

The Panton Arms, and Panton Street (in which the pub is located), are associated with the Panton family. And in 1806 “Polite” Tommy Panton succeeded in having Parliament pass the Barnwell Enclosures Act, leading to the enclosure of what was then farmland.

I was completely unaware of this – and as I said I find it unsettling. But what is done is done, and often good has been built on the site of evil. Perhaps we need an exorcism. I assume Rufus is aware of this.

Today many argue that the frequent and increasingly maximalist changes made to copyright laws represent a new enclosure movement. And it is partly in response to that process that we have seen a proliferation of “free” and “open” movements like Open Data and Open Access – with the aim of preventing, or at least mitigating, the new enclosures.

Richard is right. I use the terms “digital gold rush” or “digital land grab”. I find it intensely frustrating that so many people are not aware of the problem and when told do not care. Academia can be very arrogant – individuals survive on the handouts from the robber barons while the general citizenry is unaware of the problem.

We are too protected. A non-university healthcare professional is debarred from reading the literature – perhaps 90-95% of it. Non-academics are not inferiors whom we need not worry about. Yet I frequently hear “oh we can’t let non-experts read that” and similar. This is as insulting as the discriminations of slavery, gender, race and so on that we have fought over the centuries.

We are entering the century of the information age. It must not become the century of information slavery.

Chemical Markup Language 2011

Saturday, February 5th, 2011

With the release of Chem4Word (sorry, Chemistry Add-in for Word) we’ve reached an important milestone in the development of CML. CML is about 16 years old (Henry will give a better estimate – but I think we can reasonably date it from our trip to WWW1 and Henry’s subsequent trip to WWW2). I think it has reasonably come of age and can now be regarded as the de facto approach to representing semantic chemistry. And part of the purpose of the PMR Symposium #pmrsymp was to be able to make that assertion. We didn’t actually have much about CML per se, but the working code was all based on CML and we shall be publishing the justification in a special issue of BMC.

It’s not been easy to make that statement until now. It needs at least:

  • A reasonably stable formulation. That’s been impossible for many years as CML has been naturally fluid as we have tried out new ideas. Now we eat our own dog food. Our CML must validate and the dictionaries must exist and resolve.
  • Running code. It’s relatively easy to write a specification. It’s vastly harder to make sure it’s completely implementable. We adopt the IETF motto of “rough consensus and running code” and very little in CML has been deployed without support in at least one major language. When people ask what JUMBO does, the formal answer is that it’s the reference implementation of CML. That’s not dramatic and it’s desperately boring to write. But it’s almost all in place.
  • A user community. There is sufficient variety in the people and places that are using CML that we can be reasonably confident it has a good user base. A lot of people implement solutions without our being aware of it – that’s perfectly OK, of course, but they may be struggling with problems that have already been addressed. Last week, for example, a group in the GRID community who had implemented CML under lxml wrote to us, having found bugs in the Schema validation (see the validation sketch after this list). That seems to be a known problem in lxml.
  • Robustness and portability. It’s got to be possible to implement CML in different environments. There are libraries in Java, C++, C#, FORTRAN, Python and Javascript. These don’t all implement everything in the language but they show that everything is reasonably possible.
  • Flexibility and Generality. This is one of the great strengths of CML. It’s possible to express a very wide range of concepts in CML. Because CML contains general tools for the physical sciences we can model properties, parameters, complex objects, constraints, etc. The use of @convention is proving to be very powerful for developing new domains without breaking old ones. There are almost no content models (something that is very constraining in XML).
  • Dictionaries. A very powerful means of expressing physical science (and other) concepts. Indeed CML can represent a lot of high-school physics and materials.
  • Interoperability. CML does not try to do everything – the more that other domains provide the better CML works. So it uses MathML for the maths, SVG for the graphics. Specialist representations within chemistry (e.g. EMSL for basis sets or BIOPAX for bioscience). When NIST (after perhaps 15 years) finally releases UnitsML we’ll use that (assuming it’s easy to implement). For large arrays we use NetCDF or similar tools. For complex relationships we use Xlink or RDF. And so on.
  • Simplicity. CML is simple – or at least no more complex than the chemistry it represents. There are no abstract objects or relationships or attempts to build overly complicated models. The elements in a CML file should be understandable by high-school students.
  • Uniqueness and unification. There is no other current approach that supports most of the domains in chemistry in a semantic manner. Much chemical software is centred on connection tables, but these do not support solid state, physical properties, experimental processes, computational chemistry, etc. to the same extent that CML can. There are lots of specialist non-semantic files, but these are often archaic and only work for specific codes. CML provides a central nearly lossless semantic centre.
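
As a concrete illustration of the “running code” and validation points above, here is a minimal sketch of schema-validating a CML document with lxml in Python. The schema file name and the tiny molecule document are assumptions for illustration (obtain the real schema from the CML project), and whether the document validates will depend on the schema version you use:

```python
# Minimal sketch: validate a small CML document against the CML XML Schema
# using lxml. The schema file name below is an assumption; download the
# actual schema from the CML project.
from lxml import etree

# A minimal, hypothetical CML molecule (water) for illustration.
cml_doc = b"""<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="S"/>
    <bond atomRefs2="a1 a3" order="S"/>
  </bondArray>
</molecule>"""

schema = etree.XMLSchema(etree.parse("cmlschema.xsd"))  # assumed local copy
doc = etree.fromstring(cml_doc)
if schema.validate(doc):
    print("valid CML")
else:
    print(schema.error_log)  # lxml reports each schema violation here
```

This is exactly the “eat our own dog food” discipline: every document must validate, and every dictionary reference must resolve.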

CML supports five main subdomains and there is extensive experience and code in all:

  • Core. This supports molecules, atoms, bonds, dictionaries and physical quantities, etc. Many implementations.
  • Reactions. Tested with a wide range of reactions including enzymes (MaCiE), literature extraction, and polymers.
  • Spectra. Fully supported in JSpecview.
  • Crystallography. Able to convert complete CIF files and now with 200,000+ structures in Crystaleye.
  • Computational chemistry. Extensively tested with implementations in several major codes and continuing.

And it’s worth pointing out that CML can be used as a computational language – i.e. it can be self-modifying as in polymer markup language.

I owe a huge debt to lots of people and CML really is a community effort, with strong moderation. We wouldn’t be here without the Blue Obelisk, eScience/GRID, and the bioscience community. We’re open to any new ventures and ideas – incorporation in existing codes, chemical publication, artificial intelligence, etc.

CML is ready for universal use within chemistry.

Joe Townsend is coordinating much of the effort. I will be blogging at regular intervals. We hope to get semantic chemical blogging (e.g. in WordPress) very soon.

Bibliographic Data is Open!

Friday, February 4th, 2011

#jiscopenbib

Bibliographic Data are the lifeblood of scholarship. They tell us how to find scholarly artefacts and to recognise them when we’ve found them. The journal names, the authors, the pages. They are as exciting as street names and house numbers.

Which are exciting. Maps are exciting and bibliography is the map of scholarship. It’s not the complete map – but the skeleton. The framework to which other properties are added.

And the question I have been chasing for some months is whether they are Open… Can I make a list of bibliographic data and publish them Openly?

Most people I ask mumble. But two days ago Eefke Smit of the STM Publishers Association rang me and we talked a lot about what was Open and what was not. The problem is that many things are not clear and are open to interpretation. And so it comes down to “it all depends on”.

Which sometimes it does. The problem is that software can’t make that sort of judgment. It works on Boolean Logic – you can, or you can’t. We didn’t resolve all the questions, but Eefke got back very rapidly and here’s her reply.

P: Thank you very much for a full reply. This is very helpful.

I am copying this to the Open Bibliography list. [PMR] For their background I have been exploring with Eefke and the STM Publishers Association whether text-mining is allowable and whether bibliographic data is copyrightable. Eefke gives a clear answer to the second, so I am posting this on this list. I think it now makes possible a lot of very valuable things with Open Bibliographic Data.

On Fri, Feb 4, 2011 at 2:34 PM, Eefke Smit <eefkesmi@xs4all.nl> wrote:

As promised, I would sort out your question about the openness of bibliographies. You made quite clear in our conversation that you are not particularly fond of ‘it depends’ answers. So I fear you may find the following answer slightly disappointing, because also for bibliographies the answer to the question how open they are depends on what you regard as the elements of a bibliography.


We have addressed this in “Principles of Open Bibliographic Data” http://openbiblio.net/principles/


To start with the simplest elements that are indeed open and considered ‘facts’ hence copyright free: article title; authors of article; journal title; volume-issue information; and dates of receipt/publication. These are all considered to be facts and cannot be copyrighted.


We have essentially covered these in

Core data: names and identifiers of author(s) and editor(s), titles, publisher information, publication date and place, identification of parent work (e.g. a journal), page information, URIs.


I think this is entirely in line with you and your STM colleagues and this agreement is an extremely important step forward.


But nowadays people sometimes include much more into bibliographies, for example images, tables, abstracts, even chemical structures. Bibliographic data can include a number of different kinds of fields and information, including thesauri, classifications like chemistry structures, etc., so there can be some information that is copyrightable or systems that are tied into copyright or trademark protected content. 


Precisely what that is does indeed “depend”. Our list of secondary bibliographic data overlaps greatly with yours. I have highlighted the components that I believe would be uncopyrightable.

Secondary data: format of work, non-web identifiers (ISBN, LCCN, OCLC number etc.), an indication of rights associated with a work, information on sponsorship (e.g. funding), information about carrier type, extent and size information, administrative data (last modified etc.), relevant links (to wikipedia, google books, amazon etc.), table of contents, links to digitized parts of a work (tables of content, registers, bibliographies etc.), addresses and other contact details about the author(s), cover images, abstracts, reviews, summaries, subject headings, assigned keywords, classification notation, user-generated tags, exemplar data (number of holdings, call number), …

This does not mean that the others were by default copyrightable, but we know of places where people have asserted rights over some of them.

I think you and I differ about whether tables and graphs are copyrightable in this context. I would concede that images which contain creative work are copyrightable, but images representing factual information (e.g. chemical structures) are not. For example, it would be foolish to be unable to communicate a chemical structure to someone because you might break copyright. There are millions of such images on suppliers’ bottles, and withholding this information means that people could and would die.

I also asked whom I should contact within a publisher to get a definitive answer from that organization (as most of the time I get no reply).

On your question whom to contact for permissions as a reader, I would advise you to address the ‘rights and permissions departments’ or ‘licensing departments’ at the relevant publisher houses or else enquire via your local license holder (Cambridge library) who their contacts are. Very often these are regionally assigned, so a general list would be difficult to compose.


This seems to confirm that it can be quite difficult to find the right person within a large publishing house and get an answer.

The STM members can be found on www.stm-assoc.org


Hope this information is of help to you,


Yes it is very useful.

Kindest regards, Eefke Smit.

So this is very useful. We agree on this. Bibliographic Data is FREE. As in Speech. Like OpenStreetMap we can start building the bibliographic map of the world.
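
To make the agreement concrete, here is a minimal sketch in Python of the kind of record this unlocks, using only the “core data” elements that Eefke and the Principles agree are uncopyrightable facts. The field names and values are improvised for illustration, not a fixed standard:

```python
import json

# A hypothetical open bibliographic record containing only "core data":
# facts that, per the exchange above, cannot be copyrighted.
# Field names and values are illustrative, not a fixed standard.
record = {
    "title": "An example article title",
    "authors": ["A. Author", "B. Coauthor"],
    "journal": "Journal of Examples",
    "volume": "12",
    "issue": "3",
    "pages": "45-67",
    "published": "2011-02-04",
    "identifiers": {"doi": "10.9999/example.12345"},  # made-up DOI
}

# Because the record is facts only, it can be dedicated to the public
# domain (CC0/PDDL) and shared freely as JSON.
print(json.dumps(record, indent=2))
```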



Prinzipien zu offenen bibliographischen Daten

Thursday, February 3rd, 2011

If you can understand that, it means two things:

  • That Adrian Pohl has translated the Principles of Open Bibliographic Data into German (thanks!) http://openbiblio.net/principles/de/
  • That if you only speak German you have no excuse now for not SIGNING them

The Law of the Excluded Mumble; Please SIGN the Principles of Open Bibliographic Data

Thursday, February 3rd, 2011

In classical logic there is a Law of the Excluded Middle

http://en.wikipedia.org/wiki/Law_of_excluded_middle

that states (roughly, but read the article) that something is either TRUE or not TRUE (==FALSE).

This principle does NOT hold in scholarly publishing, where there are three states:

  • The material is OKD-Open LIBRE. You can use it without seeking permission
  • The material is definitively Not OKD-Open (Gratis or CLOSED). If you re-use it you are liable to take-down mandates, lawyers’ letters, having services arbitrarily cut off by suppliers of “their” content, and personal lawsuits (and this has happened).
  • MUMBLE.

MUMBLE?

Mumble is the main non-LIBRE response from most publishers when you ask whether there are specific permissions to re-use material. In rough order of frequency the forms are (a sketch of why mumble defeats software follows the list):

  • Null response. Yes, most publishers don’t even reply to polite requests for factual information. I once mailed FIVE editors of a scholarly journal asking if I could annotate their material. Not one had the courtesy to reply. How can I ensure that a journal or publisher at least has the decency to reply to a responsible question? But because I am just a reader I can be ignored (the publisher’s customers, or “end-users”, are the purchasing officers – readers don’t count for anything in this market). The problem with the null response is that there are so many ways to justify doing nothing.
  • The filibuster. The publisher apparently offers to give an answer but never does. We are still waiting for a response from a major publisher after four years. It’s always polite – “let’s talk about it when we next meet” or similar. By comparison my enquiry with Elsevier about whether I can text-mine chemistry is a mere eighteen months old. We’ve finally got to the stage where they have referred it to their legal experts. All will be revealed on this blog. A few days ago I asked if the discussion could be public. So far, null response.
  • Classic mumble. This can take so many forms. Typical phrases are “it all depends on…”, “well I am not a lawyer, so…” [“my car has broken, can you fix it?” … “Sorry, I am not a lawyer”].
  • Paper chase. “If you refer to the UK copyright act… you’ll find what you need”. Pointing people to legal documents is a surefire way of bottling the problem. We want answers, not meta-answers.
  • Reductio ad absurdum. This is using logic and terminology to escape the problem. I had a discussion recently. I won’t reveal the source. I wanted to know if data in publications were free to use. “well you can re-use really raw data, but data on publishers’ web sites has had creative treatment and so is potentially copyright”. (Qualified by “it all depends what sort of data”.) Could I use graphs, tables? (Elsevier has given me a NO on this – all data in tables and graphs belongs to Elsevier. See the BtPDF discussions. At least NO is better than mumble. Of course I do not accept this.) “So is a spectrum printed from a machine really raw data?” “It all depends – the software used to print it is creative so possibly not.” Oh, dear.
  • The pious hope. Create a declaration that everyone agrees to. The STM publishers agreed 5 years ago (http://www.stm-assoc.org/public_affairs_brussels_declaration.php ). This states “Raw research data should be made freely available to all researchers. Publishers encourage the public posting of the raw data outputs of research. Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars” [their emphasis]. Problem solved. “it all depends what is meant by data”, “it all depends what is meant by free”, “wherever possible”. Classic mumble. I observe that compliance rates are “variable”.
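
All of these fail the test mentioned in the earlier post: software works on Boolean logic. A minimal sketch (the enum and its names are hypothetical, for illustration only) of why MUMBLE is fatal to any automated re-use pipeline:

```python
from enum import Enum

class Openness(Enum):
    OPEN = "okd-open"   # LIBRE: re-use without asking
    CLOSED = "closed"   # definitively not open
    MUMBLE = "mumble"   # no usable answer obtained

def can_reuse(status: Openness) -> bool:
    # Software needs a Boolean. Only an explicit OPEN permits re-use,
    # so MUMBLE is indistinguishable from CLOSED for a machine.
    return status is Openness.OPEN

for status in Openness:
    print(status.name, "->", "re-use" if can_reuse(status) else "do not re-use")
```

A mumbled answer therefore has exactly the same effect as NO, however polite it sounds.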

I announced that DOIs were free of copyright yesterday and got a FriendFeed comment (we are not meant to reveal authors):

“but what possible value could one derive [from] asserting copyright over their DOI suffix?”

Well, I am afraid the answer is “Lots”. Some publishers copyright their identifiers (the ACS copyrights CAS identifiers for chemical compounds: http://en.wikipedia.org/wiki/CAS_registry_number). Many publishers sell their tables of contents to meta-publishers. For money. The meta-publishers then sell this information back to us. It’s rather as if I wanted to know my neighbour’s house number – “I can’t tell you because I sold it to a Directory”. “Can I tell other people what your number is?” “Sorry, I signed a contract that I mustn’t reveal my house number without permission”. Bibliographic data is the house numbering of scholarship. Without it you cannot find and identify scholarly works. And in the OKF we assert that this information is LIBRE. Not “should be”. IS.

So if you wish to protect your little market of selling bibliographic data you can assert that this data is created by a creative act. You, the publisher, have creatively created a DOI. It’s your property. So if someone republishes “your” bibliographic data and it includes “your” DOIs you can send “your” lawyers to remove all the work done – and that protects your market.

Let’s assume you are a publisher and you think I’m being unfair. And of course all generalizations are unfair – many publishers are very, very cooperative. And you are one of them.

The answer is simple:

SIGN THE PRINCIPLES OF OPEN BIBLIOGRAPHIC DATA (http://openbiblio.net/principles/ )

That will not only identify you as a publisher who regards bibliographic data as LIBRE…

… It will identify you as a PUBLISHER WHO CARES! And that solves the problem of the Excluded Mumble.

BTW the principles are for signing by anyone. Libraries and funders are also particularly welcome.

If everyone signs the Principles then the bibliographic data problem is solved! The DOI was just the first step.

Panton Discussions online

Wednesday, February 2nd, 2011

#pantondiscussions

The Panton discussions are now online. Many people are to be thanked for this – and it’s taken a lot of effort (as always I blunder into things that I don’t understand – recording, streaming, etc.).

They are available at the Cambridge Streaming Media site:

http://sms.cam.ac.uk/institution/CHEM

and also at DSpace:

http://www.dspace.cam.ac.uk/handle/1810/229688

where they will still be bright and fresh in 100 years.

We’ve already had a significant number of downloads.

I think this is a useful format and I particularly appreciated the reverse discussion (where Richard Grant interviewed me for F1000).

Ideas welcome – I think one every two months is about the right frequency.

DOIs are not copyright! What about Bibliographic Data?

Wednesday, February 2nd, 2011

Every so often we take an important step forward in Openness and today is one example.

Norman Paskin of the DOI foundation has confirmed that the DOI foundation does not regard DOIs as copyright and encourages their re-use:

To: List for Working Group on Open Bibliographic Data <open-bibliography@lists.okfn.org>
Date: Wed, Feb 2, 2011 at 11:50 AM
Subject: Re: [open-bibliography] DOIs and openbiblio

Peter,
regarding your specific question on whether or not DOIs as identifiers are considered copyright. Like you, I expected that IDF would not make claims of copyright to DOI identifiers. I’m happy to say that I have just confirmed with Norman Paskin, Director of the International DOI Foundation, that IDF does not regard DOI names (identifiers) as copyright and, indeed, encourages their open and widespread use.

Paul

This is tremendous! It’s a precisely and fully solved problem. No-one ever needs to ask the question again (maybe we should formally ask it on http://www.isitopendata.org/ – any volunteers?)

I do not need to waste any more time on it. I can do something else with my time. I do not need to live in fear of the lawyer’s letter. We can add DOIs into OpenBibliography!

By contrast I spend much of my time in wasted attempts to get clear factual answers from publishers. I’ve been waiting four years for an answer from one on data. I’ve been in intense discussion with another about text-mining of data for 18 months. They’ve now relayed it to their legal team. I wait with expectation.

Trying to get clear factual answers from publishers is a wearisome journey. It’s easy to feel that

“Oh, it’s that Murray-Rust again. Just don’t bother to answer and he’ll go away.”

Well he won’t, and there are others like him.

It’s very easy to get the impression that we are engaged in an ongoing conflict with publishers. That’s not universally true, but it’s common.

So if publishers want to help us scientists, can you please answer a simple question:

“Is the bibliographic data in your publications Open?”

We all know what this means as we have the Principles of Open Bibliographic Data. They are simple to understand. Here are some clear answers:

  • Yes
  • No [and reasons given]

Here’s an acceptable one:

  • Gulp – hadn’t thought. We’ll get back by the end of the week – promise

And unacceptable ones:

  • It all depends on what jurisdictions you are in and how much you are going to use. [This means you cannot use it – so say NO]
  • We’ll send it to our lawyers [knowing that they won’t reply – too busy buying companies]

And quite unacceptable, impolite and arrogant:

  • [no reply]