Scholarly HTML hackfest

#scholarlyhtml

We are gearing up for the weekend scholarly hackfest in Cambridge. Like all hackfests it is organised chaos. But we are assembling a range of top-class creators. They include:

  • Peter Sefton (USQ, ICE, HTML)
  • Martin Fenner (Hannover, WordPress)
  • Brian McMahon (Int. Union of Crystallography – publishing, validation, dictionaries…)
  • Mark MacGillivray (Edinburgh, Open bibliography)
  • Dan Hagon (ace hacker)
  • JISC (Simon Hodson will be here on Friday)
  • PM-R group (Sam Adams, Joe Townsend, Nick England, David Jessop, Lezan Hawizy, Brian Brooks, PM-R, Daniel Lowe) – Lensfield, JUMBO, OSCAR, Chemical Tagger, etc.

So far the following themes are emerging:

  • Data publication. How do we take a semantic data object and publish it? Currently we are looking at chemistry (crystallography and compchem) and general scientific numeric data.
  • Bibliography. How can we regain control of bibliography? WE authors need the tools to create what WE want to say – not to have to waste time creating something that sucks (“Harvard style” rather than BibTeX) and whose sole purpose is to save the publishers money.
  • A general flexible authoring platform under our control.

Here’s Martin http://blogs.plos.org/mfenner/2011/03/07/the-trouble-with-bibliographies/. Some excerpts…

Unfortunately almost all bibliographies are in the wrong format. What you want is at least a direct link to the cited work using the DOI (if available), and a lot of journals do that. You don’t want to have a link to PubMed using the PubMed ID as the only option (as in PubMed Central), as this requires a few more mouseclicks to get to the fulltext article. And you don’t want to go to an extra page, then use a link to search the PubMed database, and then use a few more mouseclicks to get to the fulltext article (something that could happen to you with a PLoS journal).

A bibliography should really be made available in a downloadable format such as BibTeX. Unfortunately journal publishers – including Open Access publishers – in most cases don’t see that they can provide a lot of value here without too much extra work. One of the few publishers offering this service is BioMed Central – feel free to mention other journals that do the same in the comments.

And

My idea for the hackfest is a tool that extracts all links (references and weblinks) out of an HTML document (or URL) and creates a bibliography. The generated bibliography should be in both HTML (using the Citation Style Language) and BibTeX formats, and should ideally also support the Citation Typing Ontology (CiTO) and COinS – a standard to embed bibliographic metadata in HTML. I will use PHP as a programming language and will try to build both a generic tool and something that can work as a WordPress plugin. Obviously I will not start from scratch, but will reuse several already existing libraries. Any feedback or help for this project is much appreciated.

If I had a tool with which I could create my own bibliographies (and in the formats I want), I would no longer care so much about journals not offering this service. One big problem would still persist, and that is that most subscription journals wouldn’t allow the redistribution of the bibliographies to their papers. A single citation can’t have a copyright, but a compilation of citations can. I’m sure we will also discuss this topic at the workshop, as Peter Murray-Rust is one of the biggest proponents of Open Bibliographic Data.
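To make Martin’s idea concrete: the extraction step might be sketched as below. Martin plans to use PHP; this Python fragment is purely illustrative, and a real tool would look up full metadata for each DOI (e.g. from a service such as CrossRef) rather than emitting bare stubs.

```python
import re

# Match links of the form href="https://doi.org/10.xxxx/..." (dx.doi.org too)
DOI_LINK = re.compile(r'href="https?://(?:dx\.)?doi\.org/(10\.[^"]+)"')

def extract_dois(html: str) -> list:
    """Return every DOI cited via a doi.org link in an HTML document."""
    return DOI_LINK.findall(html)

def to_bibtex_stubs(dois) -> str:
    """Emit one minimal @misc entry per DOI. A real tool would fetch
    full metadata for each DOI instead of writing bare stubs."""
    return "\n".join(
        "@misc{ref%d,\n  doi = {%s}\n}" % (i, doi)
        for i, doi in enumerate(dois, 1))

html = '<p>See <a href="https://doi.org/10.1000/example.1">this paper</a>.</p>'
print(to_bibtex_stubs(extract_dois(html)))
```

The same extracted list could then be rendered as CSL-styled HTML or wrapped in COinS spans; the BibTeX output is just the simplest target.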

We are able to support this through an EPSRC “follow-up” grant – Pathways to Impact – whose purpose is to disseminate what we have already achieved. This hackfest builds on OSCAR and several JISC projects (who are also supporting some of the group at Cambridge).

Posted in Uncategorized | 17 Comments

Compchem Quixote Workshop: to create the “first Open distributed repository for electronic simulations”

#quixote #xmlcml

I am delighted to announce the first Quixote Conference http://quixote.wikispot.org/First_Quixote_Conference_-_22nd-23rd_March_2010 at Daresbury Laboratory. This is the outcome of all the work put in by the Quixote community and is

A meeting to create the first Open distributed repository for electronic simulations

To explain a bit further. There are zillions – probably at least 10 million – computational chemistry calculations “published” each year (i.e. referred to in scholarly publications) but almost no data is publicly available. Comp chem is 50+ years old, it’s very well understood, and almost no data is published. [There are some collections – including our own DSpace @ cam – of log files and derived data, but it’s << 1% of what is published.]

So Quixote intends to change this. We’ve been building the components, and now we intend to bolt them together. Essentially we have the following components:

  • Lensfield/Quixote – a tool to crawl your disks for compchem
  • JUMBOConverters – tools to transform the legacy files into XML-CML
  • CMLDictionary – a formal semantic method of describing the data
  • Chempound – a repository for indexed numeric and chemical data
  • Avogadro – a flexible GUI for navigating and transforming the system.
  • CompChemPub [vapourware] a tool to collect the results into a scholarly publication. To be created during the coming hackfest

The strategy is on a per-code basis. So let’s say your code is called Foochem. Its input is something like:

  • Molecular/crystal/surface atoms and coordinates
  • Basis sets and/or pseudopotentials
  • Parameterisation (level of theory, accuracy, etc.)
  • Physical constraints (pressure, field, etc.)
  • Strategy – what to calculate (energies, frequencies, wavefunctions…) and how to do it (algorithms)

And its output should retain all this and also include:

  • History of calculation (e.g. optimisation)
  • Final calculated coordinates and electronic properties
  • Other properties

To create this information needs (at least):

  • A Foochem dictionary
  • A Foochem output parser
  • (possibly) a Foochem input parser
  • Some Foochem examples
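To show the shape of such a parser (remembering that Foochem is an imaginary code, so the log format, regex and dictionary entries below are all invented), a minimal output parser in the JUMBOConverters style might be:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for the imaginary "Foochem" log format; a real
# parser encodes the quirks of an actual code's output files.
ENERGY = re.compile(r"Total energy\s*=\s*(-?\d+\.\d+)")

def parse_foochem(log: str) -> ET.Element:
    """Turn a Foochem log into a CML-flavoured XML fragment whose
    entries point into a (hypothetical) foochem dictionary via dictRef."""
    root = ET.Element("module")
    for line in log.splitlines():
        m = ENERGY.search(line)
        if m:
            scalar = ET.SubElement(root, "scalar",
                                   dictRef="foochem:totalEnergy",
                                   units="nonSi:hartree")
            scalar.text = m.group(1)
    return root

log = "Step 1\nTotal energy = -76.402345\n"
print(ET.tostring(parse_foochem(log), encoding="unicode"))
```

The point of the dictRef/units pattern is that every parsed number resolves against a dictionary entry, which is why each code needs its own dictionary before its parser is useful.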

So we are inviting experts in various codes. So far we have NWChem, QuantumEspresso, GAMESS-UK, GAMESS-US, DALTON, Turbomole, Gaussian. We hope to create dictionaries for them, parsers and documentation. This does not need to be complete – the parsers and XML-CML can be expanded when people have time and energy or a really boring cricket match.

It’s a hands-on meeting. You need to be reasonably proficient at running the software (i.e. you may need a few days’ preparation in advance). If anyone is interested, let Jens Thomas know. I think there are some places but it’s up to Jens and colleagues at STFC.

Lots of thanks to lots of people.


Scholarly HTML: hackfest and visit of Peter Sefton and Martin Fenner

#scholarlyhtml @ptsefton

We’re gearing up for our scholarly hackfest (March 12-13) – for details see http://www-pmr.ch.cam.ac.uk/wiki/Scholarly_HTML which will be updated and which includes a registration process. This is because it’s over a weekend and we need to know who is in the department (for safety, etc.) This all worked fine in our first hackfest.

As it’s a hackfest the details are fluid but the known facts are:

Peter Sefton is here from midweek next week (9th March) to about 20th March
Martin Fenner is here over March 12/13 weekend

The general plan is to CREATE something during the time that PT is here. PT runs a world-class team at the University of Southern Queensland which has created a proven Open toolset based on WordPress for high-quality scholarly documents (e.g. course materials, papers, theses). Martin has likewise pioneered many plugins for WordPress.

We shall invite Peter and Martin to give presentations (but this will need to be on a weekday).

The theme is Scholarly HTML with particular emphasis on data publication. It is to give authors the freedom to author as they wish, not as they are constrained by the recipient. A consequence is that all data should be semantic (i.e. understandable by machine). This means that bitmaps such as PNG should be replaced or augmented by – say – SVG or HTML5. Much of the impetus for the meeting came from “Beyond the PDF”, run by Phil Bourne and Anita de Waard.

In general we would like to be able to publish:

  • Semantic (mainly rectangular) tables where columns have defined semantics
  • Semantic graphs where axes are semantic and points, lines, bars etc are first-class objects
  • Maths (MathML)
  • Semantic bibliography (technically solved, but we’d like to include online OPEN resources, e.g. from Open Bibliography)
  • Scalable diagrams (probably SVG)
  • Chemistry/crystallography as CML
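As one illustration of the first item, a semantic table can carry its column semantics in machine-readable attributes. The attribute names below (data-dict-ref, data-units) are my own illustrative choices, not an agreed standard:

```python
def semantic_table(headers, rows):
    """Render an HTML table whose columns declare their semantics
    (dictionary term and units) via data-* attributes."""
    ths = "".join(
        '<th data-dict-ref="%s" data-units="%s">%s</th>' % (d, u, label)
        for label, d, u in headers)
    trs = "".join(
        "<tr>" + "".join("<td>%s</td>" % c for c in row) + "</tr>"
        for row in rows)
    return "<table><tr>%s</tr>%s</table>" % (ths, trs)

# Each header is (visible label, dictionary reference, units reference)
headers = [("T", "phys:temperature", "si:kelvin"),
           ("p", "phys:pressure", "si:pascal")]
print(semantic_table(headers, [[298.15, 101325]]))
```

A machine reading such a table knows that the first column is a temperature in kelvin, rather than guessing from the header text.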

There will be many ideas but as a focus we have come up with a unifying project. After discussion with Simon Hodson (JISC) and Brian McMahon (IUCr) we plan to implement the following idea in our JISCXYZ project and to start this during the hackfest. (Simon and Brian hope to be present for some of the time).

A data-journal for crystallography

Every week Crystaleye aggregates (automatically) a few hundred structures and creates fully semantic CML. These are currently published as HTML pages with embedded CML and PNGs (http://wwmm.ch.cam.ac.uk/crystaleye). A typical page (there are ca 250,000) is http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/c/2008/01-00/data/av3113/av3113sup1_I/av3113sup1_I.cif.summary.html (you can twiddle the molecule and create the unit cell by clicking). We wish to create a “data publication” from this material.

The proposed data journal will automatically select ca 10 interesting structures per week and publish these as a Scholarly HTML blog. The hackfest will educate us to the best ways of representing these as Scholarly HTML and allowing the best modes of presentation. Because we shall be using a blog readers can comment on these structures using the blog mechanism and also add their own ideas about interesting structures that we have not included. In this way we hope to build up a sense of publication and comment.
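As a sketch, the weekly selection step might look like the following. The “interest” score here is a placeholder (atom count); a real scorer would use chemical criteria, and the structure records are invented:

```python
def pick_interesting(structures, n=10):
    """Hypothetical weekly selection for the data journal: rank the
    week's aggregated structures by a simple 'interest' score and
    take the top n for publication as blog posts."""
    return sorted(structures, key=lambda s: s["atoms"], reverse=True)[:n]

# Fake week of aggregated structures, with synthetic atom counts
week = [{"id": "s%d" % i, "atoms": i * 7 % 50} for i in range(40)]
print([s["id"] for s in pick_interesting(week, n=3)])
```

Everything downstream of this step (rendering as Scholarly HTML, accepting comments) would use ordinary blog machinery.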

There is also the possibility for readers to submit their own structures which will be automatically validated during the submission process. We’ll work very closely with the IUCr during this. We can add to the interest by having ranking tables for authors or contributors and having various “records” such as largest structure.

Assuming that the data journal works technically we will work with BrianM and colleagues to see if the format has value for IUCr.

So – if you are interested, register. We can’t pay travel or accommodation but will provide geek food during the weekend. If you can only come during the week, let us know. I will be at JISC on the 16th.


Open Writing and Scholarly HTML

I have been struggling to put my thoughts in order about an underprivileged being – the scholarly author. This post is slightly ahead of a well-formed idea but it’s prompted by Peter Suber’s call for a new term for (?collaborative) authoring (http://www.earlham.edu/~peters/fos/newsletter/02-02-11.htm#contest ). Peter is striving for a term describing author-side openness and writes:

I want this new term for several reasons.  For example, some OA resources which originally lacked author-side openness add it later, and I (and they) need a succinct way to describe what they’ve done.  Sometimes I speak about OA to audiences that know wikis better than they know scholarly journals, and I’d like to say that OA articles have reader-side openness but generally don’t have author-side openness.  Likewise, I want a short and non-pejorative way to say that Wikipedia is not the poster-child of the OA movement. 

I have been increasingly concerned about the growing restrictions and constraints on author freedom. In the Internet age authors should feel relieved of their shackles; instead scholarly authors are burdened with unnecessary constraints. Part, but only part, are of their own making – chasing the chimera of publishing prestige rather than following a more natural course of saying what they want to say. (“You can’t publish there – it’s not got an impact factor yet”).

A typical example today. My co-author – acting as amanuensis – grizzled that she had to reset the references in Harvard format. This is about as rewarding as whitewashing coal. It has only one purpose – to save the publisher work at the expense of the author doing the work. There are, of course, much better ways of doing references/citations – and in Open Bibliography we are hoping to develop some of them. But scholarly publishing is – bizarrely – one of the least innovative activities in the information age.

So we need to give back freedom to authors. I think I’m taking a slightly different tack from Peter but the motivations are broadly the same. When NCSA made HTML – and more importantly HTTP – accessible in 1993 it changed the world. The message was that ordinary people could publish. I could set up a server. A little arcane, but until then I had assumed that servers were only possible for those who bought expensive tools from vendors. NCSA httpd was an instrument of liberation. Anyone – anyone – who had access to port 80 (and that wasn’t difficult in those days) could set up a server.

And tell the world anything.

HTML could be authored in any simple text editor. No special tools. And you didn’t even have to get it right. Broken HTML was rendered as best as possible. For those who didn’t experience 1993/4, Wordsworth captured the spirit in

FRENCH REVOLUTION

AS IT APPEARED TO ENTHUSIASTS AT ITS COMMENCEMENT.

     Bliss was it in that dawn to be alive,

But to be young was very heaven!


Not in Utopia, subterranean fields,

Or some secreted island, Heaven knows where!

But in the very world, which is the world

Of all of us,–the place where in the end

We find our happiness, or not at all!

Of course it couldn’t last, but there was the sense of overthrowing the established order – that everything was possible.

HTML is truly an agent of revolution and liberation.

By contrast the publishing industry, with its tools such as double-column PDF, has trapped us in a digital neo-colonialism which I am struggling to understand. Academia, with its much greater wealth and potential power, is increasingly cowed by the shackles of metrics and adopts these dysfunctional and asymmetric relationships. It shows no signs of wishing to break out and control its own destiny. The British Empire flourished by “divide and conquer” and by rewarding the heads of the controlled states. There is no need to divide academia – it’s already divided.

And this is to the great disadvantage of the author. Authors should be exploring the potentials of the new media instead of being constrained to the vision of the Victorian printing press. Changing the role of the author is a revolutionary act, made more difficult because the author does not realise how shackled they are.

That’s one of the things we tackled in #beyondthepdf, where some of us were developing the next generation of authoring tools. A primary motivation is to remove the dependence on vendor-controlled formats and tools and instead build entirely on Open Source. That’s why I’m inviting Peter Sefton and Martin Fenner for a publishing hackfest in March. The fundamental medium is still HTML, but now enhanced to carry semantic payloads – and those under our control.

Can a handful of people change the world? We have to believe so and there are now an increasing number of examples where individuals have done exactly that.

So, Peter, I don’t have a good term for you. I’m using a general term of “Open Scholarship” but that includes much more. I thought of “Open Authoring” but OA clashes with Open Access. (BTW I always felt Access was a noun!). My best so far is “Open Writing” which doesn’t do justice to the non-textual aspects. The theme of our hackweek will be “Scholarly HTML”.

HTH


Licensing Data

[From Alex Ball]

The Digital Curation Centre (DCC) has published the second in its series of How-to Guides: ‘How to License Research Data’ by Alex Ball of the DCC, in association with JISC Legal. The guide explains why licensing data is important, what licensing options are available to researchers, and how to go about attaching a licence to a dataset.

The DCC’s How-to Guides offer practical introductions for those who need more than the high-level basic awareness given in DCC briefing papers, but less than the in-depth coverage given in the Curation Reference Manual. This guide is aimed at principal investigators, researchers, and those who provide access to research data through a data centre, repository or archive.

‘How to License Research Data’ is available for online reading or download from http://www.dcc.ac.uk/resources/how-guides/license-research-data

This is a valuable medium-level overview of the different legal aspects of publishing data. Data, of course, is now found in many disciplines, not just science. At the very fine-grained level, data is extraordinarily complex, but at a high level it can be very easy.

If you are a scientist and haven’t thought about data licences, then consider the value of making your data available to others. That’s anathema to many traditional scientists – and an attitude that will survive for some time. There are moral, ethical, political, social and utilitarian reasons why you should consider making your data Open. [There are cases where you cannot open data – human privacy, breeding grounds of rare species, etc. And often the decision involves other people. But at least consider it.]

I am not a fan of licences. They are complex, legal algorithms do not map onto any formal system of mathematics, and they are subject to wide variation by country, date and general fuzz. That’s why wherever possible you should adopt the Panton Principles (www.pantonprinciples.org/) and formally dedicate your data to the Public Domain (PDDL or CC0). The complexity of combining even two licences is far greater than analysing a terabyte of multivariate data for patterns. Multiple licences make data recombination very hard.

AND NEVER USE NON-COMMERCIAL LICENCES.

The only people this hinders and hurts are people like you.


Scholarly HTML Hackfest Cambridge UK March

The momentum of “Beyond the PDF” continues and we are planning a hackfest in Cambridge in March to build scholarly publishing tools. I floated the idea yesterday http://groups.google.com/group/beyond-the-pdf/browse_thread/thread/af2e6a4d43c361f8 (one of the many discussion threads you can read). The details are coalescing to the following:

Martin, Peter and others (including me) formed a “Writing” group as part of BtPDF and came up with a (complete) design for scholarly authoring (on the diagram BTPDF is the code name for the system)

(Copyright PMR, CC-BY; used without explicit permission of the author).

We believe that we have enough open source tools, volunteers and service providers to create a compelling prototype. So during about 11-20 March there will be intense activity in Cambridge putting it together. The rough timetable is:

  • 9/10/11 Mar PT arrives
  • 12-13 Hackfest in Chemistry dept
  • 14-18? Martin arrives. Free-form hacking with PT, MF and members of the PMR group. Visitors welcome in reasonable numbers
  • 19 (Sat) informal hack day. Depends on numbers, probably not in chemistry. Maybe in the Panton (which we think will have wifi). Maybe in the Open Knowledge Foundation. We can’t use the Centre as it’s science week and there are zillions of budding young scientists in the Dept.
  • 20 Integration Hackfest in UCC (if it makes sense). Pub at lunch, evening …

Attendees from last hackfest included Ben O’Steen, Rufus and random OKF’ers, Chris Gutteridge and Dave Flanders.

Anyone is welcome but let us know beforehand for safety/security (it’s a chemistry dept). As we get possible attendees we’ll publish a list. This is not a sleep-over hackfest (it’s a chemistry dept) but there are lots of good pubs.

It is a hackfest, not a tutorial. So be prepared to get your fingers dirty. You don’t have to be a coder – we’ll welcome:

  • Coders
  • Integrators (e.g. people who can work plugins, repos, OAI-PMH, RSS, etc.)
  • Content providers (Open of course)
  • Documentation
  • Packaging and distribution
  • Evangelism
  • Funders

And more

The last hackfest was 2 days and the achievements were sensational. This one will be even more so. We’ll provide geek food. The only problem is burn-out.

We are calling this “Scholarly HTML”. It will bring power back to the authors. Too much of our scholarly communication is controlled by digital neo-colonialists. HTML was and is an agent of revolution and democracy. This hackfest is in that tradition.


Panton, Panton, Panton

Richard Poynder has just blogged about our Panton discussion which has been released as audio and we hope a transcript RSN. But the main reason for this post is that I have learnt something new and unsettling about our adoption of the name “Panton” for our outputs – we now have Panton Principles, Panton Discussions and Panton Papers. I’ll quote and comment…

Last August I sat down in a pub (The Panton Arms) in Cambridge to discuss Open Data with Peter Murray-Rust, Jordan Hatcher and others. The event was the first of what Murray-Rust has dubbed the Panton Discussions. Murray-Rust is a Reader in Molecular Informatics at Cambridge University. The Panton Arms regularly plays host to members of the University Chemistry Department, so it was an obvious place to meet. This special relationship between Cambridge chemists and The Panton Arms is doubtless a consequence of the pub being just down the road from the Chemistry Department.

And perhaps Richard is unaware that Panton Street (between the Chemistry Dept and the pub) is also the home of the Open Knowledge Foundation. Indeed you are quite likely to bump into Rufus and his collaborators in the street. Just last week we had a long intense discussion outside the OKF with the wind whistling through our clothes.

However, it could be that The Panton Arms is an appropriate location for discussing things like copyright, Open Data, Open Access, Creative Commons and the Public Domain for another reason.

The Panton Arms, and Panton Street (in which the pub is located), are associated with the Panton family. And in 1806 “Polite” Tommy Panton succeeded in having Parliament pass the Barnwell Enclosures Act, leading to the enclosure of what was then farmland.

I was completely unaware of this – and as I said I find it unsettling. But what is done is done, and often good has been built on the site of evil. Perhaps we need an exorcism. I assume Rufus is aware of this.

Today many argue that the frequent and increasingly maximalist changes made to copyright laws represent a new enclosure movement. And it is partly in response to that process that we have seen a proliferation of “free” and “open” movements like Open Data and Open Access – with the aim of preventing, or at least mitigating, the new enclosures.

Richard is right. I use the terms “digital gold rush” or “digital land grab”. I find it intensely frustrating that so many people are not aware of the problem and when told do not care. Academia can be very arrogant – individuals survive on the handouts from the robber barons while the general citizenry is unaware of the problem.

We are too protected. A non-university healthcare professional is debarred from reading the literature – perhaps 90-95% of it. Non-academics are not our inferiors that we shouldn’t worry about. Yet I frequently hear “oh we can’t let non-experts read that” and similar. This is just as insulting as the many insults of human ownership, gender, race and so on that we have fought over the centuries.

We are entering the century of the information age. It must not become the century of information slavery.


Chemical Markup Language 2011

With the release of Chem4Word (sorry, Chemistry Add-in for Word) we’ve reached an important milestone in the development of CML. CML is about 16 years old (Henry will give a better estimate – but I think we can reasonably date it from our trip to WWW1 and Henry’s subsequent trip to WWW2). I think it has reasonably come of age and can now be regarded as the de facto approach to representing semantic chemistry. And part of the purpose of the PMR Symposium #pmrsymp was to be able to make that assertion. We didn’t actually have much about CML per se, but the working code was all based on CML and we shall be publishing the justification in a special issue of BMC.

It’s not been easy to make that statement until now. It needs at least:

  • A reasonably stable formulation. That’s been impossible for many years as CML has been naturally fluid as we have tried out new ideas. Now we eat our own dog food. Our CML must validate and the dictionaries must exist and resolve.
  • Running code. It’s relatively easy to write a specification. It’s vastly harder to make sure it’s completely implementable. We adopt the IETF motto of “rough consensus and running code” and very little in CML has been deployed without support in at least one major language. When people ask what JUMBO does, the formal answer is that it’s the reference implementation of CML. That’s not dramatic and it’s desperately boring to write. But it’s almost all in place.
  • A user community. There is sufficient variety in the people and places that are using CML that we can be reasonably confident that it has a good user base. A lot of people implement solutions without our being aware of it – that’s perfectly OK, of course, but they may be struggling with problems that have already been addressed. But last week, for example, a group in the GRID community who had implemented CML under lxml wrote to us, having found bugs in the Schema validation. That seems to be a known problem in lxml.
  • Robustness and portability. It’s got to be possible to implement CML in different environments. It’s got libraries in Java, C++, C#, FORTRAN, Python and Javascript. These don’t all implement everything in the language but they show that everything is reasonably possible.
  • Flexibility and Generality. This is one of the great strengths of CML. It’s possible to express a very wide range of concepts in CML. Because CML contains general tools for physical sciences we can model properties, parameters, complex objects, constraints, etc. The use of @convention is proving to be very powerful for developing new domains without breaking old ones. There are almost no content models (something that is very constraining in XML).
  • Dictionaries. A very powerful means of expressing physical science (and other) concepts. Indeed CML can represent a lot of high-school physics and materials.
  • Interoperability. CML does not try to do everything – the more that other domains provide the better CML works. So it uses MathML for the maths, SVG for the graphics. Specialist representations within chemistry (e.g. EMSL for basis sets or BIOPAX for bioscience). When NIST (after perhaps 15 years) finally releases UnitsML we’ll use that (assuming it’s easy to implement). For large arrays we use NetCDF or similar tools. For complex relationships we use Xlink or RDF. And so on.
  • Simplicity. CML is simple – or at least no more complex than the chemistry it represents. There are no abstract objects or relationships or attempts to build overly complicated models. The elements in a CML file should be understandable by high-school students.
  • Uniqueness and unification. There is no other current approach that supports most of the domains in chemistry in a semantic manner. Much chemical software is centred on connection tables, but these do not support solid state, physical properties, experimental processes, computational chemistry, etc. to the same extent that CML can. There are lots of specialist non-semantic files, but these are often archaic and only work for specific codes. CML provides a central nearly lossless semantic centre.

CML supports five main subdomains and there is extensive experience and code in all:

  • Core. This supports molecules, atoms, bonds, dictionaries and physical quantities, etc. Many implementations.
  • Reactions. Tested with a wide range of reactions including enzymes (MaCiE), literature extraction, and polymers.
  • Spectra. Fully supported in JSpecview.
  • Crystallography. Able to convert complete CIF files and now with 200,000+ structures in Crystaleye.
  • Computational chemistry. Extensively tested with implementations in several major codes and continuing.
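For readers who haven’t seen CML, a minimal Core example (water, built here with Python’s standard XML tools; the coordinates are illustrative) looks like this:

```python
import xml.etree.ElementTree as ET

CML = "http://www.xml-cml.org/schema"
ET.register_namespace("cml", CML)

def water() -> ET.Element:
    """A minimal CML molecule: atoms with element types and 3D
    coordinates, plus bonds referring to atom ids."""
    mol = ET.Element("{%s}molecule" % CML, id="m1")
    atoms = ET.SubElement(mol, "{%s}atomArray" % CML)
    for aid, el, x, y, z in [("a1", "O", 0.0, 0.0, 0.0),
                             ("a2", "H", 0.757, 0.586, 0.0),
                             ("a3", "H", -0.757, 0.586, 0.0)]:
        ET.SubElement(atoms, "{%s}atom" % CML, id=aid, elementType=el,
                      x3=str(x), y3=str(y), z3=str(z))
    bonds = ET.SubElement(mol, "{%s}bondArray" % CML)
    for bid, refs in [("b1", "a1 a2"), ("b2", "a1 a3")]:
        ET.SubElement(bonds, "{%s}bond" % CML, id=bid,
                      atomRefs2=refs, order="S")
    return mol

print(ET.tostring(water(), encoding="unicode"))
```

This is deliberately the high-school-comprehensible end of the language; dictionaries, properties and conventions layer on top of the same simple elements.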

And it’s worth pointing out that CML can be used as a computational language – i.e. it can be self-modifying as in polymer markup language.

I owe a huge debt to lots of people and CML really is a community effort, with strong moderation. We wouldn’t be here without the Blue Obelisk, eScience/GRID, and the bioscience community. We’re open to any new ventures and ideas – incorporation in existing codes, chemical publication, artificial intelligence, etc.

CML is ready for universal use within chemistry.

Joe Townsend is coordinating much of the effort. I will be blogging at regular intervals. We hope to get semantic chemical blogging (e.g. in WordPress) very soon.


Bibliographic Data is Open!

#jiscopenbib

Bibliographic Data are the lifeblood of scholarship. They tell us how to find scholarly artefacts and to recognise them when we’ve found them. The journal names, the authors, the pages. They are as exciting as street names and house numbers.

Which are exciting. Maps are exciting and bibliography is the map of scholarship. It’s not the complete map – but the skeleton. The framework to which other properties are added.

And the question I have been chasing for some months is whether they are Open… Can I make a list of bibliographic data and publish them Openly?

Most people I ask mumble. But two days ago Eefke Smit of the STM Publishers assoc rang me and we talked a lot about what was Open and what was not. The problem is that many things are not clear and Open to interpretation. And so it comes down to “it all depends on”.

Which sometimes it does. The problem is that software can’t make that sort of judgment. It works on Boolean Logic – you can, or you can’t. We didn’t resolve all the questions, but Eefke got back very rapidly and here’s her reply.

P: Thank you very much for a full reply. This is very helpful.

I am copying this in to the Open Bibliography list. [PMR] For their background I have been exploring with Eefke and the STM Publishers association whether text-mining was allowable and whether bibliographic data is copyrightable. Eefke gives a clear answer to the second so I am posting this on this list. I think it now makes possible a lot of very valuable things with Open Bibliographic Data.

On Fri, Feb 4, 2011 at 2:34 PM, Eefke Smit <eefkesmi@xs4all.nl> wrote:

As promised, I would sort out your question about the openness of bibliographies. You made quite clear in our conversation that you are not particularly fond of ‘it depends’ answers. So I fear you may find the following answer slightly disappointing, because also for bibliographies the answer to the question of how open they are depends on what you regard to be the elements of a bibliography.

We  have addressed this in “Principles of Open Bibliographic Data” http://openbiblio.net/principles/

To start with the simplest elements that are indeed open and considered ‘facts’ hence copyright free: article title; authors of article; journal title; volume-issue information; and dates of receipt/publication. These are all considered to be facts and cannot be copyrighted.

We have essentially covered these in

Core data: names and identifiers of author(s) and editor(s), titles, publisher information, publication date and place, identification of parent work (e.g. a journal), page information, URIs.


I think this is entirely in line with you and your STM colleagues and this agreement is an extremely important step forward.
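As a concrete illustration, the agreed core could be captured in a simple machine-readable record; the field names below are my own and purely illustrative, not a fixed schema:

```python
import json

# The agreed "core" open fields, roughly as listed in the Principles
# of Open Bibliographic Data; key names here are illustrative only.
record = {
    "authors": ["A. N. Author"],
    "editors": ["B. Editor"],
    "title": "An Example Article",
    "publisher": "Example Press",
    "date": "2011-02-04",
    "parent": "Journal of Examples",
    "pages": "1-10",
    "uri": "http://example.org/article/1",
}
print(json.dumps(record, indent=2))
```

Since all of these fields are facts, a record like this can be aggregated and republished without copyright concerns.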

But nowadays people sometimes include much more into bibliographies, for example images, tables, abstracts, even chemical structures. Bibliographic data can include a number of different kinds of fields and information, including thesauri, classifications like chemistry structures, etc., so there can be some information that is copyrightable or systems that are tied into copyright or trademark protected content. 

Precisely what that is does indeed “depend”. Our list of secondary bibliographic data overlaps greatly with yours. I have highlighted the components that I believe would be uncopyrightable.

Secondary data: format of work, non-web identifiers (ISBN, LCCN, OCLC number etc.), an indication of rights associated with a work, information on sponsorship (e.g. funding), information about carrier type, extent and size information, administrative data (last modified etc.), relevant links (to wikipedia, google books, amazon etc.), table of contents, links to digitized parts of a work (tables of content, registers, bibliographies etc.), addresses and other contact details about the author(s), cover images, abstracts, reviews, summaries, subject headings, assigned keywords, classification notation, user-generated tags, exemplar data (number of holdings, call number), …

This does not mean that the others were by default copyrightable, but we know of places where people have asserted rights over some of them.

I think you and I differ about whether tables and graphs are copyrightable in this context. I would concede that images which contained creative work were copyrightable but that images representing factual information (e.g. chemical structures) were not. For example it would be foolish to be unable to communicate a chemical structure to someone because you might break copyright. There are millions of such images on suppliers’ bottles, and withholding this information means that people could and would die.

I also asked about whom I should contact within a publisher to get a definitive answer from that organization (as most of the time I get no reply).

On your question whom to contact for permissions  as a reader, I would advise you to address the ‘rights and permissions departments’ or ‘licensing departments’ at the relevant publisher houses or else enquire via your local license holder (Cambridge library) who their contacts are. Very often these are regionally assigned, so a general list would be difficult to compose.


This seems to confirm that it can therefore be quite difficult to get the right person within a large publishing house and get an answer.

The STM members can be found on www.stm-assoc.org

Hope this information is of help to you,


Yes it is very useful.

Kindest regards, Eefke Smit.

So this is very useful. We agree on this. Bibliographic Data is FREE. As in Speech. Like OpenStreetMap, we can start building the bibliographic map of the world.


Prinzipien zu offenen bibliographischen Daten

If you can understand that, it means two things:

  • That Adrian Pohl has translated the Principles of Open Bibliographic Data into German (thanks!) http://openbiblio.net/principles/de/
  • That if you only speak German you have no excuse now for not SIGNING them