Another PDF hamburger; why must scientific publishing destroy science?

#jiscxyz #okfn #quixotechem

I’m off to #JISCMRD (Managing Research Data) to hear about the new round of projects including our own JISCXYZ. Ours concentrates on the publication of data and we are working with publishers to save and validate data at early stages in the publication process.

Meanwhile here’s an indication of how to destroy data (supplemental data):

 

That’s the commonest method. And here’s another (http://www.rsc.org/suppdata/OB/b2/b209981k/geometry.pdf ). This file could have released useful data to the world. In fact it destroyed it by putting it into PDF. The file should have looked like:

1\1\GINC-PIRX\FOpt\UB3LYP\Aug-CC-pVDZ\C4H8Cl1(2)\BERND\19-Feb-2002\\
#P B3LYP/AUG-CC-PVDZ OPT=TIGHT GEOM=CHECK GUESS=READINT=ULTRAFINE\\Be
D001 with INT=ULTRAFINE\,2\C,0.1063168353,0.3005635652,-0.5502851935
\C,0.1053918322,-1.157928312,-0.5404967856\C,1.3891660007,0.9682412707
,-1.0097749893\H,-0.7786669558,0.7139466375,-1.0458655312\H,1.55655799
58,0.73320935,-2.0718423113\H,1.327841017,2.0566771132,-0.8980161267\H
,2.2479630589,0.603209853,-0.4322274841\Cl,-0.2178927161,0.8953362005,
1.2758334388\C,-1.1453929968,-1.9640369835,-0.4846598299\H,1.059275967
3,-1.6602096716,-0.361576248\H,-1.0103688266,-2.9449592768,-0.96235794
85\H,-1.4450387789,-2.1570821331,0.5624777398\H,-1.9862773313,-1.44654
45235,-0.9684597597\\Version=x86-Linux-G98RevA.7\HF=-617.4354366\S2=0.
755661\S2-1=0.\S2A=0.750023\RMSD=7.998e-09\RMSF=8.394e-07\Dipole=0.129
3808,-0.5162435,-0.9249031\PG=C01 [X(C4H8Cl1)]\\@

Notice the precise formatting. This is REQUIRED to read the file in. Instead the author or the publisher (neither of whom apparently cares) tipped it into PDF, which introduced spurious line ends. It’s UNREADABLE by a machine. Follow the link, read the file and see what I mean.
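To make the point concrete, here is a minimal Python sketch of why the line ends matter. It is my own illustration, not part of any Gaussian or Quixote tool: the names `rejoin_archive` and `extract_atoms` are hypothetical, the sample is a three-line fragment of the archive above, and it assumes that fields are separated by a backslash and that every line break is spurious.

```python
import re

# A three-line fragment of the archive entry, wrapped as it appears in this post.
WRAPPED = r"""D001 with INT=ULTRAFINE\,2\C,0.1063168353,0.3005635652,-0.5502851935
\C,0.1053918322,-1.157928312,-0.5404967856\C,1.3891660007,0.9682412707
,-1.0097749893\H,-0.7786669558,0.7139466375,-1.0458655312"""

# One atom field: an element symbol followed by three decimal coordinates.
ATOM_FIELD = re.compile(r"([A-Z][a-z]?),(-?\d+\.\d+),(-?\d+\.\d+),(-?\d+\.\d+)")


def rejoin_archive(wrapped: str) -> str:
    """Drop every line break and blank line (assumes all of them are spurious)."""
    return "".join(line.strip() for line in wrapped.splitlines())


def extract_atoms(archive: str) -> list[tuple[str, float, float, float]]:
    """Return (symbol, x, y, z) for each backslash-separated field that is an atom."""
    atoms = []
    for field in archive.split("\\"):
        match = ATOM_FIELD.fullmatch(field)
        if match:
            symbol, x, y, z = match.groups()
            atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


if __name__ == "__main__":
    for atom in extract_atoms(rejoin_archive(WRAPPED)):
        print(atom)
```

Run on the fragment without rejoining it first, the same extraction loses every atom whose field touches a line end, including the carbon whose coordinates straddle two lines. A PDF adds page furniture and font changes on top of that, so real recovery is harder than this sketch suggests.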

It’s beautiful and garbage. A sickly hamburger.

That’s because almost all publishers don’t care about data. Which means that many of their publications are second-rate. Many are suspect scientifically because the data aren’t published.

Posted in Uncategorized | 7 Comments

Quixote: Computational chemistry; We made it! Please join us.

#quixotechem #okfn

For the last month we have been developing an Open, distributed, automated Knowledgebase for computational chemistry. Millions of valuable data files are created each year – almost none are published. Yet they are amongst the best, the most reproducible science in any discipline. They give believable results about the real world.

They aren’t published because there’s a lack of vision in the computational chemistry community. Unlike crystallography they haven’t espoused community databases. They haven’t seen the value of communal dictionaries to explain the concepts. There’s a variety of ad hoc interconversion systems rather than a unified approach.

We have now produced a prototype of this knowledge base. We can now automate and concatenate the following:

  • Search for unpublished data files
  • Convert them to chemical markup language (CML)
  • Validate the data against dictionaries
  • Convert the results to RDF
  • Upload them to a different Open database

This can be scaled.

From hundreds to hundreds of thousands. You can run the Quixote system on your own file store and discover all your old, unpublished compchem files. Find what’s in them. You can index your disk while you sleep.
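As a rough idea of what "index your disk" can mean, here is a minimal Python sketch that walks a file store and flags files containing the opening characters of a Gaussian archive entry like the one quoted in the PDF post above. This is an assumption-laden illustration, not the Quixote code: the function name `find_candidate_files`, the size cut-off and the choice of marker are all my own, and other compchem codes would need their own markers.

```python
import os

# Opening characters of the archive entry quoted in the PDF post above.
ARCHIVE_MARKER = b"1\\1\\GINC-"


def find_candidate_files(root: str, max_bytes: int = 50_000_000) -> list[str]:
    """Return sorted paths under root whose contents include the archive marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > max_bytes:
                    continue  # skip very large files to keep the scan cheap
                with open(path, "rb") as handle:
                    if ARCHIVE_MARKER in handle.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: ignore it and carry on
    return sorted(hits)


if __name__ == "__main__":
    for path in find_candidate_files("."):
        print(path)
```

The scan only reads files and never changes them, so it is safe to leave running overnight.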

This means that the world’s compchem is effectively a distributed Open database. Automated systems can trawl the web and find what’s new in the servers; aggregate and re-distribute. Transform and re-purpose.

So all it needs is your belief. Data matters. Data can now be cited. You can publish and cite your data. Everyone benefits.

Where are we going now? There are some technical things to do (http://okfnpad.org/quixote20101021 ). Sam Adams will be bolting in his Chem# repository that comes out of the CLARION and #jiscxyz program. The Open Source Avogadro program will be a central tool on the client side for accessing and creating information.

In particular the project met its first milestone of creating a viable prototype within a month. There are bugs (e.g. running from behind a firewall with certificates can be a problem). We need more variety. And more sites (we currently only have 2). We need more people wanting to manage their compchem data better.

The project scales horizontally. We can add in new codes. We’d like to include NWChem – it’s Open source. NWChem volunteers please! We’d like example files from our current codes. We’d like people to help edit the compchem dictionaries.

We’ve submitted an abstract to the ACS meeting. We can’t make it Open as then we would be debarred from submitting it. But it talks about the project and how it will transform the semantic infrastructure of compchem.

But we are going forward. We plan a full meeting in March. We’re setting up a newsletter. We’ll have chemical search before year-end. We’ll have molecular orbitals extracted and displayed in Avogadro.

Lots of thanks to lots of people:

Jorge, Pablo, Jens, Lance, Mark, Marcus, Tamas, Sam, Weerapong. Six of whom I hadn’t met 5 weeks ago. Who are now actively working on our communal project. And thanks to the Blue Obelisk for creating so many of the components.


How much bibliography can we liberate in a month? Please help. Yes, you!

#jiscopenbib #okfn

This is a rapid post to launch our next virtual collaborative Open Liberation project. (I’m full up with other things over the next two days). Simply:

  • On November 11th Cameron Neylon and I will be presenting at RLUK (Research Libraries) on the Democratisation of Knowledge. He’s going to be showing how communities form and act on the Net and I’m going to follow that with a real collaborative experience.
  • Scholarship is currently shackled by the lack of Open bibliography. We can’t even make a collection of book titles or scientific articles without paralytic fear of breaking some copyright somewhere. So we’re going to solve that problem.
  • We need YOU.
  • You don’t have to be a cataloguer to create bibliography (lists of books and articles). [You don’t have to be an astronomer to catalogue galaxies and that’s at least as hard as books. You don’t have to be a geographer or surveyor to create a Map (and that’s at least as hard as reading the title pages of scientific articles).]
  • YOU can help us.

The goal is fairly simple. We now have 3 million records from the British Library catalogue. We are grateful to them for not only making them available but also making them CC0. Only CC0 or PDDL is really useful for Open Bibliography. CC-NC creates enormous downstream problems. Avoid it like the plague.

Ben O’Steen is creating RDF from these records. All you need to know is that RDF is pretty powerful when in hands like Ben’s. It allows flexible and precise searches. So, in days, Ben assures me that we shall be able to search the BL catalogue for bibliographic data. Titles, authors, publishers, ISBNs etc. Just open any modern book, flip one or two pages and you’ll find the metadata.

This tells you if your book is in the 15% of the BL catalogue we’ve got.
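If RDF is new to you, the idea fits in a few lines. The sketch below is plain Python rather than Ben’s actual tooling, and everything in it is an assumption for illustration: each fact about a book becomes a (subject, predicate, object) triple, the predicates borrow the Dublin Core terms namespace, and the helper names `record_to_triples` and `find_subjects` are hypothetical. The sample record is a book that appears elsewhere on this blog.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def record_to_triples(subject: str, record: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn one bibliographic record into (subject, predicate, object) triples."""
    return [(subject, DCTERMS + field, value) for field, value in record.items()]


def find_subjects(triples, field: str, text: str) -> list[str]:
    """Return the subjects whose value for `field` contains `text` (case-insensitive)."""
    predicate = DCTERMS + field
    wanted = text.lower()
    return sorted({s for s, p, o in triples if p == predicate and wanted in o.lower()})


# Facts copied from a title page; the urn:isbn subject is just a convenient label.
BOOK = {
    "title": "The Mythical Man-Month",
    "creator": "Frederick P. Brooks, Jr",
    "identifier": "ISBN 0-201-83595-9",
}
TRIPLES = record_to_triples("urn:isbn:0201835959", BOOK)

if __name__ == "__main__":
    print(find_subjects(TRIPLES, "title", "man-month"))
```

A real triple store adds a query language and scales to millions of records, but the shape of the data is the same: small, independent facts that anyone can check against the book in their hand.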

Over the next 3 weeks we’ll see how many books we can find. We need your help. You don’t have to be an expert (you just need to know what a book is). You need enthusiasm and a degree of care.

And when you have found the book you’ll be able to flag up inconsistencies and omissions. No catalogue, even the BL, is completely free from errors. No map is free from errors. In this way we’ll help to support one of the great institutions of the world.

YOU can do it. And if there are enough of you then we can liberate bibliography. You can do it from anywhere in the world. You could be among the first to see the treasures of the BL’s bibliography.

This blog will flag up opportunities. But you should also join the Open Knowledge Foundation’s http://openbiblio.net/: Just offer yourselves there.


Lunatick Scrimt

#quixotechem #okfn

The Sprint methodology for software is a popular way of developing flexible software quickly and well. A Scrum is a similar approach, with colourful team roles such as chickens and pigs:

All roles fall into two distinct groups—pigs and chickens—based on the nature of their involvement in the development process. These groups get their names from a joke about a pig and a chicken opening a restaurant:

A pig and a chicken are walking down a road. The chicken looks at the pig and says, “Hey, why don’t we open a restaurant?” The pig looks back at the chicken and says, “Good idea, what do you want to call it?” The chicken thinks about it and says, “Why don’t we call it ‘Ham and Eggs’?” “I don’t think so,” says the pig, “I’d be committed, but you’d only be involved.”

So the “pigs” are committed to building software regularly and frequently, while everyone else is a “chicken”—interested in the project but really indifferent because if it fails they’re not the pigs—that is, they weren’t the ones that committed to doing it. The needs, desires, ideas and influences of the chicken roles are taken into account, but are not in any way allowed to get in the way of the actual Scrum project.

There are also in Scrum:

  1. the “ScrumMaster”, who maintains the processes (typically in lieu of a project manager)
  2. the “Product Owner”, who represents the stakeholders and the business
  3. the “Team”, a cross-functional group of about 7 people who do the actual analysis, design, implementation, testing, etc.

Now I have launched 3 projects which have some of these characteristics but are not really sprints or scrums – so like Carroll’s frumious I’ll call them “scrimt”s. The features of a Scrimt are:

  • You propose an impossible task within an absurdly short time. Hence “lunatick” [1]
  • You persuade/trick/seduce… a number of collaborators into sharing your mad vision. These people are making whatever contribution they do out of the goodness or madness of their hearts.
  • It starts from nothing and creates a working, convincing, sustainable prototype within a MONTH.
  • You pick a fixed date when the task has to be completed. This date is really fixed. To add a real thrill it can be to present a working system at a conference.
  • You do everything in full view of the internet.
  • There is no money. (This is now a universal truth anyway, but just to reiterate: this is an unfunded project with no monetary reward).
  • You are the only pig and scrum-master.
  • You can use whatever resources on the Net that you and your chickens can create/borrow/blag.

     

My three projects so far are:

1997-12 SAX. This is the most successful short collaborative project ever. In this I was the chicken and persuaded David Megginson to be THE PIG. He writes:

The process of developing SAX itself started on Saturday 13 December 1997, mainly as a result of the persistence of Peter Murray-Rust. Peter is the author of the free Java-based XML browser JUMBO, and after going through the headaches of supporting three different XML parsers with their own proprietary APIs, he insisted that parser writers should all support a common Java event-based API, which he code-named YAXPAPI (for Yet Another XML Parser API).

Peter initiated a discussion with Tim Bray (the author of the Lark XML parser and one of the editors of the XML specification) and David Megginson (the author of Microstar’s Ælfred XML parser) about coming up with a single, standard event-based API for XML parsers. The design discussion took place publicly on the XML-DEV mailing list, and many people contributed ideas, comments, and criticisms (see below). At the end, Jon Bosak, the founder of XML, kindly allowed SAX to use his xml.org domain for the Java package name org.xml.sax.

David co-ordinated the discussion and wrote the proposal for the interface, together with its Java implementation. The first draft interface — together with front-end drivers for the four major Java XML parsers — was released on Monday 12 January 1998, one month less a day after the beginning of the discussion. This could be a record for an industry initiative (especially considering that SAX was finished under a declared state of emergency, during the worst ice storms in Canadian history, when much of Eastern Ontario and Quebec were without power).

The first draft of SAX received much attention, and over several months, users identified shortcomings and suggested improvements. Over a long period of discussions and pre-releases, the XML-DEV community developed SAX 1.0, which was released on Monday 11 May 1998, less than five months after SAX was first proposed.

Every week David sent out questions for the XML-DEV community to respond to. And they did – a hundred of them. I am sure the overall design was David’s, and it was very clean and compelling. But that’s the virtue of a single PIG.

SAX is now in every computer on the Planet. One, hectic, frantic month to prove it could work.

2010-08 The Green Chain Reaction. I was THE PIG. I set us the task of analysing about 100,000 experiments in European Patents to see whether the solvents were getting greener over time. I did not have a working system. I had bits. These included a PMR-lashup of the Lensfield system, David Jessop’s patent reading software, Lezan Hawizy’s Chemical Tagger and Sam Adams’s RESTful server code. I had volunteers – half of whom I didn’t know – who did a fantastic job in downloading and testing the software and proving that a distributed system could work. It was a sort of map-reduce – the humans farmed out the maps and then reduced them back to the server. It didn’t help that the chemistry dept switched off the electricity for the two days before and during the demo! (It was planned – but do I read emails?). Dan Hagon did a fantastic job in cloning the server and making the demo work.
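For anyone who hasn’t met map-reduce, here is a minimal Python sketch of the pattern as it might apply to counting solvents. It is not the Green Chain Reaction code: the record layout, the function names `map_patent` and `reduce_counts`, and the three toy patents are all invented purely to show the shape of the computation.

```python
from collections import Counter


def map_patent(patent: dict) -> Counter:
    """Map step: count (year, solvent) pairs found in one patent."""
    return Counter((patent["year"], solvent) for solvent in patent["solvents"])


def reduce_counts(partials) -> Counter:
    """Reduce step: merge the per-patent counts into one table."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Invented toy input, only to show the shape of the data (not real results).
TOY_PATENTS = [
    {"year": 2000, "solvents": ["dichloromethane", "water"]},
    {"year": 2000, "solvents": ["dichloromethane"]},
    {"year": 2009, "solvents": ["water", "ethanol"]},
]

if __name__ == "__main__":
    totals = reduce_counts(map_patent(p) for p in TOY_PATENTS)
    for (year, solvent), count in sorted(totals.items()):
        print(year, solvent, count)
```

The map step needs nothing but its own patent, which is why it can be farmed out to volunteers’ machines; only the small per-patent tables have to travel back to the server for the reduce.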

Is it sustainable? I hope so. I’ll be presenting it to 200 chemists tomorrow at the Dial-a-Molecule meeting – a Grand Challenge to automate the design and synthesis of chemical compounds. We know that the literature is critical – we have to use patents because the publishers don’t allow text-mining of “their” articles. So science is held back by narrow-minded commercialism. There will be a phase 2 when we have the new Lensfield and the new OSCAR and the new Chemical Tagger.

2010-09 Quixote. This is really barking mad. It’s a project to do in a month with 0 dollars what 20 million dollars failed to do at a US National Laboratory over many years. To build a self-sustaining distributed Open knowledgebase for computational chemistry. Again I am THE PIG, but I have active chickens and piglets. A month ago we committed to a working prototype on Thursday (2010-10-21) because Lance Westerhoff happened to be visiting Cambridge. We fixed and published the day. We have about 44 hours left. The plan is to automate the process of:

  • Publishing Open computational chemistry data
  • Crawling the data-sites and downloading
  • Converting to semantic form (Chemical Markup Language – CML)
  • Converting to RDF
  • Uploading the data (of all sorts) to the GreenChainServer (or clones of it).
  • Building a search and indexing system

In one sense we started from scratch. We didn’t know each other a month ago. There was no project in embryo. Everything needed designing.

But there was a huge substratum of Open source and open practices. There are many Blue Obelisk programs and libraries which have been designed to work as components. And here’s the remarkable thing – because we develop Open components we create much more modular software than the commercial companies in chemistry. They have to work to create lock-in and monolithic applications while we build the system from at least 5 Blue Obelisk projects. And because we use the natural language of the web – CML/XML, RDF, ANTLR, REST, etc. – and because we don’t have to worry about security and commercial confidence and so on, we can move much faster.

So it’s almost all in place. I’ve just finished bolting in Weerapong’s (very nice) library of RDF generators and CMLComp dictionaries. There is no equivalent anywhere else – this is several years ahead of the game. It makes computational chemistry (traditionally a minefield of FORTRAN punch files) into a set of interoperable components. It’s not finished but it’ll serve the same role as SAX1 did after its month-long scrimt.

And what next? I hope to take Quixote to Materials and India.

But also we intend to have a Bibliography Scrimt, starting now. Can we OPENLY index all the books in the world? Not, perhaps, in a month. But we can prove the concept. So, with Ben O’Steen, Mark McGillivray, Will Waites, Rufus and some others we will start to get the chickens together.

Who wants to be THE PIG?

[1] Lunatick does not mean “mad” in this context. It has a special and precise meaning which has a strong analogy to the way we run the Blue Obelisk. You will be doing very well if you work this out!

 

 

 


More Molecules

The following examples may (or may not) be helpful in guiding readers towards determining the relationship between the molecules in http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2643

 


Open Bibliography, the Democratisation of Knowledge and the Scottish Enlightenment at RLUK

#jiscopenbib #okfn #rluk2010 #edinbib2010

WARNING. READING THIS POST MIGHT COMMIT YOU TO DOING SOMETHING.

I have been invited to run a session at the RLUK’s (Research Libraries UK) conference in 2010 (http://www.rluk.ac.uk/node/597 ). It’s in Edinburgh, Scotland, which is still part of the UK, but also has the distinction of having helped lead the Enlightenment (http://en.wikipedia.org/wiki/Scottish_Enlightenment ). I had the privilege of helping start Scotland’s new University (http://en.wikipedia.org/wiki/University_of_Stirling ) and worked there for 15 years. So I am excited to be back in the land to which I and we owe so much.

My session is not on the programme yet – it’s on the Thursday at 1115-1230 in parallel or series with the advertised plenary session.

I shall be asking a commitment from the delegates – and possibly from those of you who can’t go. I generally now don’t “give lectures”. A “lecture” is an old-fashioned way of communication – uni-directional, inefficient. So I try to develop sessions where all of the “delegates” – including remote ones – are involved. All that is required is a good internet connection and hopefully a Twitterfall or similar.

So we will have a practical session on the Democratisation of Knowledge. Since one of my current driving forces is Open Bibliography I will be pursuing that. The goal is simple:

Let’s use the session and the event to make a measurable contribution to the democratisation of bibliography.

The great thing about an event like this is that it gives us a fixed deadline to aim at. There is no slippage allowed. The event will happen. The session will happen. Therefore we have to plan to make it useful.

We have about 25 days from today to make this work. This is an excellent time. It is also a zero-cash project (but not zero-cost). I have done two other such events and both have “succeeded” and I’m hopeful here. It has features of a Sprint (http://en.wikipedia.org/wiki/Sprint_%28software_development%29 ) or Hackathon but is different in that much of the work is done remotely and then brought together in a single hour of frantic integration.

We did this with the Green Chain Reaction at Science Online (http://www.scienceonlinelondon.org/programme.php?tab=abstracts#breakout11 ) where volunteers worked for about 4 weeks to bring together software and data into an integrated knowledge resource. This was slightly dented by the Department cutting off electricity to our servers (planned) during the actual session at the BL, but the volunteers copied significant amounts to other servers. We collected data from over 100,000 patents and came to a sort of conclusion. We have probably done about 70% of the work and are retooling to make it automatic.

We are also in the middle of a project to collect the world’s computational chemistry data (ca 5 million data sets per year) and our pilot – Quixote – http://quixote.wikispot.org/Front_Page meets in Cambridge on Thursday about 28 days after we hatched the idea. We’ve had to write some software but it’s primarily integration. I’m optimistic that we can succeed where 30 years of dithering haven’t.

The Open Bibliography project for RLUK will be simpler. We need a name – I am sure one will emerge. For now let’s use the tag #edinbib2010. As with all these 1-month projects the goals can change. But they must be realistic. I am going to start off with some now, but please suggest your own. The OKF has all the infrastructural support required (Wikis, blogs, TRAC, Etherpad, servers, etc.) so you shouldn’t have to take time to set up infrastructure (it may, of course, need customising). Note that nothing here is limited to Research Libraries and I have got the impression that public libraries are often ahead of the universities in their democratisation. Also, where I use “participant” that can and often should be remote. We intend to have a Twitterfall so that everyone can contribute during the session.

  • Participants to get at least one signature from their institution supporting the OKF Open Bibliography Principles. (If these are new to you, read the last week on this blog and also http://openbiblio.net/ ). All it does is ask you to agree that Open Bibliography is a Good Thing. If you can get a vice-chancellor or a head librarian that’s great. But if not, just sign it yourself. [OKF – TODO we need a signup list]. This can, of course, be done worldwide.
  • It would be really great to get a Scottish commitment to Open Bibliography. Scotland has recently prided itself (rightly) on adopting Openness more rapidly than England, and I think this could be done. I can’t tell you how to do it, but if it seems achievable, please have a go.
  • Each participant should write to one primary publisher asking for their commitment to Open Bibliography. Again the OKF has pioneered this type of request mechanism with its IsItOpen(Data) for asking about the Openness of scientific information. We’d hope to develop this during the month. In principle (since Bibliography is de facto Open) every publisher should be glad to sign up as it would mean more exposure for their publications. But we shall suggest you choose the most favourable first.
  • [UPDATE – how could I have failed to think of this] A communally produced Open Bibliography on the Democratisation of Knowledge.

If we are successful – and we shall be since it depends on YOU – we shall then have:

  • A large and compelling list of institutions worldwide which are committed to (or have at least employees committed to) Open Bibliography
  • A united stance from Scotland
  • A growing list of publishers who have asserted that their bibliography is Open.
  • A world-class Open Bibliography on the Democratisation of Knowledge

If You – at Edinburgh and elsewhere – commit, then within a month we shall change the face of Bibliography.

 


What is the relationship between these molecules?

We had a welcoming session last week – see Egon Willighagen’s blog http://chem-bla-ics.blogspot.com/2010/10/are-these-organic-molecules-same.html where I casually asked what the relationship is between the following two molecules (I deprecate the use of the hatch in these pictures but that’s irrelevant here)

There is a definite answer, but think carefully

I will leave about a day, so we get a range of answers


Principles for Open Bibliographic Data – please comment

#jiscopenbib #okfn

The Open Knowledge Foundation has been working in conjunction with JISC to define various components of Open Scholarship, and now we have come to an important milestone for Open Bibliography. The OKF has run an Open Bibliography list since 201002 : http://lists.okfn.org/pipermail/open-bibliography/2010-February/000000.html . This has attracted a lot of active contributors from a wide range of countries. It’s also spawned a technical list http://lists.okfn.org/pipermail/openbiblio-dev/2010-June/000001.html . In June the JISC funded us (University of Cambridge and OKF) in the “Open Bibliography” project. As a consequence of these activities we have now come up with a draft set of principles on Open Bibliography.

Adrian Pohl took our Panton Principles for Open Data in science and recrafted them to support bibliography. He’s also driven the process through (open) skype telcons and email discussions. We set ourselves a deadline of yesterday for DRAFT principles and he’s published them on openbiblio.net:

http://openbiblio.net/2010/10/15/principles-for-open-bibliographic-data/

PLEASE READ THESE. It is vitally important that you agree with my analysis or challenge it. Apathy is an irresponsible approach. Note that I am NOT talking about COLLECTIONS of bibliographic records (which may be copyrightable in some jurisdictions) but individual records which are uncopyrightable. Here are my comments.

A BIBLIOGRAPHIC RECORD IS BY ITS NATURE IN THE PUBLIC DOMAIN.

This is very important. It is not a matter for debate. It is a fact. It is the law. The problem is that some people don’t realise this and others don’t respect it. So I’ll repeat

A BIBLIOGRAPHIC RECORD IS BY ITS NATURE IN THE PUBLIC DOMAIN.

I am not arguing that it would be valuable for it to be in the public domain. Or that it is a moral imperative that it should be. I’m stating that IT IS. And we are asking the whole world to restate that a billion times.

A bibliographic record is used to identify a work (book, article, journal, thesis, newspaper, etc. – I am omitting films, music etc. but the same imperative holds for them). I’ll show you what I mean…

I have a book on my desk. Here are some FACTS about it. They are in the public domain. I am preserving the case of the text, but not reporting the fonts. Information is either copied from the book or my observations:

====================================

 

Title: THE MYTHICAL MAN-MONTH

Author: FREDERICK P. BROOKS, JR

Size: 23 cm x 15.4 cm (measured by Peter Murray-Rust, 20101016:11:16:BST – NB all other observations by this person are abbreviated PMR)

Colour: turquoise and black with white writing (PMR)

Subtitle: ESSAYS ON SOFTWARE ENGINEERING

Edition: ANNIVERSARY EDITION WITH FOUR NEW CHAPTERS

Owner: P.MURRAY-RUST (handwritten on inside front cover)

Cover drawing: “Cover drawing: C. R. Knight, Mural of the La Brea Tar Pits, Courtesy of the George C. Page Museum of La Brea Discoveries, The Natural History Museum of Los Angeles County” [from the front matter]

Library of Congress Cataloguing-in-Publication Data

Brooks, Frederick P., Jr. (Frederick Philips)

The mythical man-month : essays on software engineering /

Frederick P. Brooks, Jr. – Anniversary ed.

p. cm.

Includes bibliographical references and index.

ISBN 0-201-83595-9

  1. Software engineering I. Title.

    QA76.758.B75 1995

    005.1’068-dc20 94-36653

    CIP

Copyright © 1995 by Addison-Wesley Publishing Company, Inc.

Printed in the United States of America.

1 2 3 4 5 6 7 8 9 10—MA—98979695

This is all factual information. I have simply recorded it. There are actually at least two mistakes in my record – see if you can find them. Millions of bibliographic entries contain errors. It is the hope that Open Bibliography will help to identify some, but that’s another story.
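One class of error can be caught mechanically. An ISBN-10 carries a check digit: weight the ten characters from 10 down to 1, and the sum must be divisible by 11 (a final X stands for 10). Here is a minimal Python sketch; the function name `isbn10_is_valid` is my own, and it covers only ISBN-10, not ISBN-13.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check the ISBN-10 check digit, ignoring hyphens and spaces."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # X is only allowed as the final check character
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


if __name__ == "__main__":
    print(isbn10_is_valid("0-201-83595-9"))
```

For the ISBN recorded above the weighted sum is 154 = 11 × 14, so that line at least is self-consistent. Swap two different neighbouring digits and the check fails, which is how a catalogue can catch the commonest typing slip without ever seeing the book.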

Here are some more FACTS recorded by PMR:

There are 19 chapters and an epilogue

The “Notes and References” section runs from p. 293-308

The index runs from p. 309-322

The last page in the book is p. 322

There is a picture on p. 2

There is a figure on p. 5

I have published these observations. They are therefore de facto in the public domain. If anyone disagrees with this analysis please challenge me. If, in fact, I do not have the absolute right to make and publish everything above then indeed we have entered Nineteen-Eighty-Four and Fahrenheit 451.

The main thrust of the Open Bibliographic Principles (1-4) is:

  • To identify what is meant by a bibliographic record
  • To assert that it IS in the public domain
  • To agree how to publish it so that it can be used, re-used and redistributed without permission

I am aware that we may need some redrafting around the licences – are we restating that the record IS in the PD or are we dedicating it to it?

Bibliographic Principle 5 is different in nature. There are some parts of a bibliographic record which are subjective (i.e. not FACTUAL) and which some regard as creative and copyrightable. This principle does not seek to determine the legality but to assert that they are beneficial to all parties (author, publisher, library, and of course reader). They are generally not concerned with identification, but more with discoverability, and include:

  • Abstracts
  • Keywords
  • Subject headings
  • Reviews

In the fifth principle we are asking anyone who might consider they had a right to this material (I will not call them the copyright owners as that gives false clarification) to donate this material into the public domain. To say that

  • “we will make our subject classification available in an OKD-compliant way so that it makes it easier to find the work”
  • “we will make our abstract available in an OKD-compliant way so that it makes it easier for readers to know whether they wish to retrieve the work”

And to do this in a clear manner so that everyone (including robots) knows that they have done this.


Closed bibliography on the Cambridge train

#jiscopenbib #okfn

I came back from the British Library and Imperial War Museum (I’ll tell you why later) on Thursday on the 1615. One of the privileges of the 1615 is that if you get there after 1605 you have to stand or sit on the floor among the folding bicycles. Because I wanted to hack I sat on the floor. I overheard a conversation between two hackers and have caught most of it. They were talking about a book, which I think was about software but I couldn’t see it.

She: “That looks an interesting book”

He: “Yes, it’s written by one of the great software gurus”

She: “What’s it called?”

He: “I can’t tell you.”

She: “Why not?”

He: “It’s copyright”

She: “Yes, I know the book is copyright, but I just want to know the title”

He: “Sorry I can’t tell you. It’s copyright”

She: “I only want to know the title”

He: “It’s copyright so I can’t tell you”

She: “Don’t be silly. The title isn’t copyright.”

He: “Yes it is. The whole book is copyright”.

She: “But not the title, surely. Where does it say that the title is copyright?”

He: “Here… it says… ” (Apparently read from the front matter)

He: “All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission of the publisher and author.”

He: “So I can’t tell you”.

She: “But the title isn’t part of the publication”

He: “Yes it is. This is a book, right?”

She: “Yes”

He: “and a book is a publication”

She: “Yes”

He: “and the title is part of the book?”

She: “… well, yes”

He: “So I can’t tell you”.

She: “But I am not asking you to put it in a database. I just want you to tell me”

He: “But telling you is transmitting part of the publication. Speech is a transmission. It’s an otherwise. I couldn’t even tap it out in Morse code”

She: “That’s rubbish”

He: “No. Copyright matters. You wouldn’t steal sweets from a child, would you? Telling you the title is stealing. It’s piracy. I’m afraid I don’t like breaking the law”

She: “Well, the law depends on which country it was published in. Where was it published?”

He: “I can’t tell you.”

She: “… don’t tell me, it’s copyright.”

He: “… that’s right”.

She: “in fact you shouldn’t even have told me that it’s copyright.”

He: “Steaming wombats! I’ve broken the law. ”

She: “Well, can I read the title? Surely that isn’t transmitting part of the publication?”

He: “Not sure. I’d have to ask a librarian.”


Open Data Classes for Free at Nottingham

This blog post writes itself. Except that I have a warm feeling as I was a virtual professor at Nottingham for 5 years…

http://www.guardian.co.uk/technology/datablog/2010/oct/13/free-data-nottingham-classes

Nottingham University offers masterclasses in dealing with open data – for free of course

Less than a year on from the release of data.gov.uk and open data sets, university offers classes around the country to those who want to do something with the data flood

Want to become an armchair auditor? Or, even better, push along the free data movement? Then the free masterclasses (though the only requirement is an inquiring mind and “a reasonable working knowledge of web browsing and Microsoft Excel”) being held by Horizon Digital Economy Research and the Centre for Geospatial Science at The University of Nottingham may be the ones for you.

“The idea is that it will teach people about how to best extract and interpret the data to produce meaningful statistics which may be useful to them as individuals or their organisations,” say the organisers.

 

Read the rest…

But a major thing is the message:

The dam has burst

The Guardian (the leading UK woolly liberal newspaper) deserves great credit. So does Gordon Brown (probably the most positive thing he will be remembered for).

Break your dams down. The IPR police is in confusion. If the data deserves to be free – as in speech – then free it and speak about it.

 


 
