Is It Open [Data] Service launched. I try it out!

The Open Knowledge Foundation has just launched the alpha version of “Is It Open?”, a service designed to help clarify whether scientific data (normally on publishers’ websites, but it could be anywhere) is truly Open according to the ideas of the OKF and Science Commons. This service allows anyone to ask a formal question about the status of Data Openness and make the process public. This will avoid much wasted effort in repeated questioning and hopefully also promote the value of Open Data.

I have posted a question to Bryan Vickery at Chemistry Central – a BMC Journal. I’m confident his answer will be yes, but it will serve as a test of the service and give us useful feedback. Assuming the process works we will be looking for volunteers who will mail other publishers. In this way we can crowdsource the process of getting the formal position of journals and publishers (the two do not always map).

Here’s the request, and I’ll publish a response when I get it.

Enquiry: dc8b764b-7cb3-47af-ab91-e14fcbeed61c

Summary: Please can you confirm that Data in Chemistry Central are fully Open

Status: Unresolved

Started: 2009-07-11T01:12:47.299048

ID: dc8b764b-7cb3-47af-ab91-e14fcbeed61c

To: bryan.vickery@chemistrycentral.com

Subject: Please can you confirm that Data in Chemistry Central are fully Open

Date: 2009-07-11 01:12:46.490647

Status: Not Yet Sent

Dear Bryan,

I am writing to ask you about the Openness of data in Chemistry Central. I know that the journal is Open Access and I would like you to confirm that the data published in it are fully Open. I am using an exciting new tool (Is It Open? – http://isitopen.ckan.net/ – developed by the Open Knowledge Foundation) to ask you this question. You are the first to be asked 🙂

The Open Knowledge Foundation and Science Commons have developed instruments for ensuring that Data can be marked as Open. By this we mean that Data (as distinct from text) can be re-published, re-used, data- and text-mined and used in similar processes without explicit permission and for whatever purpose [subject only to the need to acknowledge authorship and provenance]. While we are confident that your Open Access statement implies both the motivation and practice of this, we’d be grateful for confirmation.

A major reason for asking this from you is that many publishers do not make it clear whether their data is Open. If you can give us this assurance you will act as an example to which we can point others. We are, in fact, hoping to generate a larger number of similar enquiries to other publishers.

The Open Knowledge Foundation has created web buttons (see http://www.opendefinition.org/buttons) which can be used to indicate that Data is Open. A growing number of sites use these as they are a simple and effective way of indicating immediately to humans and robots that the data are Open. I would be delighted if you could think about including such a button on your site, particularly where data-rich documents might occur.

Our enquiry service is at an early stage and we’d welcome feedback – how could we improve it? Were we clear enough?

Many thanks in anticipation of your collaboration.

Peter Murray-Rust

[1] http://www.opendefinition.org/1.0/
[2] http://www.opendefinition.org/licenses/

– Sent by “Is It Open?” (http://isitopen.ckan.net/about), a service which helps scholars (and others) to request information about the status and licensing of information.

Posted in Uncategorized | 2 Comments

Off to Scifoo and Microsoft

Update…

I have been very busy hacking Chem4Word (Joe Townsend is the Doctor Who) and he has assigned me lots of tasks. It’s all starting to look quite good. We’ll be discussing this with Alex Wade and Lee Dirks in Microsoft next week (I blagged myself an invite to the Faculty Summit – many thanks Tony).

En route I am visiting Scifoo (many thanks Timo) – a great mind-blowing mixture of interesting people and ideas run by Google, Nature and O’Reilly. I’ve been before and it was fantastic. This year I think that the themes that Cameron Neylon and I have been developing (Open Data for Science and pervasive data capture and “notebooks”) will be very important. Oh, and the Campers will get free access to GoogleWave. I would love to put CML into that. Let’s see how we get on…

Busy. I’ve given two interviews recently – one on software for searching patents and one on the problems of scholarly publishing. I’ll try to piece this together on the plane.

The Open Knowledge Foundation is FIVE

Open Knowledge Foundation Blog

Open Knowledge Foundation Newsletter No. 11

July 2nd, 2009

Open Knowledge Foundation Newsletter No. 11 has just been sent out:

Welcome to the eleventh Open Knowledge Foundation newsletter!

Contents:

The OKF turns five and we need your support!

Open Database License (ODbL) goes 1.0

European Open Data Inventory + Summit

Launch of the Open Data Grid

New developments on Public Domain Works

Other news in brief

Thanks to our volunteers!

Support the Open Knowledge Foundation

Further information

THE OKF TURNS FIVE – AND WE NEED YOUR SUPPORT!

This month the Open Knowledge Foundation is five years old. Over those last five years we’ve done much to promote open access to information – from sonnets to stats, genes to geodata – not only in the form of specific projects like Open Shakespeare and Public Domain Works but also in the creation of tools such as KnowledgeForge and the Comprehensive Knowledge Archive Network, standards such as the Open Knowledge Definition, and events such as OKCon, designed to benefit the wider open knowledge community. (To find out more about what we’ve been up to in the last year, you can read our latest annual report [1]).

Universities should act while they have the chance

From David Wiley’s blog (“Iterating towards openness”) – David is founder of OpenContent.org. After a general discussion about free-being-inevitable (reviewing reviews of Chris Anderson’s upcoming book, Free: The Future of a Radical Price) he moves to higher education:

Competition! Massive amounts of almost-no-barrier-to-entry competition. Much of it will be poor. I suppose you can take some comfort in that. But some of it will be very, very good. And that should scare existing institutions silly. The education game is about to change, and you (your institution) have three choices:

1. Innovate your way forward. If you allow your business model to become flexible and responsive, you can feel your way forward, influencing the emergent educational context as it simultaneously influences your business model. (A dynamic system!)

2. Wait for others to innovate their way forward. Let them shape the future educational context without your input, and hope that 10 years from now higher education is still a place where your institution is relevant. (If it isn’t, you’ll have only yourself to blame.)

3. Ignore / deny that anything is changing (or will ever change). Higher education is too important, too deeply woven into the fabric of society, too critical for employers, and too big a business to fail. (See you on the other side with GM and AIG.)

[…] but higher education will have to deal with [Chris’s] thesis as surely as I’m typing this post. As Lehi taught, there are two types of things in this world – “things to act and things to be acted upon.” The day is close at hand when each university will have to decide which they are.

I had been planning to blog about universities and their attitude to the digital world, so this gives me the incentive. The points are general…

In 1992 I got very excited about the power of digital learning and embraced many of the startup ideas. These included the Globewide Network Academy which is a voluntary organisation (much the same dynamics as Wikipedia, but nearly 10 years ahead). We used MOOs to create VLEs and Marcus Speh ran the first Virtual course on the Web (“Object Oriented design using C++”) – the material fell foul of copyright Mordor even then. It won a best-of-the-web in 1994 at WWW1.

These were heady days. I thought the world was changing before my eyes. And I was invited to a Chair in the University of Nottingham to run a virtual course in Computer-Based Drug Design for the pharma industry. It was a technical success (highly rated by the Teaching Quality Assessment) but it didn’t have a sustainable business model and after a few years it closed down and I moved to Cambridge. But I have been looking for that spark elsewhere in Higher Education and I haven’t seen it.

By contrast, go back to 1970 when Harold Wilson initiated one of the great British achievements of the twentieth century, The Open University. That was stunning. The vision led the technology by a long way – much of the material was posted paper, you could get online access to a computer over a teletype (110 baud) for 2 weeks a year, and in some cases people had to climb a mountain to pick up the BBC signals. But again it changed my vision for ever. Anyone could, and did, go to the OU. Even if you couldn’t, the programs were often stunning. The maths used graphics which – for 1970 – were miles beyond chalk-and-talk.

And now? Where are the universities changing the face of the world? Where communication is infinitely cheap. Where students are wired up with more power than the whole of the world 30 years ago. Where the Internet is changing democracy – where are the changes in academia? Why, at least, are there few substantial discussions about what education means in a distributed world? It’s too easy to see the reverse where education is simply a branded deliverable contract between a customer (student) and a supplier (university).

Well, the internet changes that business very quickly. So unless there are some radically new ideas, Universities may find that others are eating their lunch.

In a later post I want to address the complex and depressing cycle between research and publication and the role of universities.

Scientific Publishing Will Change. Will you?

There’s been a slew of posts and other news items which convince me that we are at a discontinuity in the way science is communicated and valued. I’m too busy hacking Chem4Word (150 unit tests work, 300 to go) to do it justice. But I’ll mention two:

  • A very thorough and compelling analysis by Michael Nielsen of why conventional science publishers will crash http://michaelnielsen.org/blog/?p=629. Not “whether” but “why”. “When” is more difficult but I hope we start seeing it in the next two years – the publishing industry in its current form is increasingly seen to be harmful to science and the technology is there that will allow people to vote by action, not words. Take time to read it. If you are a publisher, change what you are doing today or the world will change it tomorrow.

  • Beyond Institutional Repositories


    Laurent Romary
    INRIA

    Chris Armbruster
    Research Network 1989; Max Planck Society

    • in http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1425692

      This argues that Institutional repositories have essentially failed, certainly for science, and that we must adopt an Open model (most IRs are obsessively closed with copyright Mordor drooling over the portals). We must also adopt national and supranational models. To the average scientist this makes sense – why look for science in a thousand empty IRs when PubMed and arXiv aggregate the disciplines in Open fashion?

    I will argue this when I have more time, but here are some predictions for scientists. Let’s look at this in 5 years.

    • Institutional resources (libraries and repositories) will be irrelevant to scientists. A new business model (possibly through national libraries) will provide an infrastructure.

    • Commercial scientific abstracting services (Chemical Abstracts) will be seen to be obsolete compared with the WebOfOpenData.

    • Scientific information will be managed in an Open manner by Learned Societies which are seen to be acting for the benefit of their community (and not for their income). That’s a hard business model to predict but it will happen. In the UK it’s Wellcome Trust, EBI and the British Library.

    • A few publishers will be involved in managing science quality and metrics for a reasonable income. They may not be the current publishers.

    • Science will be authored with communal tools (hence Chem4Word :-)) and researchers will use memex-like infrastructure (e.g. Google Wave or whatever overtakes it).

    • Current Open Access models (“Green and gold”) will have played their part in history because authors will publish their papers Openly.

    There is a great opportunity for universities to reclaim their commons.

    • They will unfortunately fail to grasp it, so some new business model will take a critical role in managing academic scientific information.

    Now back to C#. Glyn Moody and Richard Stallman say I shouldn’t be doing it. One benefit is that it forces a constant refactoring of the abstract design – we have made a lot of progress in CML as a result of the project. I feel an urge to refactor JUMBO yet again…

Puntcon2009

Puntcon was a great success – wonderful weather and probably > 40 people of all ages. Many travelled up from London and there was a good representation of geeks (e.g. people working in – often startup – companies doing software or webby things). Also many people from the culture of openness and digital democracy – #mySociety, #OpenKnowledgeFoundation, #OpenRightsGroup, etc.

I met Cory Doctorow and Alice and next generation (I am terrible with names) as we were punted smoothly up the river by Citizen Pollock. Cory and I are sharing a platform in #ILI2009 in October. He’s a science fiction writer and we talked about the values of self-publishing – it’s becoming an interesting option though whether this scales across the field is not so clear. He will be talking very authoritatively on the new aspects of publishing at #ILI2009 and I’m feeling I have very little positive to say. Maybe by then I will have got some input from the library community – haven’t so far.

graphics1

The return – AdamAmyl (red shirt) – involved in running WriteToThem. Other annotations in comments welcome.

graphics2

Bill Thompson (organizer) with Becky Hogge

graphics3

Ross Anderson (Cambridge) talking with BillT and others. Andrew Walkingshaw (centre, black shirt) who worked on MaterialsGrid in our groups and is now a founder member of Timetrics

graphics4

Half of the group (Rufus Pollock, centre, green shirt).

See also (chaileyf)

http://www.flickr.com/photos/chailey/sets/72157620569467541/

and http://twitpic.com/8qq9e

from joannejacobs

So if you missed it, some of the people will be at OpenTech on Saturday at ULU London, so come anyway.

Effective digital preservation is (almost) impossible; so Disseminate instead

I was just about to go back to refactoring Chem4Word, when I saw this pingback on my blog and just have to comment. It’s really important. More of my comments at the bottom…

Which blogs should be preserved?

Richard M. Davis on 26th June, 2009 at 12:00 pm

You’d think it obvious that my blog should be preserved, though I’m not so sure about yours! According to the poster summarising the fascinating 2007 survey by Carolyn Hank et al: “The majority of bloggers agreed (36%) or strongly agreed (34.9%) that their own blogs should be preserved.” Five per cent don’t want their blogs preserved at all; nearly a quarter aren’t fussed either way.

Here’s one of the data tables (which I had to retype as HTML – Peter Murray Rust is right about PDFs and data):

Table 4. Preservation perceptions – general

                         Strongly agree    Neither agree or     Strongly disagree
                         or agree          (sic) disagree       or disagree

Should preserve
  Personal blog          70.9%             23.8%                5.3%
  Every blog             35.8%             27.9%                36.3%
  Every comment          31.4%             31.9%                36.7%
  All online content     28.2%             22.3%                49.5%

Should not preserve
  Some blogs             44.7%             27.7%                27.7%
  Some comments          48.4%             31.3%                20.2%
  Some online content    51.3%             24.9%                23.8%

The overall pattern seems a good vindication of our own project approach, which will progressively move from capturing blog content (posts), to addressing comments and content, reflecting the scale of the bloggers’ own priorities.

It also seems a useful juncture in our project to throw open the question: which blogs should we preserve?

With over 5 million active blogs noted by Technorati, it seems daft to even start to enumerate them but in our field (libraries, archives, information science), several stand out, and it’s the very nature and importance of these that bolster the case for keeping them. I have in mind in particular Peter Suber’s Open Access News blog, but also blogs such as those of Peter Murray Rust, Brian Kelly, Lorcan Dempsey, Dorothea Salo, Jill Walker Rettberg – all ripe with contemporary accounts and robust views on matters of scholarly communication. But in every case, we have cause to wonder: will that information survive, will that link still work tomorrow?

What blogs (or types of blogs) do you think should be preserved, and why?

PMR: This is really important. Blogs are evolving and being used for many valuable activities (here we highlight scholarship). Some bloggers spend hours or more on a post. Bill Hooker has an incredible set of statistics about the cost of Open Access and Toll Access publications, page charges, etc. Normally that would get published in a journal no-one reads (I have even published in such – it was a huge effort and it’s got one citation. Not that I care about citations). So I tend to work out my half-baked ideas in public. Some people do their early science in the Open. Some are activists. Some review the current landscape, etc.

But preservation is really really difficult. I don’t know how to tackle it. Since 1993 I have been determined to preserve my digital record.

And I’ve failed.

I’ve created courses, forums, data sets, teaching-learning objects, blogs, preprints, etc.

And I’ve lost most of them.

There are many reasons. First it’s extremely hard to preserve complex digital objects. The problems include:

  • compound documents (and only after 15 years is the web coming round to realising this is important)

  • hyperlinks

  • moving URLs/URIs

  • formats

  • semantic behaviour

  • disorganised humans (me)

  • moving institution (4 times)

  • moving computer (about 10 times)

Henry Rzepa and I have worked hard on this and he is more organized than me. We put early versions of JUMBO on CD-ROMs and got the RSC to distribute them with an issue of the journal. I have saved things on DAT tapes from the SGI. DAT??? SGI??? I don’t have a machine which will read 3.5 floppies at home. I have trashed my much beloved BBC Micro.

Every time I change machine I lose large amounts of data.

At some stage someone will invent a true Memex for my digital activities. Until then:

Preservation is effectively impossible.

So what’s the answer? The only one I can think of at the moment is to disseminate as widely as possible. If people want to read your material they will take copies (if that is technically possible). I would urge University Repositories:

Stop agonizing about preservation and start disseminating.

If it’s worth preserving then the web will have a reasonable chance of containing it somewhere. If it’s not, well history will judge whether our current dross are the jewels of the future. We can’t tell.

DISSEMINATE, DISSEMINATE, DISSEMINATE

MAKE IT OPEN. FORGET COPYRIGHT. JUST PUBLISH.

CREATE LINKED OPEN DATA. LINKED OPEN DATA

CREATE AND RELEASE HERDS OF COWS, NOT PRESERVE HAMBURGERS IN A DEEP-FREEZE

Peter Corbett and the OSCAR3 award

Today was a sad and happy event in that we said goodbye to Peter Corbett. Peter has been the chemistry lead in the SciBorg project and has made major contributions to understanding chemical documents and chemical language. He has developed the OSCAR3 program which many people (citation needed…) regard as the leading tool for chemical entity identification and extraction. In simpler terms OSCAR3 can analyse a document (as long as it’s not some awful bitmap or grunged PDF) and identify the chemical words and phrases. Peter has also written on the linguistic science of this – it’s fairly easy to identify the word “pyridine” but this isn’t enough. Peter identifies at least 3 uses of the term: the bulk substance (“a bottle of pyridine”), a part of a molecule (“pyridine rings are aromatic”) and a molecule itself (“the pyridine molecule has C2v symmetry”). He’s written at length on his latest blog post about this.

OSCAR wasn’t the primary scientific reason for the SciBorg project but Peter found time to develop a major tool. This is now being refactored by the OMII group so that it can be run standalone, as a service, as a component in a pipeline, as a “chemistry checker” in a word processor, etc. So it was natural to honour this when Peter leaves us.

So here’s Peter’s OSCAR. It’s labelled:

Peter Corbett

OSCAR3

Unilever Centre, 2005-2009

Peter is taking up a position in Linguamatics, a Cambridge-based company with activities in text-mining and other things. I am always proud when people leave us with positive motivation – and it’s important for the future of the UK that this type of work flourishes because it will generate a lot of wealth in the coming decades. (And the UK could do with some wealth).

Peter loved linguistic challenges especially with ambiguity (“time flies like an arrow” can be parsed as “fruit flies like a banana”). Another is the conjunction of (nounal) adjectives (“pretty little girls school”). So I described him as

A pretty large Unilever Centre language processor domain expert

which will keep most tree-banks busy for a bit.

Valete.

graphics1

The Guardian highlights the eScience pollution project

Well! My appeal for volunteers for the pollution project came in a most unexpected way – it was picked up by the Guardian. The Guardian are champions of Open data (“Free Our Data”) and as you read their report of my blog you’ll see the considerable and valuable amplification about the role of OpenStreetMap. (I didn’t highlight this aspect although I have praised OSM before). I’ll quote almost all of http://www.guardian.co.uk/technology/2009/jun/25/cambridge-pollution-monitoring-mapping (without permission, but as fair use):

A chance encounter in the coffee lounge of the Cambridge chemistry department could lead to real-time maps of pollution in the city, as an offshoot of an EU project that is nearing completion in the city.

A team in Cambridge which has been running the Cambridge Mobile Urban Sensing project (CamMobSens) will begin equipping volunteers on bicycles and on foot with mobile phones and pollution sensors linked by Bluetooth.

The sensors will monitor the levels of carbon monoxide and NOx (nitrogen oxides) in the city air and relay them to satellites, which will pass them directly to openly accessible databases being run by the project.

The government does provide an overview of air quality at its own site, airquality.co.uk, but the data is not real-time, and is not mapped in detail, although it is possible to get a Google Earth download which will show the air quality as measured by roadside monitors.

But Professor Peter Murray-Rust, who happened to meet Mark Calleja, the head of the CamMobSens project during a coffee break, has now suggested that the results could be mapped in real time onto the free open-source maps provided by the OpenStreetMap project, a British-inspired project which uses volunteers using GPS locators to create maps of cities and, in time, countries.

Murray-Rust is now appealing to the OpenStreetMap team to get in touch with Calleja, so that by the time the project begins later this summer it will be possible to add the pollution information immediately to maps from OpenStreetMap.

The advantage of using OpenStreetMap rather than online maps from the UK’s official mapping agency, the Ordnance Survey, is that there are no copyright implications in the addition of data to the maps – and no limits or charges on viewing of the maps. Ordnance Survey has recently eased its restrictions on non-commercial organisations using custom online maps through a specialised web interface, but issues remain over its licensing of unrestricted access to those maps.

If it succeeds, it won’t be the first time that Cambridge’s university coffee has had a dramatic effect. In 1991 a group at the Cambridge Computer laboratory aimed a webcam at a coffee pot downstairs from their laboratory so that they wouldn’t have to walk downstairs to find out if it was empty or full. The webcam was later put onto the world wide web – and engineers at Microsoft showed it to Bill Gates in 1994 to persuade him that the web would be important by making it feasible for people anywhere to stay in touch – even with their coffee pots.

PMR: thanks Charles and the Guardian. A responsible and readable account. [I should make it clear that Mark Calleja is not the project leader.]

I asked Mark for pictures of the sensors and he replied:

Here are two pictures of a sensor (note coin for scale [ca. 1.8 cm diameter]), and one of the sort of phone we use (Nokia N80, and O2 give us free SIM cards to use).

graphics1

graphics2

graphics3

I gather that the Guardian article has got people interested in collaborating – please don’t approach me but the project as posted yesterday.

Geek Puntcon Cambridge June 26 2009

Every year Bill – no further metadata required – runs a puntcon for geeks. Geeks covers a wide range of affiliations and ideals. There will certainly be a good representation from those who want to Open up the way we do things.

I shall, of course, claim this against expenses as it’s clearly part of my work. We’ll probably pass some ducks.

Puntcon

Cambridge, Sunday June 28, 2009

After the undoubted success of our earlier ventures, we’re going to head off up river again on Sunday June 28.

The Invite

You are invited to PuntCon V, or the seventh great geek punt picnic, to take place on or about the River Cam on the afternoon of Sunday July 13th 2008. We will be heading upriver rather than along the backs – more picnic places, fewer tourists.

As before, turn up outside the Mill public house on Mill Lane between 1200 and 1230. We will head off between 1230 and 1300 – if you are late you can walk up river and catch us as we don’t punt very fast!

Bring something to drink and something to eat.

I will provide bread, plates, cutlery, glasses and more food/drink

We will take as many punts as we need [one for every six people] and head up river to a convenient picnic place [eg Grantchester Meadows] where we will eat/drink/carouse.

We normally get back around 1800. Those heading back to the station can be dropped at a bridge within walking distance.

Post punting we have the option of retiring to the pub and letting Sunday evening happen around us.

How Much

There is no registration fee or indeed any other cost. Bring food and drink and entertainment. Punt cost will be split between all comers – works out around a tenner per person. Infants are not expected to contribute.

Why PuntCon?

Conferences are fun, but don’t have ducklings. Or champagne. So the legendary Geek Punt Picnic has morphed into PuntCon, the Cambridge leg of the alternative conference circuit.

In keeping with tradition there will be no talks, no presentations, no agenda and nothing to disturb the quiet delights of the river on a Sunday afternoon. But apart from that, it’s a conference and therefore probably tax-deductible.

What happened in previous years?

This Flickr photoset should give you all the information needed.

The RSVP

Let me know if you’re up for it, but come along even if you didn’t. Bring friends as the event is scalable – let me know approx numbers if you can be bothered. Email me at bill@andfinally.com for more. If you use Facebook then there is also an event page.

The Invitees

Please feel free to invite other people – it’s a big river and there are lots of punts. It would be nice (but is by no means essential) if I had a rough idea of numbers in advance so I know how much bread to get, but there’s a Sainsburys five minutes walk away anyway…

How to Get There

The event takes place at Scudamore’s Boatyard, at the corner of Mill Lane and Granta Place, Cambridge.

Details on the Scudamore’s website.
