Open Data – la Conference (2012-09-27). Data is truth.

I’ve had the privilege to be invited to an important European conference on Open Data in Paris. As a start I have been asked to answer three questions, which I think will appear on a web page. [You’ll see the same three questions answered from a Brazilian perspective – which I didn’t read till I had given mine.] Because I do everything openly, here are the questions and how I answered them.

I’d be grateful for comments.

1)      How Open Data can modify the current way of « doing » science, the actual epistemologic paradigm ?

I will interpret “epistemology” in a pragmatic way. If this doesn’t answer your question please come back. I will also concentrate on science. My discipline of chemistry is a good central pragmatic one, where there is relatively little philosophical discussion. Modern biology is also highly reductionist and I suspect much of medicine, materials science, and earth and environmental science is similar. I will exclude modern physics, where multiverses and similar concepts are constrained by theory rather than data. Thus although I fully support Dirac (“it is more important to have beauty in one’s equations than to have them fit experiment.”), most science is not at this level.

Most science is based on a concept of shared truth and most of the scientists above believe that this has some tangible reality, represented by data. [I also exclude shared materials such as specimens and samples – which are also out of scope]. In the more physical sciences there is a belief that a given experiment can be replicated to provide the same data – if not, then either the observation is flawed or the experiment does not address a single truth. Without data we are limited in how we share truth. In many cases we have to take the word of the scientist, which is a non-objective way of doing science. Apart from fraud and sloppiness, there is also unmeasured variation. And, without the data, we are unconsciously importing the values of the other scientist.

But there is also a growing realisation that we can, in principle, have access to all the world’s scientific data. This creates a new type of scientist – the data-scientist. Tony Hey has popularised Jim Gray’s “the Fourth Paradigm” to describe data-driven science. The principle of using other work is, of course, not new and goes back at least to Kepler. But the scale of possibilities changes this qualitatively – everyone can and should be, in part, a data-scientist. It is no longer acceptable not to know what machines can know.

But the practice of holding on to data for personal or institutional gain is still very strong, aided by laziness in publishing data. Despite the term “data deluge” there is an Open Data drought in many areas. Computational chemistry and materials probably consume >> 1 billion EUR/year, but almost none of the raw data is published, although it’s trivial by HEP (high-energy physics) standards – much would probably fit on a modern laptop. Therefore, until everyone has access to the world’s shared knowledge, we are still working in the twentieth century.

The primary check on truth is therefore data. 

2)      How can the Open Knowledge movement (and, in general, Open Data) be a vector of democratization of science ?

We are in the middle of a titanic struggle between Openness and (Closed + Apathy). If one wins, it will survive for the rest of this century. But if we lose the freedom of the Net (e.g. HADOPI) we move to a dystopia similar to Orwell’s 1984. Knowledge is the key component, as openness is based on the power of knowledge. The recent EP vote against ACTA was a critical battle for Openness and its success was based on Net democracy / neutrality. The Net is our main defence against the creeping corporatism and apathy of modern “democracies” – we can find like-minded people, move rapidly and develop our resources communally while remaining within the law.

Open Data from government is extremely powerful. It’s been very refreshing to see how governments *wish* to share data. Indeed they are often ahead of science in their technology and protocols.

In science we have several problems. Universities and public research labs are conservative and have failed to react to the possibilities of change. Many have allowed their knowledge to be appropriated by commercial interests such as publishers and for the most part they don’t care. They have huge amounts of public funding (I calculate that 100-1000 billion USD is spent each year on public STM research). They have little effective drive to share anything as their primary purpose is to compete against each other.

So in the open movement we have to create approaches to destabilise this dystopia. I see Open Data and Open Software as liberating – I call my own software “liberation software” as it is designed to break down walled gardens by showing the value of communal data. The good news is that this type of approach is emerging in many places and is undoubtedly a “bottom-up” movement – people are sick of many current practices and believe that science will be better, faster, more productive and more valuable if Open.

3)      If you should summarize Open Data, its philosophy and its consequences, in one word, what would it be ?


(Free-as-in-speech, of course)
I think you can translate this to Liberty – with its cognate “libre” in languages other than English.



One Response to Open Data – la Conference (2012-09-27). Data is truth.

  1. steggb says:

    Be aware that the whole world can read what you write. So be careful: people have been prosecuted for libel or harassment. –Markus Lattner
