Why we must support CC-BY (e.g. RCUK policy). It’s good for us and good for the world

There is a not-very-healthy series of attacks on the RCUK’s policy of insisting on funded articles carrying CC-BY licences wherever possible. They emanate mainly from non-scientists and, unfortunately, it seems necessary for me to counter this. I will simplify the criticisms of RCUK policy to:

  1. It is forcing scientists to publish elsewhere than their journal of choice.
  2. It is a waste of money (“Green is cheaper”).

These are perhaps oversimplifications but the whole situation is extremely messy (the publishers help to create lots of FUD, academics are arrogant, and libraries have not taken a coherent position). Any more complex argument is based on irreconcilable starting points. I concentrate on (2). I don’t believe (1) represents the RCUK’s position. And personally I am not in favour of journals. I think megajournals such as PLoS, or ArXiV (as overlaid by the mathematicians) are completely satisfactory for scientific *peer-review and communication*. The only remaining value of journals today is to add a perceived “value” to a scientist’s work. And most of the wasted expense of publication is because academics cannot be bothered to review other scientists – they rely on journal rankings, decided by an archaic metric and unaccountable commercial companies.

To (2). There is a huge waste of money in #scholpub because of its non-competitiveness and inefficiencies. It costs 7 USD to put a paper in ArXiV and perhaps 250 to review it. Current journals charge 2000-7000 USD per paper (whether “Gold” or “Green”). There is no evidence that mainstream heavy-traffic subscription journals where “Green” is practised cost less than “Gold”. It’s simply that in one case the university bears the cost while in the other the funder bears it. In most (but not all) cases the cost of subscriptions and of author charges is ultimately borne by the taxpayer (with subscriptions the students also pay). I don’t know the exact figures, but the average price of a mainstream subscription journal article is around 5000 USD, considerably higher than the average APC. This process is not subject to the market: the good is not substitutable.

There are huge inefficiencies in the current subscription model. Here are some:

  • Typesetting (ca 100 USD per paper). The main effect of typesetting is to destroy quality. The papers submitted to ArXiV are of higher typographical quality than mainstream journals (my analysis)
  • Salesforce. Why should readers pay for this? In OA they don’t.
  • Flashy mastheads on journals. These serve no scientific purpose and destroy parts of the scientific communication.
  • Provision of paywall technology. Subscribers have to pay for this.
  • High-paid lawyers to sue pirates. Readers have to pay for them.

In OA none of these are necessary. If you really want double-column PDF with publishers’ logos and arcane reference formatting I can supply Open software that will convert the ArXiV material.

Because the funding comes from different places, any move from subscription to APCs (article processing charges) will cost money. I agree. But I think the problem will be short-lived.

  1. The funders have coordinated their policy in a way that 15 years of university neglect have failed to do. This coordination will lead to much greater pressure on publishers to provide value-for-money. If the publishers didn’t feel the pressure why are they squealing so loudly? Why SOPA? Why PIPA? And, I discover today, the hydra-like TPP (http://en.wikipedia.org/wiki/Trans-Pacific_Strategic_Economic_Partnership which is – I gather – even worse than ACTA). The funders have the potential to – at least in part – regulate the publishers.
  2. CC-BY provides far more value than its detractors give it credit for. CC-BY gives real value beyond the ability to re-use. Let’s look at some:
  • It can be mined and indexed. No robot can understand licences from non-CC publishers (cf RSC) because no human can. Therefore no mining.
  • It can be repurposed. I can extract all the maps of biodiversity and collate them. I can annotate them. I can correct errors automatically. I can do this for 10,000 in an hour.
  • It can be used for teaching. In New Zealand the Copyright Clearance system forbids fair use. It’s almost impossible to teach without paying huge amounts of money. The University of Auckland pays 1-2 million in permissions for teaching. For teaching!
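
The mining point in the list above can be made concrete. Here is a minimal sketch of how a robot might decide whether it may mine a page, assuming the publisher marks the licence with a rel="license" link in the article HTML (a common convention, but not something every publisher follows). The function names, the list of acceptable licences and the sample markup are all my own illustrative assumptions, not any publisher’s actual page.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only attribution-only (CC-BY) and public-domain
# (CC0) licences are treated as minable without asking anyone.
MINABLE_PREFIXES = (
    "creativecommons.org/licenses/by/",
    "creativecommons.org/publicdomain/zero/",
)


class LicenceFinder(HTMLParser):
    """Collect href values of <a> or <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licences = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr.get("href"):
            self.licences.append(attr["href"])


def find_licences(html):
    """Return every licence URL declared in the page (hypothetical helper)."""
    finder = LicenceFinder()
    finder.feed(html)
    return finder.licences


def is_minable(html):
    """True only if the page declares a licence from MINABLE_PREFIXES."""
    for url in find_licences(html):
        bare = url.split("://", 1)[-1]  # drop http:// or https://
        if bare.startswith("www."):
            bare = bare[4:]
        if bare.startswith(MINABLE_PREFIXES):
            return True
    return False
```

A page carrying only a phrase such as “Open Science article” has no licence link, so is_minable returns False and the robot has to stop. That is exactly the impedance I am complaining about.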

Let’s assume the RCUK support 10,000 papers per year. What’s the added cost of CC-BY? Multiply the sums and you get 10 million GBP. (OK, you say, it’s more because of the current Gold hybrid charges. But when you have a determined customer then prices will fall.) Even this figure is far more than it should be.

But I would argue they have added significant value. Let’s ask whether 10,000 peer-reviewed collated papers would be valuable for teaching. I am sure that many universities would jump at the chance. Because in teaching there is much more substitutability. I will always use CC-BY MDPI Materials Science publications for teaching in preference to RSC or ACS because those are not CC-BY. It’s substitutable. And for teaching MDPI papers are almost certainly as good as RSC. Acta Cryst E papers can be used instead of RSC for crystallography.

I’m guessing, therefore, that world universities pay about 1 billion for permissions to use teaching materials. The RCUK CC-BY could cut this considerably for many countries. And if, say, they issued a free collection of all their output it would be really handy.

And I’ll turn it into semantic form for re-use.

The criticism of CC-BY looks at only one small part of the equation. In the larger picture CC-BY provides far more public good. It’s a pity that it will cost early adopters. But countries can and should and do act unilaterally for the good of humankind.

Aid through the free provision of scientific knowledge is probably one of the most politically acceptable and valuable methods.

Posted in Uncategorized | Leave a comment

#kiwifoo Unique Experience

An incredible 2 days at Warkworth, NZ with the Torkington-inspired and led #kiwifoo. Far too much to blog. KiwiFoo is inspired by #scifoo and has the same values and close correspondence of practice and organization. Many thanks to all the organizers, fundraisers, etc.

Held in Warkworth school – some people camped, some camped in the library. PMR found a wonderful B&B with a walk home under the Southern Stars each evening. Massive to see the Milky Way and the Magellanic Clouds. I couldn’t believe they were real first time – they look like clouds!

Fewer scientists, more artists/business/teachers than #scifoo. Quite a lot of NZ-focused sessions which I really liked. NZ has the advantage and disadvantage of being self-contained. It must create its own ideas and cross-fertilize them with the world.

About 80 sessions I guess. Most inspired by “social entrepreneurship” today. Presented own session on “Liberation software” – how can we infect the ideas with ideas carried by software? 3-4 other people present. Veterans of hackfests are never fazed by turnout (although a hackfest of one is tough). We switch to ideas and how to implement them. Chatham House rule, so no names.

We focussed on biodiversity. [We visited these Gannets at http://en.wikipedia.org/wiki/Muriwai on trip back!]

So we can now mine the scholarly literature for biodiversity using automatic software (Pubcrawler, AMI2). It’s quite easy. We’ll extract the following:

  • Species names (e.g. Sula bassana)
  • Places (e.g. Muriwai)
  • Dates (e.g. 2013-02-10)

So a robot mining this blog knows it is about gannets in NZ on a particular date.
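
To show how little machinery the basic idea needs, here is a minimal sketch of such a robot. It is not how Pubcrawler or AMI2 work internally. It assumes a tiny hand-made list of genera and a tiny gazetteer of places (both hypothetical stand-ins for real taxonomic and geographic dictionaries) and it only recognises ISO 8601 dates.

```python
import re
from datetime import datetime

# Hypothetical mini-dictionaries; a real robot would load full taxonomic
# and gazetteer resources instead.
KNOWN_GENERA = {"Sula", "Morus", "Okapia"}
KNOWN_PLACES = {"Muriwai", "Warkworth", "Auckland"}

# Candidate binomial: capitalised word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")
# Candidate ISO 8601 calendar date, e.g. 2013-02-10.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Any capitalised word, to be checked against the gazetteer.
CAPITALISED = re.compile(r"\b[A-Z][a-z]+\b")


def extract_facts(text):
    """Return the species names, places and dates found in the text."""
    species = sorted(
        {f"{genus} {epithet}"
         for genus, epithet in BINOMIAL.findall(text)
         if genus in KNOWN_GENERA}
    )
    places = sorted(
        {word for word in CAPITALISED.findall(text) if word in KNOWN_PLACES}
    )
    dates = set()
    for candidate in ISO_DATE.findall(text):
        try:
            datetime.strptime(candidate, "%Y-%m-%d")  # reject impossible dates
        except ValueError:
            continue
        dates.add(candidate)
    return {"species": species, "places": places, "dates": sorted(dates)}
```

Calling extract_facts on a sentence such as “We saw gannets (Sula bassana) at Muriwai on 2013-02-10.” returns one species, one place and one date. The dictionaries do the real work, which is why people who don’t program can still help by curating them.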

THOSE ARE FACTS (not “creative works”). They can be legally re-published

There are 2 million articles a year published in EuropePMC. We can mine some of them for species, places and dates.

That’s a major chunk of biodiversity.

This knowledge is critical to mapping what, when, where of bio. The world must have that information, and the machines can extract and provide it.

Anyone interested in basic biodiversity information is welcome to help. You don’t have to know programming.

We are starting the clock today.

Oh, and mammals such as Okapis can be included as well. @okfn_okapi studying the biodiversity guidelines in NZ

 

 


Crafting a Statement of Open Research Principles and Practice for NZ/AU

We’re currently in Auckland, on the last day of the Open Research meeting https://sites.google.com/site/nzauopenresearch/ . It’s been fantastic – new friends and old friends. Today a small group of weary but energised survivors are drafting a statement. Here are some pictures:

Alison Stringer, who’s been coordinating and holding the operation together.

The group has been highly influenced by the Panton Principles (http://pantonprinciples.org/ ). I’ve shown them Sophie Kershaw’s (Panton Fellow) video – one message from Sophie’s work is that the outcome should be concise – e.g. fit on a T-shirt. But I’m deliberately keeping a back seat – this has to emphasize NZ/AU’s unique take on this as well as support existing efforts.

Currently we are running with about 6 concepts. Once the declaration is finalised I’ll blog it. These things have their own momentum and timescale.

Many of us are going on to Kiwi Foo later this afternoon.

[And guess who has snuck into the first photograph…]


Institutional Readerism: The Royal Society of Chemistry isn’t putting serious effort into “Open Access”

I recently got the following internal mail [In our chemistry Dept].

 

[chemistry librarian] am pleased to report that the first paper to be made freely available
under the RSC gold-for-gold voucher scheme is :

Facile assembly of an efficient CoOx water oxidation electrocatalyst
from Co-containing polyoxotitanate nanocages
Yi-Hsuan Lai ,  Chia-Yu Lin ,  Yaokang Lv ,  Timothy C. King , Alexander
Steiner ,  Nicoleta M. Muresan ,  Lihua Gan ,
Dominic S. Wright and Erwin Reisner


Chem. Commun., 2013, Advance Article
DOI: 10.1039/C2CC34934E

This is flagged as ‘RSC Open science free article’ on the RSC website:
http://pubs.rsc.org/en/content/articlelanding/2013/cc/c2cc34934e

 

This excited me. I have just helped to present a summer school on Materials Science Informatics in Melbourne and thought this would be an excellent example to illustrate some of my semantics tools. The authors have paid for this paper. (The RSC has “paid half the charges”, but this appears to be an accounting/marketing exercise – without exposing their accounts to public view there is no evidence the RSC have put real money in. But that’s not my primary current concern).

The authors have paid for openness. Have the RSC given it to them? The article appears:

The only indication of Openness is “RSC Open Science article”. This is not clickable or robot-readable and by default is meaningless. It gives no permissions to me. If I google this I get: (http://www.rsc.org/Publishing/Journals/OpenScience/)

Author pays to make their article Open Access (Gold Open Access)

We give journal authors the choice of making their article Open Access. If you choose this publication route, you will pay an ‘article processing fee’ after peer-review and acceptance. The final ‘article of record’ is made available to all, immediately, via our website without any barriers to access.

This gives me – the reader – NO RIGHTS. I can read the article – because I can read it. That’s all. I cannot (because of the law of copyright):

  • Copy it permanently onto my machine
  • Use it for teaching
  • Use the images in a book
  • Carry out content-mining and publish the results

Or anything else.

Maybe there is an explicit copyright statement in the paper. Yes:

This is technically correct. But 99.9% of readers and many librarians would read this as “the article is copyright RSC”. I have pointed this out to the RSC over several years and they have not changed this.

It is unfair and IMO unethical not to give a clear indication of the reader’s rights. Because the RSC has known for several years that this is an issue and because they have consistently failed to make things clearer they are failing in their responsibility as a learned society. They have NOT said anything about authors’ copyright. I do not impute motives here but call this problem

“Institutional Readerism”

The term “Institutional”, as defined by Macpherson (http://en.wikipedia.org/wiki/Institutional_racism), is:

“the collective failure of an organisation to provide an appropriate and professional service to people”

While, clearly, readers’ rights are nowhere near as serious as racism, the phrase itself describes precisely how the publishing industry is failing readers (and later I shall show, authors). It is often compounded by weasel words, non-standard and opaque terminology (“Open Science”, “Author Choice”, “Free Content”…)

It’s worse. If I try to re-use this paper I go to the “Request permissions”:

It would cost me 5000 USD to use this article. The RSC has said this was an error:

/pmr/2012/11/08/is-the-royal-society-of-chemistry-really-cheaper-than-acs-rsc-charge-50-usd-per-student-per-page-for-teaching-materials/

The RSC said:

The OA bug we can fix with a change to our platform to replace the Permissions link. This one looks like a default setting for educational use we need to sort out with RightsLink (I might be wrong here, just want to explain what the fixes probably are). Both of these will be looked at by the proper people here.

That was THREE MONTHS ago. The RSC has taken no public action to fix this. If they cared they could have:

  • Put out a public statement
  • Removed the rights link from “Open Science” articles
  • OR fixed the link

The most favourable interpretation I can come up with from all this is:

The RSC don’t put effort into readers’ rights in Open Access.

This is Institutional Readerism.

“the collective failure of an organisation to provide an appropriate and professional service to readers and authors”

So I have used a different publisher (MDPI Materials, a CC-BY journal) for my seminar and I will consistently use them in future for illustrative material. Impact factor is less important than licensing for training materials.

 


My Response to the UK parliament and BIS on Open Access; keep the CC-BY policy

I have responded to the UK’s BIS in its call for Open Access. Note that this takes a hell of a lot of energy – middle of the night in New Zealand after 10 days of commitment to Materials Science Informatics. It’s incredibly draining. Yet we have to keep going.

It’s appalling that academia shows so little interest in defending its digital rights and future. Much of the action comes from people outside mainstream academia while almost all Vice Chancellors and their senior staff are silent. If Universities *wanted* to take control of publishing they could. It’s their inaction which has got us in this mess.

It’s also draining that we have a continuous barrage of criticism of the RCUK policy from non-scientists. I’m all for multidomain debate but I’m getting tired of being told by non-scientists what scientists think and what’s best for them.

In my submission (below) I argue that the RCUK should be supported. They have been “attacked by the Green lobby because their policy is an unnecessary waste of money”. The problem is that we are in such a mess that no effective action can be taken unless we spend money or introduce a totalitarian economy. All current forms of publication with commercial publishers are broken.

Green is broken because it depends on libraries paying whatever subscriptions the publishers demand. Mandates and boycotts over the last 10 years have been unsuccessful. There is no end goal. There will be publishers like ACS who will self-destruct before they allow unpaid Green. There is no evidence that Green will lower total subscriptions – only a shortage of funding will do that.

Gold is broken because there is no proper market. Publishers can charge what they like. It also cannot become universal as so many sectors cannot afford APCs.

Hybrid has the worst of all worlds. It is certainly a waste of money.

The solution is for academics to change publishing and its values. Tim Gowers and other mathematicians are doing this – they will publish at cost in ArXiV (< 10 USD) and overlay the reviewing and journals. That is technically possible in all subjects, except that most academics are not prepared to do it and want someone to pay someone else to do the hard work.

Anyway here is my submission. I give thanks that the UK is taking a lead. I am more fearful of the restrictive practices being lobbied for in Brussels where we rely on brave warriors such as Ross Mounce (PhD student) and Max Haussler (postdoc) to argue the case while academics do nothing. [I can’t be there – I am in NZ].

 

 

RE: http://www.parliament.uk/business/committees/committees-a-z/commons-select/business-innovation-and-skills/news/committee-announces-an-inquiry-into-open-access/

From (Prof) Peter Murray-Rust

Department of Chemistry

University of Cambridge, CB2 1EW, UK

I address specifically your request 2:

Rights of use and re-use in relation to open access research publications, including the implications of Creative Commons ‘CC-BY’ licences;

I write as a recently retired but still highly active academic who for many years has been researching in chemical information. I have pioneered re-use of information by machines to discover and disseminate new science (simplistically a “Google for Chemistry”). For example one of my students developed a system which was able to interpret 70% of 400,000 chemical reactions in published patents in 4 days. This leads to a vast amount of new machine-understandable resources – indeed much of the current chemical literature could be transformed within a few weeks on a single machine.

There is an obvious benefit to mining the formal scientific literature in this way. It is higher quality and technically more feasible. Over 3 years I have asked the major publishers repeatedly for permission to mine published content and have been refused or fobbed off in different ways. I have documented some of these vain efforts in /pmr/2011/11/27/textmining-my-years-negotiating-with-elsevier/ – in essence five years of my research have been stalled and I spent perhaps 30% of my time fighting publishers rather than doing science.

I have argued to the Hargreaves process that content-mining in chemistry is worth “low billions” worldwide (it is very difficult to quantify a non-activity). /pmr/2012/03/21/my-response-to-hargreaves-on-copyright-reform-i-request-the-removal-of-contractual-restrictions-and-independent-oversight/ . I am delighted that the IPO has agreed to the Hargreaves recommendations.

I am on the Science Advisory Board of Creative Commons. Their licences are a key tool – and CC-BY is precise and precisely what is required for content-mining. Please accept that no other current licence for documents achieves the purpose of asserting cleanly that a document can be legally re-used (CC-NC is completely unsuitable). Moreover CC-BY is machine-readable. My robots can determine uniquely that a document can be mined without sending me to court. This is not possible with non-standard licences.

Note that publishers frequently assert that they are “extremely helpful and agree to almost all content mining requests”. This is not my experience nor the experience of others I speak to. It is supported by Elsevier’s own assertion that they have only granted 4 requests a year over the last 5 years. They assert that “there is little demand”; my experience is that they are so uncooperative that most people don’t bother. Moreover each researcher has to do this for every publisher – scaling to tens of thousands of requests. For this reason we need a clear automatic legal instrument.

Assuming therefore that we agree that CC-BY is essential for automatic content mining the subsidiary question is “is it worth paying for?”. There is a school of thought, almost all coming from scholars who do not practice science, that Gold CC-BY is a waste of taxpayers’ money. I concede that the current situation is deplorable – the result of complacency by universities and academics and irresponsible commercialism by publishers. We have a broken market where the only long-term solution is to transform publishers from masters of scientists into their servants. RCUK has an almost impossible problem and I think they have made a clear statement and should be strongly supported. I expect that their stance will change the balance between funders and publishers and that the costs of added Gold will drop dramatically over the coming years. By contrast Green wins nothing – we cannot mine the content and it sends signals to publishers that they can continue as usual.

Unlike some I do not feel that paying for dissemination is a waste of money. I have twice had RCUK grants specifically for dissemination (by other means) and these have been a very useful exercise for me, the University of Cambridge, and the UK. Assuming that RCUK generates a higher number of CC-BY papers these will become highly indexed by machines and thus much more highly seen and quoted. In a recent World meeting on Materials Science I highlighted open CC-BY papers in my plenary lecture to the exclusion of closed ones.

In conclusion I stress that this is a direct conflict, not a negotiation with the closed publishers. They have a 15 Billion industry and huge amounts of time and money to spend on lobbying. In contrast scientists have to divert themselves from useful activities to this constant fight against corporatism. Please, therefore, value our submissions to yourselves in this light. Note that even as I write the publishers are lobbying the EC for restrictive licences on content-mining and when I have finished this letter I have to contend in that arena as well.

JISC has shown that the benefits of open knowledge (I am on the advisory board of the Open Knowledge Foundation) will be very large. The UK has made a wonderful investment in the Open Data Institute – I am asking for permission to get chemical content to put in it.

 

Peter Murray-Rust

 

 


Chuff wants to meet a kiwi (PMR Update)

I’ve just arrived in Auckland (first time in NZ) to attend Open Research (https://sites.google.com/site/nzauopenresearch/) and tomorrow Kiwi Foo. I’ll have more time tomorrow to blog.

Open Research is a really important meeting (I caught the last 10 minutes!) and is forging principles and practice. They’ve been hacking for 2 days so hard that there is visible fatigue today! But tomorrow we are going to pull it together. I shall take a back seat and help out if needed.

Then to Kiwi Foo in Warkworth (http://baacamp.org/ ).

Kiwi Foo Camp 2013

Kiwi Foo Camp is a private gathering of around 150 people from New Zealand, Australia, and the world. Invitees are doing interesting work in fields such as neuroscience, Internet applications, psychology, open source programming, art, business, physics, politics, and all manner of interesting science and technology. They network, share their works in progress, show off the latest tech toys and hardware hacks, and find new partners for collaboration.

I am privileged – it’s a great melting pot. We come with lots of ideas and a spirit of humility and excitement. We don’t know what will happen (other than we won’t have time to talk to more than a fraction of the people there and that we shall be exhausted by the end).

Chuff is going. S/he wants to meet a kiwi.

P.

And the impetus from Materials Science in Melbourne is still blowing my mind.

Also, on 27 Feb (2013-02-27) I’m giving an invited plenary lecture at Columbia: (hashtag #rds2013)

http://library.columbia.edu/news/libraries/2013/2013-1-31_Research_Data_Sympsosium_Announced.html

The Center for Digital Research and Scholarship, Columbia University Libraries/Information Services, Columbia’s Institute for Data Sciences and Engineering, and Elsevier are pleased to announce the Research Data Symposium, an event to lead discussion on topics related to managing and curating research data and a variety of research outputs. The Symposium will be free, open to everyone, and held at Columbia’s Faculty House on Wednesday, February 27, 2013.

The Symposium will offer speaker panels that address the different stages of the research data life cycle. Representatives from Columbia University faculty, learned societies, research institutions, funders, and publishers will come together to examine the implementation stages, available technologies and associated challenges and barriers for managing, preserving and accessing research data. Attendees will leave armed with valuable information to engage their respective organizational stakeholders and initiate and continue long-term research and data management efforts.

It’s a crowded programme and I’ve got 15 mins to kick it off. There is a lot I want to say. So my current plan is to come up with about 1 key point a minute. By itself that would be overload. So I’m going to blog at least some of them in detail *before* the meeting.

I am going to argue that there must be drastic change – in academia, in libraries, in scholpub. We are all considerably behind what the rest of the world is doing.

I might upset a few people. The blog gives a chance to iron out misunderstandings before the presentation.

Then afterwards I go to Kitware. But they deserve a separate blogpost.


The magic of Superinfoductivity and Okapicity – Zero Impedance licences

I’m going to make an analogy between knowledge and conductivity. The precise comparison is flawed but the message is not. Hold on…

Everyone knows about superconductivity (http://en.wikipedia.org/wiki/Superconductivity) – the incredible phenomenon when at very low temperatures electrical resistance becomes ZERO. Not just “very small”. ZERO. And that produces magic. The magic of levitating trains and MRI scanners (supercon magnets).

Here’s a picture (from WP) of levitation and one of the variation with temperature

DON’T SWITCH OFF – I’m not going to spout physics. Look at the green graph. At high temperatures such as 2 (the values are normalized, not Celsius), the resistance (resistivity – same thing) is high. Electricity is wasted into heat. That’s what happens in our power lines. Huge waste. From 2 to 1 the electricity can flow more freely but there is still friction. Still energy loss.

And then at 1 something magic. A cliff. The resistance falls to ZERO. Magic happens.

And an even more remarkable thing happens to helium. It becomes superfluid (http://en.wikipedia.org/wiki/Superfluidity). It crawls out of a pot and spills over!

The explanation is:

“These dramatic excitations result in the formation of solitons that in turn decay into quantized vortices – created far out of equilibrium, in pairs of opposite circulation – revealing directly the process of superfluid breakdown in Bose-Einstein condensates. With a double light-roadblock setup, we can generate controlled collisions between shock waves resulting in completely unexpected, nonlinear excitations. We have observed hybrid structures consisting of vortex rings embedded in dark solitonic shells. The vortex rings act as ‘phantom propellers’ leading to very rich excitation dynamics.”[8]

*I* don’t understand this either. But the point is that at very low energies vortices occur and completely remove obstacles. And I’d been writing about the effect of cycles on knowledge flow. So the following pictures have no direct analogy with superfluidity except to confirm something wonderful.

The major restrictions on knowledge flow are licences, contracts, portals, walled gardens, etc. Information flow is sluggish, costly, slow. Manual intervention creates huge costs. In the Chem4Word project we drew the following diagram:


This shows the impedance of not closing a loop. Data get stalled (in this case by non-semantic technology – and more generally by licence problems). However hard you work info flows at subhuman speed.

Let’s close the loop. In the first case…


 

But NOW LET’S include ZERO-IMPEDANCE licences (CC0). The loops can join:


This ONLY happens with non-barrier licences.

And now the network operates at machine speed. Which, as far as humans are concerned is infinite. There is ZERO impedance. The impedance drops to zero, just like the green line.

That’s the difference between CC-NC (resistive) and CC-BY (zero impedance, infinite conductivity).

I’ve been searching for a word:

Supersophisticity? Not-really

SuperInfoConductivity? (Laurent Romary) – I like it.

Or for those who can’t manage many syllables let’s use OpenKnowledge…

OKAPICITY!!

 

 

 

 


A great day for Freedom in Melbourne

I’ve had a hectic day – 0530 -> 2230++ so only the bare bones.

And WE’VE been as well (PMR takes grotty photos!) – Chuff and AMI2:

In the morning I (PMR) presented the idea that we should create a Semantic Web for Materials. I’d worked many days to create material. As always I create about 3 times what I know I will use. Today went smoothly – all demos worked. I don’t know what I used and what I didn’t (my presentations are non-linear – I choose slides and say whatever seems right at the time). It’s a performance, not a lecture.

The key points are:

  • We can create a Semantic web for materials based, at least partly, on Crystaleye
  • We agree that the semantic tools are largely created
  • AMI2 presented her NEW PDF Reader for diagrams. She can read a PDF graph in less than a second and convert to CML. It needs robustifying but…

We are ready to start extracting content from ALL physical science publications – closed or not – when Hargreaves kicks in this October. This will transform semantic physical science.

Then to Datafest at the Melbourne Age. There were 6 teams competing, using data sets produced by the Age – weather, political donations, etc. Could you make a story? (DaveF tried to get me into a one-person team and I gave a surreal pitch to the judges which made no sense to anyone). LOTS of people, lots of excitement. Several old friends.

Finally to Tim Berners-Lee at Melb Uni. House full but we got in as DaveF was partial organizer. (Many thanks to Pia Waugh from Canberra for fixing it all up).

In 1994 I heard Tim at CERN and it changed the rest of my life. Today it started much slower for me – perhaps familiarity. But when Tim got to Aaron Swartz there was real intensity, emotion, passion, compulsion. I’ve tweeted as best I can. Here are my tweets (scraped and in reverse order). I hope they give an idea.

 

#tbldownunder will probably be an “Aaron’s Law”. New momentum for openness – e.g. academic journals. e.g. #pdftribute to personal PDFs

#tbldownunder aaron was great guy. We shall never know precisely what his motivation was.

#tbldownunder Aaron charged under law which used to require serious but then changed to any break-in

#tbldownunder Feds tried plea bargaining – A would have record for life. Threats of decades in jail.

#tbldownunder TBL now in full stride. really compelling. Feds said aaron had stolen millions of dollars. JSTOR was fine. Fed continued

#tbldownunder aaron_sw downloaded articles (that he had right) to. But MIT police found out where and set up trap. MIT police handed to Camb

#tbldownunder aaron_sw wrote open access manifesto. wrote loop to downloaded journals. MIT cut off his mac address but aaron switched mac

#tbldownunder now telling how aaron_sw accessed JSTOR

#tbldownunder aaron_sw asked for own file and put on web as pubdomain

#tbldownunder aaron_sw found time when court records were free and downloaded large chunk. FBI opened file and watched his house. But no case

#tbldownunder “hacker” means crafty person

#tbldownunder aaron_sw realised there were things that should be online that weren’t – court records were 10 cents.

#tbldownunder aaron_sw was a great member of community. worked on Rss 1.0; f2f mtg of RDF WG “do you know he is 14?” very mature for age

#tbldownunder aaron swartz. tbl met Aaron on IRC


Announce: We (AMI) can now extract semantic information from scientific PDFs

I’m taking this opportunity to announce that we can now extract semantic physical science from the published scientific literature.

This means that scholarly publications become a giant distributed knowledgebase.

Here’s a very brief sketch…

Start with an OPEN ACCESS PDF: http://www.mdpi.com/1996-1944/5/1/27/pdf

YOU can read this. Go to page 6

Could we compare the spectra for Cl and Br? Photocopy onto transparency and overlay?

AMI can now read DIRECTLY from the PDF. And translate into CML (Chemical Markup Language). She reads one page per second.

And creates CML… ON THE FLY

<?xml version="1.0" encoding="UTF-8"?>
<cml xmlns="http://www.xml-cml.org/schema">
  <spectrum convention="JSpecView" type="VIS">
    <spectrumData>
      <xaxis multiplierToData="1.0">
        <array dataType="xsd:double" size="621">208.86 208.92 … 208.86</array>
      </xaxis>
      <yaxis multiplierToData="1.0" constantToData="200.0">
        <array dataType="xsd:double" size="621">61.38 61.74 … 61.38</array>
      </yaxis>
    </spectrumData>
  </spectrum>
</cml>

And JSpecView can display this!
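
Once the spectrum is in CML anyone can read it back with a stock XML parser. Here is a minimal sketch in Python using only the standard library. The function names are mine, and I assume (without checking the schema here) that the axis attributes mean data = raw value × multiplierToData + constantToData, with defaults of 1.0 and 0.0 when an attribute is absent.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the CML document shown above.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}


def read_axis(spectrum, axis_name):
    """Return one axis of a CML spectrum as a list of floats.

    Assumption: data = raw * multiplierToData + constantToData.
    """
    axis = spectrum.find(f"cml:spectrumData/cml:{axis_name}", CML_NS)
    array = axis.find("cml:array", CML_NS)
    multiplier = float(axis.get("multiplierToData", "1.0"))
    constant = float(axis.get("constantToData", "0.0"))
    return [float(value) * multiplier + constant for value in array.text.split()]


def read_spectrum(cml_text):
    """Parse a CML string and return (x_values, y_values) for its first spectrum."""
    # Parsing bytes lets the XML declaration state its own encoding.
    root = ET.fromstring(cml_text.encode("utf-8"))
    spectrum = root.find("cml:spectrum", CML_NS)
    return read_axis(spectrum, "xaxis"), read_axis(spectrum, "yaxis")
```

With x and y as plain lists, overlaying the Cl and Br spectra is a few lines in any plotting library. No photocopier or transparencies needed.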

We’ll work on MDPI content because it’s OPEN! (We can’t work on RSC, Nature, ACS, Elsevier because we will be sued).

But come October 2013 I’ll be back in UK and Hargreaves says it will be LEGAL to mine these sources for facts.

Like spectra.

Exciting times!


Topics and Links for my talk on Semantic Web for Materials

Materials and the Semantic Web

CSIRO/Iowa, Melbourne, AU, 2013-02-31

Peter Murray-Rust, Unilever Centre for Molecular Sciences Informatics, University of Cambridge

Themes

  • The Semantic Web is here; we should adopt it for materials
  • “Computable Wikipedia for materials”
  • Based on Chemical Markup Language (CML), MathML and Scalable Vector Graphics (SVG)
  • We need tools for: authoring, conversion, display
  • Create pre-competitive knowledgebases for materials
  • We should have hackfests!
  • “liberation software” to create robots to liberate all published public factual scientific content.
  • Demos: the semantic web works (most of the time). I’ll update broken stuff

I am extremely grateful to CSIRO and Nico Adams for inviting me to Australia for these months. I genuinely regard what Australia has done in eResearch and Data as outstanding and a model for the rest of the world.

Power corrupts; Powerpoint corrupts absolutely (so do Word and PDF)

blog posts

addendum
Zookeys CC-NC
