The Bohannon “Sting”: Can we trust AAAS/Science, or is this PRISM re-emerging from the grave?

I hadn’t meant to post on the Bohannon/Sciencemag/AAAS “sting” (where journals were spoofed into accepting junk papers). Many others have done this (summarised by Graham Steel inter alia). But then I learnt today there was to be a live video conference with Michael Eisen, David Roos, and Science (Jon Cohen and John Bohannon) [1900 UTC 2013-10-10 – so there’s time to catch it]. I posted my concern – no idea whether I’ll get picked to present it.

My concern is whether Science/AAAS can be regarded as neutral in this issue. Some years ago legacy (non-open-access) publishers hired a consultancy firm to denigrate Open Access (“Open Access is junk science”) – the activity was called PRISM (not to be confused with the current PRISM). This included the AAP, and some of us asked publishers if they wished to dissociate themselves from it. I cannot immediately remember what Science/AAAS did. I believe there are still legacy publishers who will use lobbying and money to try to discredit OA, and I would need assurances from Science/AAAS that they distance themselves from such attempts. Bohannon’s study can be seen as such an attempt.

Here’s the background. Six years ago the Association of American Publishers, of which AAAS is a member, secretly hired (for about 500,000 USD) a professional consultant to discredit Open Access. The proposal got leaked and the blogosphere reacted angrily, just as it has here. The proposal was essentially a “dirty tricks” approach to discredit OA, not in the eyes of academics, but of politicians.
This is exactly what the Bohannon sting has done six years later. My concern is that Science/AAAS may have indulged in “dirty tricks” to protect closed access publishing and I am challenging them to show differently.
There was ample documentation. Here’s my blog post of that time, which also references Peter Suber’s analysis. Quotes from PS about PRISM:

The emails [PeterS] received show that Dezenhall advised the AAP to focus on what seem to me to be emotive and highly misleading messages. Publishers were told to equate traditional journals with peer review, even though open-access publications operate peer review in exactly the same way. US government plans to boost access to papers, which include making all publicly funded health research available via a dedicated archive, were to be described as “censorship” and “copyright theft”, though it is hard to see what possible basis these accusations can have….

I and others wrote to a number of publishers asking if they would dissociate themselves from PRISM. Some did; I cannot immediately recall whether AAAS was one (if they did dissociate my concerns may be lessened).
Do not equate non-profit with “neutral” or “fair”. One of the prime movers of PRISM was the American Chemical Society and so there is no a priori reason why we should expect AAAS to be whiter-than-white. There’s a lot of money involved and although I don’t know about AAAS, some ACS officers were paid over 1 million USD at the time of PRISM. [Read the IRS returns if you doubt this].
So if I get the chance tonight I’ll be asking at what level of AAAS this spoof was authorised (it clearly took months). Will I believe the answers? I will wait to see what they are.


Royal Society of Chemistry's new Repository: My initial thoughts on Open Data

The Royal Society of Chemistry announces a new repository for the chemical sciences. I have been asked to comment (by an American Chemical Society organ) and will outline my thoughts below. This is important both for chemistry and more widely for scientific data. First, the announcement:

Today http://www.rsc.org/AboutUs/News/PressReleases/2013/RSC-ann
The Royal Society of Chemistry today announces a new subject-based repository that will make it easier for researchers to find and share relevant journal articles and data from a single point of access.
David James, the Royal Society of Chemistry’s Executive Director of Strategic Innovation, said: “The Chemical Sciences Repository will offer free-to-access chemistry publications and integrated data in a single place. 
“This repository extends the services the Royal Society of Chemistry already offers researchers. With this new service we are improving our ability to ensure that the outputs from research activity are made as widely available as possible – to meet the needs of the scientific community, funders and others interested in accessing our content in a more comprehensive, streamlined way.”
The initial release will provide an article repository as a central point through which users can access the Royal Society of Chemistry’s open access articles, whether they are funded immediate open access articles, or articles that must be made open access after an embargo period, such as those funded by RCUK, the Wellcome Trust or NIH. This article repository will be available at the end of October 2013.
The repository will point to the Article of Record as the primary source. It will make open access versions of the article available when any embargo period expires. 
David James continued: “We plan to grow the Chemical Sciences Repository, with the addition of open access papers from institutional repositories, other publishers, and individuals – as well as theses, data and models.  
“The repository will make it easy for researchers to deposit their articles and data, and scientists will also find it easy to find and reuse compatible datasets. 
“As a community service the repository will catalyse further collaboration and open innovation between chemical scientists all over the world.”
The Royal Society of Chemistry will announce additional elements to the data repository in the coming months. Work is already underway with major UK universities around data extraction and upload, Electronic Lab Notebook (ELN) integration, and micropublishing. Offering functionality with chemical scientists specifically in mind, the repository will support the building of validation and prediction models to maximise the value and quality of the data collections. 
Head of Chemistry at the University of Southampton, Professor Philip Gale, said: “My colleagues and I welcome this initiative: a collection of chemistry data curated by the Royal Society of Chemistry will be of significant value to the worldwide chemistry community. 
“We are now working with the Royal Society of Chemistry to enable best practice, to expose laboratory data in an intelligent and usable manner.”

This ought to be a wonderful thing. Whether it is will depend on what the RSC says and does.
There is a desperate need for a new way to manage scientific data in an Open manner for the benefit of the world. Some disciplines already do very well – astronomy, geo-imaging, proteins and genes. Others, such as chemistry and materials, are virtually closed. Some, like crystallography, are midway. Anyone who has not browsed the bioscience databases should do so – it’s truly eye-opening and a complete example of Linked Open Data (TimBl’s 5 stars: make data open, comprehensible to humans and machines, and link it both ways).
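For anyone who has not met the five stars in executable form, here is a minimal sketch in Python using rdflib; the namespace, property names and identifiers are invented for illustration, not taken from any real chemistry database:

```python
# Minimal sketch of "5-star" linked open data; all names are illustrative.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/chem/")   # hypothetical namespace

g = Graph()
compound = EX["compound/42"]
# Structured, non-proprietary, URI-identified data (stars 2-4)...
g.add((compound, EX.meltingPointCelsius, Literal(135.0)))
# ...that links outward to other people's data (the fifth star).
g.add((compound, EX.seeAlso, URIRef("http://dbpedia.org/resource/Aspirin")))

print(g.serialize(format="turtle"))
```

The point is not the library but the shape: every fact is addressable, machine-readable, and linked both ways.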
There’s an absolutely pressing need for places for scientists to deposit data. The process must be trivially easy, valuable to the depositor and valuable to the community. And the community must insist that data gets deposited. It’s almost universal for readers of scientific articles to find that there is no data. Or that the data is only available as a PDF or a photocopied fax. (We were funded for several years by JISC to develop semantic tools and to provide advocacy but, ultimately, we simply couldn’t get chemists interested – this is common in other disciplines such as biodiversity.)
The norm should be that a chemist deposits their data as they create it – often on an hourly basis. That’s how we develop software. It doesn’t have to be visible immediately (though massive credit to Mat Todd and others who do this). But at the very least the data associated with a thesis, a paper, or a presentation should be available. Actually it leads to greatly improved science, and that ought to be an overwhelming argument.
And if we do this we shall create better data. Because the data will have to be semantic. Bioscience has put huge efforts into semantics – chemistry effectively nothing, and materials science is worse. We cannot run an information-rich world without machines consuming and transforming information automatically. And that is where Open is critical. The Semantic Web depends on Open.
And scientific societies and international unions are, IMO, the best places for managing scientific data. They have the resources to manage communities, and like the IUCr (crystallography) to develop ontologies. When it works well, it’s a great example of global collaboration. When it works badly (or negatively) it can destroy science and scientific creativity.
My judgment will depend on whether the RSC’s repository is truly Open. The least open end of the spectrum is shown by the American Chemical Society (Chemical Abstracts – CAS) and Elsevier (Reaxys, once Beilstein) with their closed, walled gardens of chemical information. Originally these resources (which have tens of millions of compounds and reactions) were shining examples of innovation. In the 1970s the brilliant work in chemistry, often supported by ACS, inspired me to go into chemical informatics. Unfortunately nothing has changed in 30+ years, and the ACS and Elsevier are now holding chemistry back behind the forefront of science – and it’s getting worse rather than better. A typical example: when Wikipedia started to use the CAS identifier system – the natural thing to do – CAS threw the lawyers at them with a cease-and-desist. I commented on this (see Peter Suber’s summary) and wrote (in 2008):
[ACS] have done the following:

  • re-asserted their position that they care for revenue more than supporting the wider chemical community
  • re-advertised themselves as one of the least progressive learned societies
  • alienated a growing number of young scientists who look to the Web as a critical part of the future of chemistry…

It seems inevitable that community-based resources grow old and closed (examples being CAS, OCLC, and the Cambridge Crystallographic Database – about which more in later blog posts). Without checks the RSC resource will go in this direction, and if they feel their role is to compete in the market with ACS it will go quicker. Money distorts judgment. CAS officers can earn over a million dollars a year, and they think like Fortune 500 companies rather than a resource for the community.
So I have some questions and suggestions for the RSC:

  • The resources must be completely Open. That means conformant to the OKF’s Open Definition (free to use, re-use and redistribute without restriction other than, possibly, attribution).
  • There must be NO CC-NC licences. (We had a collaboration with the RSC on text-mining – I pleaded for the output of our research – an annotated corpus of RSC papers – to be licensable as CC-BY, but the RSC insisted on CC-NC. So the valuable corpus is not available to the community.)
  • NO API-only access. The whole contents must be downloadable. This is a challenge, but it’s essential. A database where I don’t know the scope of the contents is of little use – I and my machines must be able to browse it.
  • Community involvement. The contents of the repository are created by the community. It’s appropriate that they are involved in their use and development. If not, the repository will inevitably drift to something developed and maintained by staff rather than depositors and users.
  • The RSC must see themselves as a facilitator of science, not an owner. They must not think in terms of “our repository”. They should encourage knowledge (software, data, etc.) from any source and facilitate its use. The bioscientists do. Europe PubMed Central does.
  • The RSC must not use it to give preference to their publications. Of course by default they can only put in their own and CC-BY resources. But they must not use the repository to create a quasi-monopoly for their publications.

They should burn “Open” into their organizational DNA. This won’t be easy, but it’s the way that scientific societies must go. I’d recommend a steering committee for the repo which included strong representation from outside their immediate community – such as CODATA, IUCr, and students.
If they can do all this – and I believe the requests above are reasonable – then I think it will be wonderful. But the RSC has not had a proactive approach to the modern information world, and this will require a serious change of direction.
I am developing software which will extract chemistry from a wide range of documents in automatic mode. I intend to deploy it very shortly. I would like to deploy it on the RSC repository and extract data for my own purposes and the world’s.
 


TDM Update and Summary of LIBER Text-and-Data Mining meeting last week

I’ve already blogged about the LIBER meeting last week, but TDM is now a central part of my raison d’être and there is going to be a lot about it on this blog. I’m close to announcing the first alpha release of our software, and our BBSRC project with Bath, Matt Wills, and Ross Mounce started this week. I will be working with Ross tomorrow and will probably hack some of the docs.
Meanwhile here is Paul Ayris’ report: http://www.libereurope.eu/blog/the-perfect-swell-at-the-british-library. Paul is Head Librarian at UCL – he has been on the advisory group of some of our JISC projects – and he’s President of LIBER, the European library association that ran the meeting. Here are his conclusions:

A panel session at the end of the day enabled the whole audience to discuss the points made during the formal presentations. No-one doubted the importance of the role of TDM for the future of European research and commercial competitiveness. As chair of the panel session, I attempted to summarise the findings of the day around the following points:
 
1. TDM is important for Europe and the European Research Area;
2. Copyright reform is required, in both the EU Copyright and Database Directives, to give a level of surety to researchers to enable them to pursue TDM;
3. A Fair Dealing Exception for the purposes of research was widely regarded as a helpful way forward;
4. Should such an Exception embrace commercial activity too? During the day, startup companies had emphasised the importance of TDM for their competitiveness;
5. Copyright reform will take time. What is needed is something more immediate, and the Horizon 2020 funding programme should be used to test what needs to be done in EU copyright legislative reform through innovative, exemplar projects;
6. A report of the day, along with a link to the video recordings, will be sent to the Commission: DG Connect, DG Research, DG Education and Culture, and DG Internal Market.
 

Yes. There is no doubt we are in a fight. If we don’t fight, we will be overwhelmed by the devices of the publishing industry. They will look reasonable to politicians and publishers have huge amounts of money (from us) to spend on lobbyists. This is the first time I have really got a window on politics and it’s awful what is happening.
So join our effort. There’s a need for advocacy, and a need to build an open toolkit and protocol that is overwhelmingly better than what the publishers provide (or rather don’t provide) at the moment. It takes energy and commitment.
Yours.
PS:
Special thanks should go to Susan Reilly in the LIBER Office for leading on TDM issues, and for organising such an important and high profile Workshop in LIBER’s name.
Yes, Susan, a great meeting and look forward to the report.
 


Text and Data Mining – fighting for our Digital Future ("Peter Murray-Rust is the problem")

Last week there was an important meeting run by LIBER, the association of Research Libraries in Europe: http://www.libereurope.eu/news/the-perfect-swell-a-workshop-on-text-and-data-mining-for-data-driven-innovation. To be quite clear: the meeting was held because (legacy, scholarly) publishers are spending large amounts of time, effort and money to stop people Text and Data Mining unless controlled by the publishers. We have tried to work towards a resolvable position, but have been forced to walk out of talks in Brussels because the industry wants to license our activities.
We have asserted that “the Right to Read is the Right to Mine“. I’m very happy that I came up with this phrase as it expresses precisely what we believe – that there is no difference between human access to a document and a machine’s access. Another useful phrase (John McNaught’s, not mine) was that “Text and Data Mining saves Lives“.
Licensing destroys Text and Data Mining. It’s designed to do precisely that. Imagine as few as 1000 researchers negotiating licences with 1000 publishers. That’s 1 million licence negotiations. The legacy publishers have shown themselves to be regressive in their use of licences, and in many cases incompetent. Do we really believe that they have the interests of researchers at heart?
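The arithmetic is worth making explicit; a trivial sketch using the hypothetical populations above:

```python
researchers = 1_000
publishers = 1_000
# Pairwise licensing scales as the product of the populations, not the sum.
negotiations = researchers * publishers
print(f"{negotiations:,} separate licence negotiations")   # 1,000,000
```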
So licences lead to the conclusion “NOT(Text and Data Mining saves Lives)” (I daren’t write the obvious phrase as I’ll be attacked for being emotional).
There’ll be a full summary of the meeting shortly, I gather. I’ve put my tweets below, with some comments. I’m going to comment on a few, but one in particular. The publishers create a huge amount of FUD around TDM. It’s a terrible threat to them. (It isn’t actually, and any realistic publisher would welcome it – but the last year has shown that the legacy publishers are now the determined antagonists of change – they have done nothing other than fight against anything new – and they are spending millions doing it.)
@CameronNeylon addressed the FUD [my tweets]:
Myth1: “researchers don’t want TDM”. This is rubbish. The public activity is smallish because:

  • All but the most determined get stopped by their publisher, or by fear of being prosecuted or getting take-downs. I have spent 3 years trying to get any agreement out of Elsevier – the best was “You must discuss your proposed work with Elsevier, you must use our gateway and your results may belong to Elsevier”. Not surprising that this deters people.
  • There are few communal resources (because people are deterred), few corpora (because publishers forbid their creation), and few tools (because there isn’t much to work on [that will change]).

Myth2: “TDM will crush our servers”. This is so hilarious it’s worth getting Cameron to give you a replay of his talk. It amounts to about one millionth of their traffic. If they can’t accommodate that they shouldn’t be publishing. Text-mining on PLoS is a TRIVIAL problem. Crawl-delay 30 seconds. TDMers are well behaved and obey this!
Myth3: “this ain’t core business” (i.e. why should publishers spend time talking with me?). “Publishing is not PLoS’ USP. Nor filtering. It’s annotation. PLoS is about dissemination.”
Myth4: “If we allow TDM Peter Murray-Rust will distribute all our work”. More than one publisher has said this to Cameron. Wow! I’m famous! Cameron: “There’s something wrong if publishers are frightened of a retired academic with a laptop”.
To be quite clear. I want to carry out TDM in a way that will create minimal technical problems. I am happy with a 30 second delay – this gives me time to process everything before the next request. I’m happy to acknowledge where the material was published. This is responsible science. If I was a pirate wanting to sell content to China I’d have managed it already without the publishers even knowing. I don’t want to break the law.
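To make “responsible” concrete, here is a minimal sketch of a polite fetcher that honours robots.txt and Crawl-delay, written in Python; the publisher site and article URLs are placeholders, and real mining code would need error handling and parsing on top:

```python
# Minimal sketch of a polite TDM fetcher (placeholder URLs).
import time
import urllib.request
import urllib.robotparser

BASE = "https://journal.example.org"          # hypothetical publisher site
rp = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()
delay = rp.crawl_delay("*") or 30             # fall back to the 30 s discussed above

for url in (BASE + "/article/1", BASE + "/article/2"):
    if rp.can_fetch("*", url):                # respect the publisher's robots.txt
        with urllib.request.urlopen(url) as resp:
            document = resp.read()
        # ...mine/process the document here...
    time.sleep(delay)                         # at most one request per delay period
```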
My right – and it may have to be confirmed in court – is to use machines to mine the content I have legitimate access to. Anything less than this is irresponsible. I could actually add value for the publishers if they were reasonable. Here are some examples:

  • detecting errors in content (do publishers want high quality?)
  • indexing their material for scientific search engines (do they want people to find their material?)
  • finding ways of making their material easier to read and use (do they want that?)

Anyway I’m going to be starting TDM in earnest now that our BBSRC project has started at Bath. It will be massive and I’ll blog this very frequently. If you are a publisher you should be very afraid – very excited.
==============================================
Here are some unedited tweets … Chris Yiu and John Boswell gave a chilling example of why the US has 82% of the new digital industries. The US welcomes it. Europe wrings its hands and destroys it with uncertainty. Lucie Guibault gave a good overview of the law, effectively confirming that in Europe there is no certainty and no support for the brave entrepreneur. Nilu Satharasinghe is setting up a startup in Cambridge UK, but if it hits legal difficulties he will immediately relocate to the US (same story). Caroline Dynes (Royal Society) stated that the RS will allow TDM without licence. Wow! Perhaps other publishers will follow.
Chris Yiu Digital Policy Unit, Policy Exchange UK
Sharing data saves time, paper, money
data opportunities: realtime opportunities; supermarkets know instantly every tin on shelves
personalization opportunities better user experience;
solving problems we couldn’t solve before; data quantities can be handled; eBay uses semantics to stop gaming (with computer vision)
can often make good predictions about future -> policymakers.
huge opportunity for new industries in Europe.
healthcare data analytics analysing use of statins in UK. prescribing guidelines + GP prescribing. would save 200 M GBP/year
future of census UK (a) use internet or (b) fuse together smaller surveys.
with right investments (Willetts) UK is well placed for big data revolution
Willetts: big data is one of 8 areas for Britain to invest in
UK has more digital businesses than they thought (most active in Cambridge to Bournemouth corridor – none in North Eng or Sco)
82% of world’s digital businesses are US and 43% are Cal., 25% NY, [4% in UK]
we need: skills, ambition (changing the world), finance, mentors, agility, creativity. [great list]
lack of certainty about copyright is major drawback.
what we need is certainty. [PMR absolutely – one of the key questions].
is fair use an exception to copyright or a fundamental user right? [PMR absolutely key]
now up John Boswell, SAS Institute (global industry perspective). Most data is unstructured
will give examples and outline why changes to copyright are misguided
UN look at social network data from IE and US: sentiment analysis. Words like “mad” predict unemployment
if you can mine data and predict unemployment can you do something? Tax rev drops and expenditure rises
mining soc security: US 15% benefit claims are granted after denial. Gov could predict which, so save time, money, add benefit
US FDA analyse records from doctors for marketed drugs post-release of drugs
Text-mining is NOT copying expression of author, you are analysing
if analyse millions of documents you are not copying single expressions (JB simple example of word frequency)
JB advises us that TDM should not be regarded as violation of copyright. We are debating at wrong point in continuum
in purest form TDM does not violate copyright – i.e. we should not have asked for extension. Are we making it worse? [PMR agree]
Staffan Truve. Recorded Future [company]
RF analysing time on the web e.g. “this week” is grounded
250,000 realtime sources 8 languages 10 Billion facts 25 entity types
RF predicted unrest in Egypt; initially didn’t know how long but later predicted continued unrest
without textmining the web is useless – publishers know this. Analysis, evolution, new value, aggregate
threats: vertical silos; deep web + darkweb; IP protectionism
no borderline between reading and analysing; cannot differentiate humans from machines; robots must have same rights as humans
@CameronNeylon …
Jean-Fred Fontaine MDC Berlin: biomedical literature. PubMed has 18 M, 50% have visible literature, 0.2 M full texts are TDMable
J-FF showing how we navigate biomedical literature. Gene disease associations
showing entity recognition – genes, chemicals; cooccurrence shows interactions
[PMR my software (#ami2, svg2xml) will be released next week – watch blogs.ch.cam.ac.uk/pmr/ ; 10 laptops can mine whole literature]
JFF showing creation of networks to add to data thereby enriching data
JFF shows that TDM is as accurate as a human (but needs FULL-text)
JFF example of disease correlations from Elec Patient Records
uses side effects on drug labels (packet) to deduce targets
JFF whole literature will fit in 250GB. Need full text, figure
next up Dieter Van Uytvanck CLARIN
DVU Research Infrastructure perspective
CLARIN provides access to digital language data (mainly humanities and socsci) and tools
DVU many channels, text, sign language , gestures, neuroimaging,fMRI
CLARIN is global (e.g. Amazon, Bali) and spans time (rock carvings, smartphones); experiments contrasted with “in the wild”
examples of data mining in CLARIN. Must have access to whole work. Snippets are not enough
CLARIN replicating experiments is utterly important. Licences do not scale to 500,000 texts collected from websites
CLARIN shows that times of wars generate lots of new words
language analysis with phylogenetic trees for  evolution @rmounce
research infrastructure: long-term preservation, citable, federated login, web frontends, knowledge building and support
categories: Public (e.g. gov), Academic (spoken Dutch), Restricted (doctor-patient conversations)
recomm: CC licences, older material as free as can be negotiated; probs w. personal data and ethics (ask people if OK with TDM)
Lucie Guibault Legal aspects of text and database: will cover Copyright, Sui generis database right, IPR+TDM
LG Copyright Compilations protected and individual works. Facts, ideas are not protected
compilations protected if selection and arrangement are the author’s own
LG (throwaway rmk) “life of copyright is far too long”
LG unlikely that TDM will fall under allowed Educational and Research use
LG now telling us of sui generis data base rights (only applies to Europe). [PMR I am now getting really depressed.]
PMR v. depressed: Every talk I’ve heard on copyright and other legal things effectively says “you aren’t allowed to do anything”.
“TDM not allowed without authorisation from rights holder”. Question, should it?
PMR Why has no one even mentioned Hargreaves at this meeting??? I came for some answers – not getting any
panel starting: 5 × 5 minutes. Natalia Manola (Athens, OpenAIRE): using TDM to cluster documents in repos; how funding relates to outcomes
OpenAIRE makes agreements with publishers to do TDM
need to work with policyholders at all levels
NM can Liber bring everyone together to make better policy and education about rights?
NM funders have power
NM have to show that many services come from output of mining e.g. Google (and OpenAIRE – I wasn’t sure)
NM too many barriers in Europe for TDM
Nilu Satharasinghe startup in Cambridge
NS reroutes links to publications on basis of content
NS worried that additional restrictions make it difficult for him to conduct business. Gives examples of mashups (e.g. films)
John McNaught Manchester NaCTeM which supports researchers. must be able to share with many types of researchers. Only Full-text
JMcN text-mining saves lives. [PMR great to hear this]
JMcN highly limited at moment. Funders (e.g. EC) want access to research but what is gov (EC) doing to free up info?
Caroline Dynes (Royal Soc). Opening up data sets. Funder, but also publisher. Pays authors’ APCs (no details).
CD mentions Hargreaves (first mention in this meeting).
Ellen Broad (IFLA). Most TDM is presumptively illegal without legal exception.
Ellen Broad raises good questions and shows that current position is untenable.
EB must be Certainty. Cannot be openended. Currently no defence against infringement. Investors will forsake EU vs US
JMcN showing how crazy the licensing offered to universities is
CD announces that Royal Soc agrees “the right to read is the right to mine”. PMR: I’ll start tomorrow!
JMcN if Europe wants evidence-based material they should look to TDM and so this must be made legal!
“if there is no European precedent then it isn’t illegal”. Hey! let’s get going.
@OKFN very pleased to see that “The Right to Read is the Right to mine”
Kurt Deketelaere: “we must challenge the law”. PMR absolutely agree. That’s how bad laws are obsoleted
Google (?Simon Morrison) you need to create a more flexible approach in Europe (“fair use”)
Paul Ayris: this is a major issue for universities. Database directives must be revised.
PA: TDM should be part of pilots in Horizon 2020 to test limits of current arrangements


Our Planet's Climate is broken, but copyright stops us reading about it (unless you have 50,000 USD)

The Intergovernmental Panel on Climate Change published the draft of its report last week – http://www.ipcc.ch/report/ar5/wg1/#.UkweFhDB_cg . It’s > 2100 pp and I have downloaded the whole lot and – with the help of my software – will read it. The simple conclusion is that Climate Change (or more drastic, the Breakdown of Climate) is real and supported by scientific evidence.  Many responsible and concerned citizens (at school, in policy making, in business, in planning, in environment… in fact YOU) will rightly want to read about it. And many of you will want, and would be able to understand, the scientific basis. Because we can all be scientists.
The report is based on >9200 references, some from government departments, but many in the scholarly literature. And here’s the problem.
Many of those references are behind a paywall. I’ve manually followed the first 20 in Chapter 1 [we are planning an OKF Open-Science project to crowdcraft the whole lot… please join us: http://lists.okfn.org/pipermail/open-science/2013-October/002764.html] and I list them below. Some are books, though in electronic form, but many are paywalled behind profit and non-profit publishers. My rough estimate is that it would cost at least 100 USD to read the papers in these 20 refs.
There are nearly 10,000 references, so let’s say it would cost 50,000 USD for ONE concerned citizen to read the climate literature. (Of course that doesn’t allow for reading references in references, which is often required.) Remember that this is what the UN, through its IPCC, thinks are the most critical papers.
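For anyone who wants to vary the assumptions, the back-of-envelope calculation is simply (the ~5 USD average is my extrapolation from the 20 sampled references):

```python
refs_total = 9_200           # references cited by the IPCC WG1 report
sampled_refs = 20            # manually checked in Chapter 1
sampled_cost_usd = 100       # rough minimum to read those 20

avg_cost = sampled_cost_usd / sampled_refs    # ~5 USD per reference
print(f"~{refs_total * avg_cost:,.0f} USD")   # ~46,000 USD, i.e. of order 50,000
```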
Now it would technically be possible for the IPCC to copy these papers and post them. It makes technical sense as many are chapters in larger volumes. But the IPCC can’t do that as it would violate copyright.
Yes, we cannot discuss the planet’s future responsibly because copyright is more important.
I plan to read the 2100-page report by machine and see how it can be made more digitally tractable. For example I’d like to extract diagrams and turn them into tables – this makes it easier to re-run analyses and make comparisons (anyone interested please let me know).
And I’d like to extract data from those publications in the same way. But the publishers are strenuously trying to stop this. (More on this later.) It could be called copyright theft. And Parliament is being lobbied to increase the penalties: http://www.publications.parliament.uk/pa/cm201314/cmselect/cmcumeds/674/67406.htm (“We recommend that the maximum penalty for serious online copyright theft be extended to ten years’ imprisonment.”).
So while our planet is dying we are making sure that it won’t be due to copyright violation.
Here’s the evidence…
Allen, M. R., J. F. B. Mitchell, and P. A. Stott, 2013: Test of a decadal climate forecast. Nature Geoscience, 6, 243-244. [18 USD, PAYWALL http://www.nature.com/ngeo/journal/v6/n4/full/ngeo1788.html?WT.ec_id=NGEO-201304]
 
AMAP, 2009: Summary – The Greenland Ice Sheet in a Changing Climate: Snow, Water, Ice and Permafrost in the Arctic (SWIPA). Arctic Monitoring and Assessment Programme (AMAP), 22. [FREE, http://www.amap.no/documents/doc/arctic-climate-issues-2011-changes-in-arctic-snow-water-ice-and-permafrost/129]
 
Armour, K. C., I. Eisenman, E. Blanchard-Wrigglesworth, K. E. McCusker, and C. M. Bitz, 2011: The reversibility of sea ice loss in a state-of-the-art climate model. Geophysical Research Letters, 38. [Corrupted Ref, PAYWALL http://onlinelibrary.wiley.com/doi/10.1029/2011GL048739/pdf]
 
Arrhenius, S., 1896: On the influence of carbonic acid in the air upon the temperature of the ground. Philos. Mag., 41, 237–276. [FREE at http://www.rsc.org/images/Arrhenius1896_tcm18-173546.pdf]
 
Baede, A. P. M., E. Ahlonsou, Y. Ding, and D. Schimel, 2001: The Climate System: an Overview. Climate Change 2001: The Scientific Basis. Contribution of Working Group I to the Third Assessment Report of the
Intergovernmental Panel on Climate Change, Cambridge University Press. [PAYWALL? Google Books omits pages ]
 
Beerling, D. J., and D. L. Royer, 2011: Convergent Cenozoic CO2 history. Nature Geoscience, 4, 418-420. . [18 USD, PAYWALL]
 
Bretherton, F. P., K. Bryan, and J. D. Woodes, 1990: Time-Dependent Greenhouse-Gas-Induced Climate Change. Climate Change: The IPCC Scientific Assessment Cambridge University Press, 410. [UNAVAILABLE]
 
Brönnimann, S., T. Ewen, J. Luterbacher, H. F. Diaz, R. S. Stolarski, and U. Neu, 2008: A focus on climate during the past 100 years. Climate Variability and Extremes during the Past 100 Years,  S. Brönnimann, J.  Luterbacher, T. Ewen, H. F. Diaz, R. S. Stolarski, and U. Neu, Eds., Springer, 1-25. [PAYWALL 140 GBP]
 
Broomell, S., and D. Budescu, 2009: Why Are Experts Correlated? Decomposing Correlations Between Judges.
Psychometrika, 74, 531-553. [PAYWALL, 40 USD http://link.springer.com/article/10.1007%2Fs11336-009-9118-z]
 
Brunet, M., and P. Jones, 2011: Data rescue initiatives: bringing historical climate data into the 21st century. Climate Research, 47, 29-40. [FREE, http://www.int-res.com/articles/cr_oa/c047p029.pdf]
 
Budescu, D., S. Broomell, and H.-H. Por, 2009: Improving communication of uncertainty in the reports of the
Intergovernmental Panel on Climate Change. Psychological Sci., 20, 299-308. [UNAVAILABLE]
 
Byrne, R., S. Mecking, R. Feely, and X. Liu, 2010: Direct observations of basin-wide acidification of the North Pacific  Ocean. Geophysical Research Letters, 37. [FREE, ftp://soest.hawaii.edu/coastal/Climate%20Articles/Acidification%20Pacific%20Byrne%202010.pdf]
 
CCSP, 2009: Best Practice Approaches for Characterizing, Communicating, and Incorporating Scientific Uncertainty in Climate Decision Making. U.S. Climate Change Science Program. , 96 pp. [FREE http://www.climatescience.gov/Library/sap/sap5-2/final-report/sap5-2-final-report-all.pdf]
 
Church, J. A., and N. J. White, 2011: Sea-Level Rise from the Late 19th to the Early 21st Century. Surveys in
Geophysics, 32, 585-602. [PRINT BOOK, ca 35 GBP]
 
Church, J. A., J. M. Gregory, N. J. White, S. M. Platten, and J. X. Mitrovica, 2011: Understanding and Projecting Sea Level Change. Oceanography, 24, 130-143. [FREE http://www.tos.org/oceanography/archive/24-2_church.pdf]
 
Cleveland, W. S., 1979: Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 74, 829-836. [JSTOR, FREE? copy at http://www.people.fas.harvard.edu/~gov2000/Handouts/lowess.pdf]
 
Collins, M., and M. R. Allen, 2002: Assessing the relative roles of initial and boundary conditions in interannual to decadal climate predictability. Journal of Climate, 15, 3104-3109. [FREE, http://journals.ametsoc.org/doi/pdf/10.1175/1520-0442%282002%29015%3C3104%3AATRROI%3E2.0.CO%3B2 ]
 
Covey, C., et al., 2003: An overview of results from the Coupled Model Intercomparison Project. Global and Planetary Change, 37, 103-133. [FREE at http://www.meteo.psu.edu/holocene/public_html/shared/articles/Coveyetal-GlobPlanChng03.pdf ]
 
 
Dee, D. P., et al., 2011: The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137, 553-597. [FREE http://onlinelibrary.wiley.com/doi/10.1002/qj.828/pdf]
 
Dlugokencky, E. J., et al., 2009: Observational constraints on recent increases in the atmospheric CH4 burden.
Geophysical Research Letters, 36, L18803. [ FREE http://biodav.atmos.colostate.edu/kraus/Papers/Methane/methane/CH4%20trend/2009_CH4trend.pdf]


@Jay_Naidoo @Okcon "use the tool that you have to fight for justice and ethics"

Listening to @Jay_Naidoo’s plenary on Wednesday at #okcon I had a revelation.
What matters is Justice.
Jay’s Twitter describes him as Chairman of the Global Alliance for Improved Nutrition (GAIN), former Minister in Mandela’s Cabinet, founding GS of COSATU, and political and social activist. Jay gave us 30 minutes of breathtaking passion about what we must do for the world. I tweeted it and have recaptured my tweets (below). Jay urged us to action – at the end Andrew Stott (chair) asked:
“what is the most important thing for us to do in the next week?”
Jay: “use the tool that YOU have to bring justice and ethics”
That has changed the rest of my life. The key word is JUSTICE. That’s what the OKF is about. That’s what my software is about. That’s what this blog is about. And it was the theme running through the whole of #okcon. We develop tools (blogs, protocols, Python, Java, CKAN, Open Spending, Panton Principles, etc.) to make the world a just society.
Jay articulated Nelson Mandela’s passion on stage – it was as if it was Mandela speaking. If you were not at the session it is the must-watch 30 minutes of #okcon – and watch it all.
Here are my tweets from Jay – the order doesn’t matter. Read them and then go out and fight for justice.
when you are poor the only food is junk food.
“one person one vote” => “one person one gigabyte” (of information)
we can now hold leaders to account. Citizenry must be informed and make power accountable
“we have liberated ourselves from the chains of secrecy”. Where is the money for the library and toilets?
when people know the truth and their rights they become unstoppable. Every one of us is a journalist and whistleblower
give us (digital) tools for accountability.
SA has second most transparent budget in world. We have to make it understandable. Create a revolution
can we translate understanding of budget into local funding to deliver schools, etc.
SA delivered a political miracle .
old generation cannot tell young what they should do but must support them.
Andrew Stott asks how do we make this happen. We must give young people a voice. Innovation to deliver a better society
Have to defeat language of denialism . We must be advocates of building society where we care about what happens to planet
money that belongs to our people (in S A) is stolen. Governance should be brought to ordinary people through Open Data
We must bring back ethics and accountability
How do we make our democracy work? How do we create livelihoods? S Africa now lags behind Kenya.
[telcoms in Africa]. Must design this (from bankrupt beginnings) to provide justice
technology has led to death of geography but not death of injustice
Live a life of truth. Undiluted truth. Then you will challenge injustice wherever it stands
how do we bring compassion into the cold steel of technology. Our technology is built from blood spilt in the Congo
[recalls Mandela] – “fighting poverty is not an act of charity, it is an act of justice”.
stand up and do something
Build a tsunami of hope and accountability – your job is to be accountable and serve society
“Overflowing of positive energy that makes us want to be better people”
tackle global malnutrition. Billion people will go to bed today without food
we have brought the world to its knees. What could we have done with the wasted financial millions. The problem remains
You must be brave like Steve Biko. Nothing to lose but your chains.
 
 


#okcon (first) thoughts (blog, hack)+ in Geneva Public Library

My blogging comes in fits and starts. I sometimes used to feel upset if I didn’t blog each day. But now I have two major imperatives – to blog and to hack Liberation Software. And for the last few weeks the software has been on top.
Also I have had technical problems. I have been using Word to author blogs since (a) I didn’t like the WordPress interface, (b) WordPress can only be used online (unless you tell me different), (c) I spent some time experimenting with Word’s voice recognition software, and (d) it was more convenient to use Word to create compound documents with included images. (c) no longer holds; (d) started to fail badly and doesn’t seem to have a cure. Word/WordPress gave useless error messages, failed to upload, created multiple posts with the same title, etc.
So I am coming back to using WordPress. I think it’s clunky but I have no alternative. It means I can only blog at certain times of the day and have to spend extra time uploading the images. But I have to do it.

Wow! I can copy. Maybe it’s not so bad. (BTW this is Chuff, the OKF okapi.) Animals are not allowed in libraries so he/she/it is sleeping in the hotel. I love public libraries – they have a sense of calm and quiet, but also centuries of history of the struggle for freedom. See http://switzerland-geneva.com/attractions/library.html. They have free wifi. And there’s a student cafe in the next building.
My mind has been blown at #okcon. I need to blog on at least the following:

  • Mat Todd’s fantastic, world-changing session yesterday on Open Source Drug Discovery. The world has a crisis in discovering new pharmaceuticals and, IMO, Open Knowledge and collaboration is a critical part of the answer. Without it we shall not defeat Neglected Tropical Diseases (NTDs) or antibiotic resistance. We have to change the way we work.
  • Jay Naidoo’s completely inspirational talk yesterday. He is a colleague of Mandela and brought the same massive message (“South Africa was the political/democratic miracle of the twentieth century”). Jay brought me one word – JUSTICE – which is transforming the way I now see Open Knowledge. That is what we are about.
  • OKF itself. We are now a major resource for bettering the world. Some years ago I started telling people “Wikipedia is the digital triumph of the 2000’s. OKFN will be the triumph of the 2010s.” (I don’t think I ever wrote this). I now believe it.
  • Chuff. Down! Your time will come. In Berlin 2014. (This shows I can do strikethroughs).

But I also need to hack. Content-mining will be massive and I am making a contribution through PDF-hacking (#ami2). I’m now very close to nearly faultless conversion of BioMedCentral PDFs to semantic XML. Ross Mounce, Matt Wills and I will be starting on this in earnest next month for phylogenetic trees. It’s been desperately hard work, and it’s really only because I don’t have a day-job that I can give it the obsessiveness it needs. But #ami2 can do simple tables (nobody can do complex tables because there are no semantics describing them). #ami2 can do diagrams if the EPS strokes and characters are still there. I’ve done a complex phylo tree and am pleased with progress. (I’ll blog all this later.)
So I’m going into blog-hack-blog-hack mode… (blog, hack)+ in regex-speak. The hacking takes precedence.
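(For non-regex-speakers, (blog, hack)+ just means one or more repetitions of a blog step followed by a hack step – a throwaway illustration:)

```python
import re

cycle = re.compile(r"(blog, hack)+")
print(bool(cycle.fullmatch("blog, hack")))       # True: one cycle
print(bool(cycle.fullmatch("blog, hack" * 3)))   # True: three cycles
print(bool(cycle.fullmatch("hack")))             # False: it starts with blogging
```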
So maybe WordPress is now easier to use than it used to be – we’ll see. I might even try SVG later.
 


Massively Multiplayer Online Bibliography contrasted with Elsevier’s Mendeley

There’s been a minor Twitter storm caused by one of my tweets about MMOB and Elsevier’s Mendeley. Twitter is a poor medium for discussion (asynchronous and character-limited); since I had intended to blog about these issues anyway, this is a useful time.

I regard machine-readable Bibliography as “the map of scholarship”, detailing who published what, when, and for what reason. I have a serious criticism of academia: it hasn’t built an Open Bibliography. There is no bibliography of scholarly publications, partly because there are (very imperfect) commercial offerings and libraries prefer to buy commercial products rather than create and publish their own.

3 years ago I ran a JISC-funded project on “Open Bibliography” involving Cambridge University and the Open Knowledge Foundation (http://www.jisc.ac.uk/whatwedo/programmes/inf11/jiscexpo/jiscopenbib.aspx). We built much of the metadata structure for OB, including the BibJSON protocol, and worked with libraries (BL, CUL) to make their monograph collections Open.
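For readers who have not seen it, BibJSON is essentially plain JSON with a small agreed vocabulary for bibliographic records. A minimal sketch of a single record, with invented values (the field names follow the BibJSON conventions; the article, journal and DOI are not real):

```python
import json

# A minimal BibJSON-style record; all values are invented for illustration.
record = {
    "title": "An Example Article on Open Bibliography",
    "author": [{"name": "Smith, A."}, {"name": "Jones, B."}],
    "type": "article",
    "year": "2011",
    "journal": {"name": "Journal of Examples", "volume": "3"},
    "identifier": [{"type": "doi", "id": "10.1234/example.001"}],  # invented DOI
}
print(json.dumps(record, indent=2))
```

Because it is plain JSON, such records can be aggregated, diffed and indexed with ordinary web tooling – which is the point of an Open Bibliography.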

Unfortunately we cannot easily do the same for journal articles. Publishers do not make their bibliography openly available or allow re-use of it. Many, such as Elsevier, expressly forbid the compilation of an index of “their content”. Although it’s technically easy to do (and we have software for this, PubCrawler), the publishers are the problem.

To tackle this problem of missing bibliographies Wikimedia has launched the Massively-Multiplayer Online Bibliography, MMOB (http://meta.wikimedia.org/wiki/Massively-Multiplayer_Online_Bibliography). Their goal is:

a series of crowdsourcing projects to perform significant feats of online bibliography in a fun, collaborative, and principled way, that would be useful to everyone and acceptable to professionals. It will rely on volunteer labor, free software, and open Web standards.

It is run as a bottom-up collaborative community project and I will certainly hope to get involved. So I tweeted:

 

Wikimedia starts MMOB, get Involved

This was commented on by William Gunn @mrgunn (whom I know) of Elsevier’s Mendeley:

mrgunn @mrgunn
@openscience
@petermurrayrust Very Interesting! #openbibliography – I like to think of Mendeley as a sort of MMOB.

And since I do not like to think of Mendeley as equivalent to an MMOB I replied:

@mrgunn
@openscience It’s not. It used customers without consultation to build resource which belongs to a monopolistic owner

This led to a series of Twitter exchanges; rather than repeat them I shall blog my position and offer blog comments for anyone to reply. First you need to understand Mendeley. To avoid bias I shall refer you to http://en.wikipedia.org/wiki/Mendeley. Mendeley launched in August 2008. It offered a useful Software-as-a-Service to manage personal reference lists (bibliographies) for scientists. From the current Mendeley website (http://www.mendeley.com/) it is:

a free reference manager and academic social network that can help you organize your research, collaborate with others online, and discover the latest research.

  • Automatically generate bibliographies
  • Collaborate easily with other researchers online
  • Easily import papers from other research software
  • Find relevant papers based on what you’re reading
  • Access your papers from anywhere online
  • Read papers on the go, with our new iPhone app

And from Wikipedia:

Mendeley requires the user to store all basic citation data on its servers—storing copies of documents is at the user’s discretion.

WP also comments (I think accurately):

Elsevier purchased Mendeley in 2013.[10] The sale caused quite a fuss on scientific networks and in the media interested in Open Access,[11] and upset some Mendeley users who felt that the program’s acquisition by publishing giant Elsevier, known for implementing restrictive publishing practices and provoking scandals,[12] was antithetical to the open sharing model of Mendeley.[13]

This is a primary concern. Wherever you read Mendeley you should add “Elsevier”. I assume the following:

  • That Elsevier has a controlling interest in Mendeley
  • That Elsevier could close Mendeley down, close some or all of its current services, add new services
  • Elsevier could unilaterally change Terms and Conditions
  • That the user “community” has no say in governance

These make it completely different from the principles and practice of Wikipedia’s MMOB.

I was neutral about Mendeley before it was purchased. I thought the idea was clever, it worked, and it provided a useful service. I was concerned that Mendeley appeared to have aggregated a huge number of full-texts of articles with no indication whether this was legitimate. I believed that users regarded Mendeley as a useful reference manager and (though I don’t know) a useful social site. They did not expect to have a voice, and they probably were not too concerned about giving their personal data to a commercial organization. I would not have referred to Mendeley before the purchase as “monopolistic”.

I doubt Elsevier bought Mendeley for the revenue it generates; rather, they were looking to its wider value. This includes:

  • A very large collection of online scientists
  • An effective way of creating a bibliography for Elsevier (though this has not been explicit)
  • An effective way of collecting scientific articles (including from competitors). The T&Cs say that these must be legal but I doubt there is a transparent audit

This is a very powerful resource. If you can steer the way scientists behave, and at the same time generate value from them (they admit to anonymised data analytics), you have something of considerable value. If you then use it as a way of routing Elsevier ideas and products to these scientists you have a lot more. So I stand by my tweet:

  • Elsevier is effectively monopolistic and with the purchase of Mendeley has a quasi monopolistic position in bibliography
  • Users were not consulted about the direction of Mendeley’s development (e.g. being bought by Elsevier)
  • Users may now be concerned about Elsevier possessing all Mendeley’s content.

The potential uses of Mendeley data within Elsevier are massive. Even anonymised (and there is no independent audit) the analytics can be massively valuable. Which papers are used? By whom? For what? This allows Elsevier to decide what new products to create, which journals to develop, and how to price products.

The major message is that the academic community must build its own information infrastructure. Libraries have sleepwalked into buying products from publishers which can be used to control scholarship. We must change that.


A single #openaccess reprint request from Elsevier costs 50,000 USD

Elsevier pointed me to their recent set of #openaccess journals. I have uncovered multiple serious problems in their labelling and access. Now I show that mislabelling and misimplementation could cost customers huge amounts of money. I will pretend I am a manufacturer of heating systems who wants to reproduce a scientific paper to help sell my revolutionary new systems. I need the following 8-page paper:

http://www.sciencedirect.com/science/article/pii/S0306261913001803

Heat load patterns in district heating substations

Open Access Article 

Now I can’t just use it without permission. Although it says “Open Access” it might be restricted to non-commercial use (CC-NC). So I go to Permissions and Reprints. I want 1000 glossy copies for my trade show.

You can do this yourself: 1000 copies of the full article, print and electronic, in the United Kingdom, translated into 2 languages, for a commercial company, costs 51,051.38 USD.

So I am to be charged FIFTY THOUSAND DOLLARS for reusing this article. (BTW if I wanted 10,000 it would have cost 10 times that. And if I were a pharma company wanting 10,000 copies it would have cost THREE QUARTERS OF A MILLION DOLLARS (750,000).)
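Since the charge scales linearly with the number of copies, the headline figures follow directly; per-copy rates here are inferred from the quotes above, and the pharma rate is back-calculated from the 750,000 USD figure:

```python
quote_1000 = 51_051.38                    # quoted price for 1,000 copies
per_copy = quote_1000 / 1_000             # ~51 USD per reprint
print(f"10,000 copies: ~{10_000 * per_copy:,.0f} USD")    # ~510,514 USD
print(f"Pharma at ~75 USD/copy: {10_000 * 75:,} USD")     # 750,000 USD
```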

This illustrates several points:

  • If the paper had, in fact, been CC-BY it wouldn’t have cost a cent
  • Elsevier re-use charges are unbelievably huge

And it shows that anyone clamouring for CC-NC (Rosie Redfield, Heather Morrison) is simply feeding huge amounts of money to Elsevier.


Update from Elsevier; some #openaccess problems will take many months to fix

Today I got a second substantive reply from Elsevier on the problems with their #openaccess labelling and access.

The other issue you raise relates to the clarity of labelling of articles as open access, and the clarity of labelling the license terms under which these articles are available. Over the past year, we have been working to upgrade our core metadata management systems to ensure that we have a central, consistent overview of all of our open access content. These developments are now complete and in coming months, we will be working to integrate the new metadata across our large number of platforms and back office systems.

ScienceDirect will be the first product to benefit from the new metadata, and with the next release (coming soon) there will be improvements in how open access content is labelled on ScienceDirect. In fully open access journals a “Open Access” text label will appear below the journal title on articles. In hybrid open access journals this “Open Access” text label will appear below the article title. For the licensing terms, as you know, we have offered authors a choice of licenses since April 2013. For these articles our production systems have the relevant metadata to signal which license (e.g. CC-BY, CC-BY-NC-SA, CC-BY-NC-ND) applies to the article and this information will be more visible in the next release. We are investigating how to populate this license metadata information for older open access articles.

Roll out to our products and services will continue through the remainder of 2013 and into 2014. While this is very much a work in progress, it is one we take very seriously indeed and are determined to get right.

With very kind wishes,

Alicia Wise, Director of Universal Access Elsevier @wisealic

To remind readers: after being pointed by Elsevier to their Open Access showcase, I uncovered a multitude of problems (/pmr/2013/08/15/update-on-elseviers-failings-in-gold-openaccess/), many of which could (and I think do) result in substandard offerings and overcharging (I shall illustrate this in the next blog post). Elsevier replied to one of my concerns (/pmr/2013/08/15/elsevier-replies-in-part-systems-issues-are-inevitable/) saying “systems issues are inevitable” and indicating that there were at least three problems and that they would reply to each. They then replied again, today (above). At least two of the problems I highlighted have not been corrected, and it appears from the letter that they will not be for some time.

This is not PMR “nitpicking”. This can, and if not fixed immediately, will cost people thousands of dollars.

Three examples:

  • An author submits a paper which is required by the funder to be CC-BY. It is mislabelled (e.g. “Elsevier, All rights reserved”) or hidden behind a paywall, and the granting body would reasonably withhold the grant from the scholar. Their only recourse would be to trust that Elsevier had the paperwork intact and would respond rapidly and professionally.
  • A re-user wishes to re-use an article for commercial purposes and if the article were mislabelled as CC-NC or not labelled could be charged thousands of USD (yes, I will show this in the next post).
  • A re-user believes that an article is CC-BY, re-uses and is contacted by Elsevier’s lawyers.

None of these is fanciful. At best they cost time, at worst they cost money. I re-emphasize that this will apparently take months, and the total cost to Elsevier customers could be large.

I’ll finish with a comment from Gavin Simpson on this blog (/pmr/2013/08/15/update-on-elseviers-failings-in-gold-openaccess/#comment-141062 )

Alice,

It *is* worrisome that in an *entirely* open access journal that your web teams and associated staff just don’t seem to get this right from the outset. I can understand the metadata backend argument from your later reply and the desire to implement a more robust system that will work efficiently for the future. What beggars belief is that you can build a website for an *entirely* open access journal that even includes code/logic that might display a message about charging for access! Likewise, how difficult it is to remove the “All rights reserved” label – which I note is *still* there despite the article in question no longer suggesting a charge is levied for access? This points to a large failing within Elsevier among its broader staff to understand OA and licensing etc. I’m sure you and some of your colleagues are all over this, but is it filtering down to the people that also need to know enough not to do stupid things like slap “All rights reserved” on OA journal websites? You and the small group of OA people at Elsevier can’t possibly monitor everything the company does with regards to OA and licensing, so what is being done to educate the wider company staff of the issues? It’s not as if these things don’t keep on happening and they *do* affect how we perceive Elsevier and undermine what positive steps the company is trying to make in this area.

 
