Nature News reports SCIgen gibberish papers; can we rely on conventional peer-review? Or can machines help?

Richard van Noorden has an important report
http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763
Two science publishers have withdrawn more than 120 papers after a researcher in France identified them as computer-generated. According to Nature News, 16 fraudulent papers appeared in publications from Germany-based Springer, and more than 100 were published by the New York-based Institute of Electrical and Electronics Engineers (IEEE).
It’s not clear what the motive was – academic fraud? Or a Sokal/Bohannon-like demonstration of the frailty of peer-review? But the immediate effect is to show that a large number of “peer-reviewed” scientific papers have flaws.
This should surprise no-one who understands the process of scientific publication. I will assert that, in principle, every published article has flaws. Most will be minor – typos in references or tables, mislabelled diagrams, misdrawn chemical structures, or countless other errors.
Consider a doctoral thesis – possibly the most intensively peer-reviewed document that a scientist produces. The thesis is written knowing that failure may be absolute – a career could depend on it. It has taken months to prepare. Almost always the student has to revise it for “minor errors”. (My own thesis had a number of them, and yet I have asked for it to be digitised at Oxford.) Errors are ubiquitous.
There are roughly three absolute reviewers of scientific material:

  • The natural and physical world. Nature (not the journal) always wins. It is fair – God does not play dice – but neither does s/he tolerate errors. This is the ultimate arbiter. One of the structures in my thesis was “wrong”. I discovered later that it was in a subgroup (Fd3) of the reported space group (Fd3m). This wasn’t trivial – it included a rare sort of twinning (which has given me minor eponymity). This is how science progresses. Science is a series of snapshots.
  • The computer. It doesn’t lie. If you don’t get the same answer as someone else then either you or they or both have to find out where the problem is. It’s interesting that most of these fake papers were in the area of Computer Science. Properly reported CS should be very difficult to fake. Unfortunately much of it is very badly reported.
  • Humans. Human judgment is variable and changes with time. A “good” paper now may be “bad” at a later stage, and vice versa. An “exciting” one now may be shown to be uninteresting later, or vice versa. Science often changes by paradigm shifts, and many of those were rejected when first published. Moving continents? Ulcerating bacteria? Charged species in solution? All examples of science that would have led to dismissal for lack of “impact”.

The rush for immediate impact is anti-scientific as is the rush for multiple publications.
I doubt this will change.
But one thing that can help to reduce noise, error, fraud, duplication etc. is the use of machines.
Machines can detect fraud (I shall show how shortly). Machines can detect errors – we have already shown this. Machines can reproduce (or fail to reproduce) computational science. This could and should be done.
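As a crude illustration of the machine side – this is emphatically not Labbé’s actual method, which rests on proper textual statistics – a first screen for template-generated text can be a few lines of code. The suspect phrases below are invented placeholders, not a real SCIgen fingerprint.

```python
import re

# Hypothetical template phrases - placeholders, not a real SCIgen fingerprint.
SUSPECT_PHRASES = [
    "the rest of this paper is organized as follows",
    "in recent years, much research has been devoted",
]

def suspicion_score(text: str) -> float:
    """Crude score: template-phrase hits plus vocabulary flatness."""
    low = text.lower()
    hits = sum(low.count(phrase) for phrase in SUSPECT_PHRASES)
    words = re.findall(r"[a-z]+", low)
    if not words:
        return 0.0
    # Grammar-generated text tends to recycle a small vocabulary,
    # so low lexical diversity nudges the score upwards.
    diversity = len(set(words)) / len(words)
    return hits + (1.0 - diversity)

print(suspicion_score("In recent years, much research has been devoted to gates."))
```

A real detector would be far more careful, but the point stands: this is mechanical work that human referees should not have to do by eye.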
 
The problem is that it is a lot of work to set up the proper apparatus. And publishers don’t like that (I except a few shining examples such as IUCr/Acta Crystallographica). It costs money to verify and check science. That eats into profits. And while publishers get paid for the number of papers they publish (and generally not the ones they reject) why bother?
Why do chemistry publishers not insist on machine-readable spectra? It’s trivial.
Why do they not insist on machine-readable chemical structures? That’s even more trivial.
Because it costs effort?
And worse – it means that the scientific literature becomes a semantic database. And that would never do, because it could replace the secondary databases that generate hundreds of millions of dollars in income.
I and my friends could have all the tools to create higher quality chemistry, less fraud, more value. And that goes for many other sciences.
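To make concrete what machine-readable chemistry buys you, here is a minimal sketch assuming the open-source RDKit toolkit is installed (RDKit is not our software; it stands in for any structure toolkit). A structure supplied as data rather than as a picture can be checked automatically.

```python
from rdkit import Chem  # open-source cheminformatics toolkit (assumed installed)

# Phenol supplied as a SMILES string - data, not a drawing.
mol = Chem.MolFromSmiles("c1ccccc1O")

print(Chem.MolToInchi(mol))     # a canonical identifier, comparable across papers
print(Chem.MolToMolBlock(mol))  # a molfile connection table for other software
```

Once the structure is data, formulae, valences and names can all be verified by machine; a bitmap in a PDF allows none of this.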
Machines can help authors… I’ve tried that for over 10 years. No progress.
Will the culture of publication change in my lifetime??
That’s up to you.


MDPI and Beall – further comments from a "brainwashed Brit"

After my recent post on MDPI there has been a flurry of comments on this blog and I have also received a few private mails. Some are accusatory either of me or other correspondents.
To clarify my position:

  • I have been aware of MDPI for ca. 16 years and have no indication that they are anything other than a reputable scientific publisher. I have corresponded with them two or three times.
  • I wrote “I have no personal involvement with MDPI”. This was poorly phrased – I meant to say that I have no financial interest in MDPI, nor am I involved in any way in the running of the company.
  • A month ago I accepted an invitation to be on the editorial board of the journal Data. I approve of what Data is setting out to do and I intend to take an active interest – making comments and suggestions where appropriate. I do not approve of editorial board members who simply lend their names. I intended to announce my membership on this blog.
  • I have been invited to contribute an article to a special issue edited by Bjoern Brembs and intend to do so.

I have worked extensively on material in the 3 journals Molecules, Materials and Metabolites because it is well presented and I believe it to be honest science. This does not involve MDPI, although I have told them what I am doing.
=======
I note that there are a great number of accusations about what various people have been doing, some implying fraud or near-criminal activity. I know nothing more of these (that is what the phrase “no personal involvement” was intended to address). I do not intend to try to find out more about them. I shall not respond to them and may decline to post some of them.

  • I shall continue to mine the content from MDPI journals and publish the resulting science. I can do this with or without the cooperation of MDPI. I shall report the science objectively.
  • I shall continue to be an active member of the board of Data.

=======
I remark that the scholarly publishing industry has a turnover of ca. 10-15 billion dollars. Profit margins are very high. I am not surprised that there are low-quality journals. Elsevier’s “Chaos, Solitons & Fractals” is a case in point (see Wikipedia for objective analysis). How many libraries have bought that? What checks are there on quality? None.
I have argued for many years that Open Access needs a regulatory organ and have generally been shouted down. The OA community is now reaping the harvest of its lack of care in standards – the (mis)label “Open Access” costs far more dollars than marginal publishers do. Had the OA community created a system whereby MDPI or any other publisher could get formally certified, they would not have to defend themselves.
No good can come from single people who set themselves up as self-appointed arbiters, be they Beall or Harnad.  Criticising single articles (as Retraction Watch does or the chemical blogosphere) is admirable – especially as the discussion is open and different points of view are accepted. However Beall writes:
This post is a good example of how Brits in particular and Western Europeans in general have been brainwashed into thinking that individuals should not make any assertions and that any statements, pronouncements, etc. must come from a committee, council, board, or the like. This suppression of individuality is emblematic of the intellectual decline of Western Europe. This suppression is laying the foundation for the erosion of individual rights in Europe and the forced imposition of groupthink throughout the continent.
This immediately shows Beall’s total lack of objectivity. He gave an indication earlier with a white paper effectively attacking Open Access as a capitalist plot (or an anti-capitalist one – I couldn’t work out which). My nationality is irrelevant. Beall’s language verges on the nationalist – the nationality of the proprietor of MDPI (Chinese) is irrelevant to me; the question is whether s/he runs and hosts an effective operation.
Murray-Rust’s statement “I have no personal involvement with MDPI” is not reflective of the facts. Indeed, he is listed as serving on the editorial board of one of MDPI’s many (empty) journals, the journal Data. See: http://www.mdpi.com/journal/data/editors (Peter, if you did not know that you were listed here, please let me know, because this is a common practice, adding people to editorial boards without their permission. Otherwise, please explain your statement that you lack involvement with MDPI.)
I have explained this above.
It would be great if SPARC were to list predatory publishers and journals, but it and most OA organizations pretend that predatory publishers don’t exist because they are afraid to admit that their OA fantasies are … just fantasies. OASPA’s membership list functions as sort of a white list, so if you don’t like my list, use OASPA.
The word “fantasy” immediately removes any chance of rational discourse.
MDPI is becoming an increasingly controversial publisher. This controversy will rub off on authors who publish there, and in the long run, I think most will wish they had published in a higher quality venue. Authors should make decisions as individuals (while they still can) and do what’s best for themselves as researchers. I am saying that for most individual researchers, MDPI is not a good choice, and you ought to consider a better-quality venue.
“controversial” is a subjective term and irrelevant. It is possible to whip up opinion against an organisation and, where the organisation depends on trust, this can be very difficult to refute. Beall has built a list of publishers of questionable ethics and practices. Initially I felt it was useful, though I disliked the word “predatory” as it applies to many closed-access publishers too – they just use different tactics. I now have no regard for Beall’s list, which I consider consists of personal prejudices (some of them nationalist).
I shall not write more on this topic. I shall write on Data and I shall write on content extraction.
 
 


Content Mining Myth Busting 0: "It doesn't matter to me"

In the next few posts I shall address some common myths about Content Mining (TDM). Many are implicitly or explicitly put to us by Toll-Access Publishers (TAPublishers).
The most serious myth is that it’s not important.
Actually it’s important to everyone. The two major information successes of the first decade of this century were both content-mining:

  • Google has systematically mined the Open Web using machines and added its own semantics
  • Wikipedia has systematically mined the infosphere using humans and added its own semantics.

If you have ever used Google or ever used Wikipedia then you have used the results of content-mining.
Wikipedia is beyond criticism – if you are unhappy about it, get involved and change it. But what about Google?
Well Google doesn’t do science.
If I want to know what species was recorded in this place at that date; or what chemical reaction occurred under these conditions, then Google doesn’t help. You need a semantic scientific search engine.
Discipline-based semantic content mining is the most important development in applied information science. If you want to build the library of the future you should be doing this – not paying rent to third parties. If you want to do multidisciplinary research you need the results of content-mining.
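As a toy sketch of what such an engine works with (the records, field names and values below are invented for illustration), mined facts become structured, queryable data rather than prose:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Occurrence:
    """One mined fact: a species recorded at a place on a date."""
    species: str
    place: str
    when: date
    source: str  # e.g. the DOI of the paper the fact was mined from

# Invented example records standing in for facts mined from papers.
facts = [
    Occurrence("Puffinus puffinus", "Skomer", date(2013, 5, 1), "doi:10.xxxx/a"),
    Occurrence("Puffinus puffinus", "Lundy", date(2012, 6, 12), "doi:10.xxxx/b"),
]

# "What species was recorded in this place at that date?" becomes a query:
for fact in facts:
    if fact.place == "Skomer" and fact.when.year == 2013:
        print(fact.species, fact.source)
```

Google indexes strings; this indexes facts, each traceable to its source paper.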
If we were allowed to do it, then I wouldn’t be writing this blog post. As it is, the TAPublishers are fighting tooth-and-nail to stop us content-mining. People are doing it, but in secret, because if they do it in public they will be cut off or sued. It’s not surprising that we don’t yet have high visibility.
But that’s going to change. And change rapidly. We have literally billions of dollars of information locked up in the current scholarly literature. And 10000 papers come out each day. We need content mining to manage these – read them for us. Organize them. Let us search after we’ve read them. Do some of our routine thinking for us.
On our own terms for our own needs.
It can happen, just as Wikipedia happened.
So don’t turn away – believe that Content Mining matters – matters massively.
 


All our software is Open Source; our Data is Open and our standards are Open

Several commenters have asked whether the software we write is Open.
YES
ALL OF IT
UPDATED DAILY
All our software is aggressively Open Source or Free, written with a primary purpose of making information universally free. I call it
LIBERATION SOFTWARE.
Some years ago a number of us met in San Diego under the larger Blue Obelisk in Horton Plaza and decided to promote our software as
OPEN
INTEROPERABLE
and came up with the mantra
ODOSOS = Open Data Open Standards Open Source
This has been very successful (see http://www.blueobelisk.org) and we are continuing to bring in new groups.
Our own group has produced:
OSCAR2 – data checking (Chris Waudby, Joe Townsend et al.)
OSCAR4 – chemical entity recognition (Peter Corbett, David Jessop, Lezan Hawizy)
OPSIN – chemical name to structure (Daniel Lowe)
CHEMICAL_TAGGER – chemical phrase interpretation (Lezan Hawizy, Nico Adams, Hannah Barjat)
EUCLID/CMLXOM/JUMBO – Chemical Markup Language libraries (PMR et al.)
PDF2SVG/SVG – interpreting PDFs into SVG and HTML (PMR, Murray Jensen)
SVG2XML – more PDF interpretation (PMR)
XHTML2STM + FooVisitors – domain interpretation for phylogenetics and chemistry (PMR, Mark Williamson, Andy Howlett)
IMAGEANALYSIS – image analysis (PMR)
CRAWLER and REPO – crawling and repositories (PMR, Mark Williamson)
So yes, there’s lots to build on.
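As a small taste of one of these tools, here is a hedged sketch that converts a chemical name to a structure with OPSIN. It assumes the public OPSIN resolver at opsin.ch.cam.ac.uk is reachable; the same conversion can be run locally with the OPSIN Java library.

```python
import urllib.parse
import urllib.request

def name_to_smiles(name: str) -> str:
    """Resolve a chemical name to SMILES via the public OPSIN service."""
    # Appending ".smi" asks the resolver for SMILES output.
    url = "https://opsin.ch.cam.ac.uk/opsin/" + urllib.parse.quote(name) + ".smi"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8").strip()

print(name_to_smiles("2,4,6-trinitrotoluene"))  # prints a SMILES string
```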
PLEASE JOIN IN
 
 

Why do libraries sign contracts forbidding mining? I ask under FOI and request them to stop

I intend to submit the following Freedom Of Information request to the 24 leading UK universities (“Russell Group”). The excellent http://whatdotheyknow.com makes this very easy as it gives the addresses and actually sends the request. The Universities have to answer within 20 working days (most manage it in 19.9 days so don’t hold your breath).
I ask whether any University has any policy on supporting researchers to carry out content-mining (Text and data Mining, TDM). Most universities seem to accede to any conditions laid down by publishers. This is strengthened by the total lack of any reaction to Elsevier’s recent “click through” licence. It’s easy to get the impression that universities don’t care. Maybe this request will show they have been secretly fighting for us – who knows?
I’d be very grateful for comments ASAP. I will try to summarise answers and would certainly appreciate help here.
========================= Dear University ====================
Background and terminology:
This request relates to content mining (aka Text And Data Mining (TDM), or data analytics) of scholarly articles provided by publishers under a subscription model. Mining is the use of machines (software) to systematically traverse (crawl, spider) subscribed content, index it and extract parts of the content, especially facts. This process (abstracting) has been carried out by scholars (“researchers”) for many decades without controversy; what is new is the use of machines to add speed and quality.
Most subscribers (universities, libraries) sign contracts provided by the publishers. Many of these contain clauses specifically restricting or forbidding mining (“restrictive contracts”). Recently the UK government (through the Intellectual Property Office and Professor Hargreaves) recommended reform of copyright to allow mining; a statutory instrument is expected in 2014-04. Many subscription publishers (e.g. Elsevier) have challenged this (e.g. in Licences 4 Europe discussions) and intend to offer bespoke licences to individual researchers (“click-through licences”).
In many universities contracts are negotiated by the University Library (“library”), which agrees the terms and conditions (T&C) of the contract. At the request of the publishers some or all of the contract is kept secret.
Oversight of library activities in universities usually involves a “library committee” with a significant number of academics or other non-library members.
Questions (please give documentary evidence such as library committee minutes or correspondence with publishers):
* How many subscription publishers have requested the university to sign a restrictive contract (if over 20 write “> 20”)?
* When was the first year that the University signed such a contract?
* How often has the university challenged a restrictive contract?
* How many challenges have resulted in removal of ALL restrictions on mining?
* Has the university ever raised restrictions on mining with a library committee or other committee?
* How many researchers have approached the university to request mining? How many were rejected?
* How often has the university negotiated with a publisher for a specific research project? Has the publisher imposed any conditions on the type or extent of the research? Has the publisher imposed conditions on how the research can be published?
* How often has a researcher carried out mining and caused an unfavourable response from a publisher (such as removal of service or a legal letter)?
* How often has the university advised a researcher that they should desist from mining? Have any researchers been disciplined for mining or had subscription access removed?
* Does the university have a policy on researchers signing “click through licences”?
* Does the university have a policy for facilitating researchers to carry out mining after the UK statutory instrument is confirmed?
* Does the university intend to refuse to sign restrictive contracts after the statutory instrument comes into force?
 
Your immediate comments will be very valuable as I shall start sending these out very soon.


Beall's criticism of MDPI lacks evidence and is irresponsible

I have just seen Jeffrey Beall’s “analysis” of MDPI http://scholarlyoa.com/2014/02/18/chinese-publishner-mdpi-added-to-list-of-questionable-publishers/#more-3072 and wish to respond immediately.
I will not respond to all Beall’s criticisms.
Beall has set up a site where he lists questionable (aka predatory) Open Access publishers who have poor or non-existent quality controls or have questionable organisations. This is potentially a useful service, though it is inappropriate that it should be done by a single person, especially one lacking discipline knowledge.
I have no personal involvement with MDPI. I remember when they started as a company which actually took physical chemical samples and stored them so that people could check later (the acronym MDPI can also stand for Molecular Diversity Preservation International). The compounds were linked to a journal, “Molecules” with full text. It has been going for 17 years. At one stage I wrote to them and asked them to change the licence from CC-NC to CC-BY and they immediately did.
I have never had any reason to doubt the validity of Molecules. I am now using it as an Open Access source of material to data-mine. We are doing the same with “Materials” and “Metabolites”.
Beall’s criticism that these are “one-word” titles is ridiculous and incompetent. They are accurate titles.
I have read (as a human) hundreds of articles in these publications. If I were to review a paper in any of them I would assume it was a reasonably competent, relatively boring, moderately useful contribution to science. The backbone of knowledge. I would expect to find errors, as I would in any paper. I reported one in my last post. This wasn’t fraud, it was a product of the awful state of ALL scholarly publishing where paper processes breed errors.
It is right that there should be a list of irresponsible journals and publishers. It should be run by an Open organisation, not Beall. Maybe OASPA? Maybe SPARC? I don’t know. It is wrong that a single person can destroy a publisher’s reputation.
It is also right that we should highlight the equally awful (if not worse) practices of closed access publishers. Why is there no organisation campaigning for reader rights? It seems to fall to me, an individual.
All publishers have junk articles and fraudulent articles. We don’t know the scale. (It’s a pity that publishers do so little to enable technical solutions to this.) By default I would say that a paper in Molecules is no more or less likely to be questionable than one in a closed access journal from Elsevier or ACS.
The main problem is that the Open Access community has failed to get its act together. And that the closed access community prevents anyone getting an act together.


Machines are better referees than humans but we'll be sued if we use them

Andy Howlett and Mark Williamson in our group have been developing fantastic software.
It can read the whole scientific literature and analyse it in minute detail. One of the things we are starting with is chemistry. ChemVisitor (part of AMI2) can read chemical structure diagrams and chemical names and work out what they mean.
It takes less than a second. That’s pretty impressive, and we’ll be reporting this at the ACS meeting next month. Here’s the first picture we chose.
Our software can read the whole chemical literature every day and work out all the compounds. And I can do it on my laptop.
[Figure: chemical structure diagrams reproduced from a CC-BY paper – one of the compounds is drawn incorrectly.]
Hey – hang on – you’re violating copyright! And copyright is more important than science, isn’t it? Well, actually I am not violating it here, because this is from a CC-BY paper (I omit the attribution for a reason you’ll see). But yes, if it was from a Tetrahedron (Elsevier) article or J. American Chemical Society I would have to get permission. I’d probably have to pay. I wouldn’t be allowed to do X, Y or Z… It would take days without any likelihood of success.
And all I am doing is science. Note that chemical structure diagrams are NOT creative works. They are data. They are the only effective way of communicating what the compound is. But Elsevier and ACS and Nature and Science and … will all challenge me with lawyers if I take diagrams from non-CC-BY articles (e.g from Nature).
Now Andy has just mailed to say that this diagram is wrong. One of the compounds is incorrectly drawn. He’s contacted the author who has agreed. The error matters. These are compounds that many of you may eat. If the compound has the wrong name or formula then the science is badly flawed. And that can mean people die.
So try it for yourself. Which compound is wrong? (*I* don’t know yet) How would you find out? Maybe you would go to Chemical Abstracts (ACS). Last time I looked it cost 6USD to look up a compound. That’s 50 dollars, just to check whether the literature is right. And you would be forbidden from publishing what you found there (ACS sent the lawyers to Wikipedia for publishing CAS registry numbers). What about Elsevier’s Reaxys? Almost certainly as bad.
But isn’t there an Open collection of molecules? PubChem at the NIH? Yes, and ACS lobbied on Capitol Hill to have it shut down as it was “socialised science instead of the private sector”. They nearly won. (Henry Rzepa and I ran a campaign to highlight the issue.) So yes, we can use PubChem, and we have, and that’s how Andy’s software discovered the mistake.
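To illustrate the kind of check involved – a sketch, not Andy’s actual code – a machine can look a compound name up in PubChem’s public PUG REST API and compare the returned molecular formula with the one printed in the paper. The name and claimed formula below are examples, not the compound from this figure.

```python
import json
import urllib.parse
import urllib.request

def pubchem_formula(name: str) -> str:
    """Fetch the molecular formula for a compound name from PubChem."""
    url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
           + urllib.parse.quote(name) + "/property/MolecularFormula/JSON")
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"][0]["MolecularFormula"]

claimed = "C6H8O6"  # the formula as printed in a (hypothetical) paper
found = pubchem_formula("ascorbic acid")
print("consistent" if found == claimed else "mismatch: flag for human review")
```

Run over every compound in every new paper, this sort of check costs nothing; done through Chemical Abstracts it costs 6 USD a compound.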
This was the first diagram we analysed. Does that mean that every paper in the literature contains mistakes?
Almost certainly yes.
But they have been peer-reviewed.
Yes – and we wrote software (OSCAR) 10 years ago that could do the machine reviewing. And it showed mistakes in virtually every paper.
So we plan to do this for every new paper. It’s technically possible. But if we do it what will happen?
If I sign the Elsevier content-mining click-through (I won’t) then I agree not to disadvantage Elsevier’s products. And pointing out publicly that they are full of errors might just do that. And if I don’t?…
Elsevier will cut off the University of Cambridge and the University will then contact me and tell me I have broken the sacred conditions that they have signed. Because no University ever challenges conditions that publishers set. The only thing that matters is price. So all universities have agreed with the publishers that readers cannot carry out text and data mining. They didn’t ask me – they just signed my rights away. If I continue I’ll probably face disciplinary action.
And the scientific literature will continue to be stuffed full of errors. And people will continue to die because of them.
Does anyone care? I don’t think so as no-one (ZERO) from a University has commented on my analysis of Elsevier’s restrictive TDM licence. They’ll just go ahead and sign it. Because it’s the easiest thing to do.


The value of the hacker community: reacting to natural disasters

I used to live on the edge of the Somerset Levels and as a boy cycled throughout them…
I am including in full a post from an OKFN list [after these paragraphs], inviting people to hack today (Sunday) in Shoreditch, London, and virtually, to help mitigate the effects of the worst UK floods in living memory. Read it. The message is simple:
  • YOU can make a difference.

People often think that they can’t hack – that you have to speak Perl and Unix and node.js and…
That’s wrong. Hacking is about communities making a difference. We all understand communities, so we can all be hackers. The mayor of Palo Alto ran a city-wide “hack the city” event.

  • EVERYONE is welcome at a hack day.

You don’t even have to have a computer. Just an ability to communicate.
I can’t be there (I am in AU).  AU also gets floods (and bush fires). Last time I was here the Melbourne Age newspaper held a hack day in its offices – one of the topics we hacked was bush fires. There were lots of non-geeks there.
What you will find in Shoreditch today is a random selection of people – could be 5, could be 500. The thing in common is that they want to help. They know that no single person has the answer. That they don’t, at present, even know where to start.
That’s where you could well be able to help. Perhaps you are in local government or the voluntary sector? Or maybe you’ve actually been in a flood, or have detailed experience from someone who has. That’s a great starting point for finding out what people actually want rather than what we think they want. That’s why I was so impressed with the NHS Hackdays – people identified useful tasks that were achievable and then achieved them. Maybe you’re a teacher, or maybe you are still at school. Yes, school children can change the world.
And, of course, we are unlikely to solve everything this Sunday… Much of the success will be taking good starting points and building the communities and protocols that will make them sustainable. When the 2010 earthquake hit Haiti, the OpenStreetMap community – hundreds of thousands of people – leapt into action to use satellite photos to recreate pre-earthquake roads and buildings. Read http://hot.openstreetmap.org/projects/haiti-2.
Maybe you’re a keen photographer and went on holiday in Somerset. Perhaps your photos could be useful – I don’t know. Or maybe you know about low-cost boats. Or fly drones as a hobby. Who knows? A feature of hacks is that we pool ideas at the start and see which catch people’s imagination and which are feasible. It doesn’t matter if *your* idea doesn’t work out, simply that good ideas get developed. Glory is communal, not personal.
Perhaps you can find information on key resources that might be available but unused.
I may be able to log in from AU. What can I do?

  • give moral support
  • spread the word
  • cross-fertilise
If only one person reads this post and does something, that’s massively worthwhile. Now the details:
====================================
From: Joshua March <josh@conversocial.com>
Subject: Your country needs you: #FloodHack this Sunday!
Date: 14 February 2014 20:54:24 GMT
Hi guys,
The government called a meeting today with a number of major UK tech companies to discuss what the tech and developer community could do to help with the flood crisis engulfing the UK.
As part of this, the Environment Agency agreed to open up real-time data on flood levels/status, mapped across the UK, so that developers can utilize the data for free (at least for the next three months).
We’re organizing a hackathon THIS SUNDAY in Shoreditch, London, to build apps on top of the data to try and help people keep up to date with the issues in their area (or areas they’re traveling to), and get the data they need on how they can get help, how they can volunteer etc.
There are more details below, and on the hackpad page: https://hackpad.com/UK-Flood-Help-February-2014-QFpKPE5Wy6s
This is obviously super short notice, but an amazing opp to build something that could actually help thousands of people. Google have agreed to host it, and will be sending developers, as will Facebook, Microsoft and many other start-ups in the area (including my own).
Please spread the word, and come down if you can make it!
Calling all developers!
We have been hit by the worst flooding and weather the UK has seen in our lifetimes. Getting the right information to people about the problems affecting particular areas, and the right places to turn to help (or for information on how THEY can help volunteer) is crucial. The government has near real-time data on flooding levels and alerts, mapped out across the entire country, which they want to put to the best possible use. Following a meeting called today at Number 10 with leading technology companies, the Environment Agency, the Government Digital Service, the Open Data Institute and the Cabinet Office are working to open up this data to the public for the next three months, allowing developers to build innovative applications that can help those affected by the flooding.
This Sunday at 10am, join developers from Google, Facebook, Twitter, Conversocial, Datasift, Mother, Taskhub and more for a hackathon, hosted by Tech CityUK at Google Campus in Shoreditch, where the Open Data Institute will share the flood level data with developers and be on hand to help throughout the day. The Cabinet Office will be choosing the most useful applications demoed on the day to be promoted to flood victims across the country.
Please register for the hackathon here.
Your country needs you!

 

 


Have scientists finally got angry enough to rebel against publishers?

Richard Smith (http://blahah.net/about.html) has posted a very brave piece about how to create a revolution to change the process of scholarly publishing: http://blahah.net/2014/02/11/knowledge-sets-us-free/.
Before I start: I know Richard, and when his ideas about a marketplace for software and scientists are unembargoed I will enthusiastically blog them. I am also personally delighted that in the short time the OKF (I should say Keren Limor, of course) have been running Open Science discussions at the Panton Arms in Cambridge we have had massively important meetings. I missed Richard’s – though I think it will prove scene-changing – and I missed this one too.
I’ve copied Richard’s post in full and comment here…
First. YES!
Finally the moral unacceptability of TA-STM publishing has hit the modern world. The good news is that the technology is now so powerful that if we want to change the system we can. It won’t be pretty and it won’t be predictable, but it’s possible.
I have blogged before on the role of civil disobedience: breaking the formal law to promote a higher moral good. It’s been a critical force in many countries over millennia. It’s always risky and people may suffer. The important things are:
* is there a compelling moral case?
* is there a likelihood of making change happen?
The second is optional. A moral case is good enough, but it can be very lonely. But if you can change the hearts and minds of enough people, then change can be rapid.
So second, I am with you. I want to disrupt the system. I’m currently doing it in a parallel and complementary way. It might be judged illegal and I am prepared to take that risk. It’s undoubtedly moral. Until you gave the lead I didn’t have any authority – it’s not for my generation to tell yours what to do, but to support it when it acts.
The key things are critical mass, simple coherent aims, and irresistible technology.
 


Written by: Richard Smith. Last updated: 2014-02-11 17:15:00 -0800

Last night at Open Research Cambridge, Jelena Aleksic gave a great talk about Open Access. In her closing comments, she floated the idea of an iTunes for scientific papers. Imagine being able to get any scientific paper for 79p. That’s a reasonable price to cover costs of creating, archiving and distributing knowledge (given the research is already funded). Most people can afford it.
Current prices – $32 for one-off access to a Nature paper – are disgusting. Scientists created that knowledge, probably with public funding, then a team of other scientists peer reviewed it without getting paid, and Nature wants to make $32 from imprisoning it on their website? Fuck you, Nature.

Music shows us how to set knowledge free

If we can possibly bring about a situation where knowledge comes at cost price (~79p), or better – free – at the point of consumption, it’s our moral imperative to do so. To do that we have to destroy traditional publishing. No small task. But we can take lessons from how the music industry was transformed.
Ubiquitous music piracy broke the stranglehold traditional record companies had over listening to music. In the late 1990s CDs were £10-20. If you wanted to hear a particular song you had to buy a load of other songs at the same time and wait for them to be delivered to your house on a plastic disk. Then software made it trivially easy to pirate music, and in the last ten years record companies have been forced to change their business models to match the needs of their consumers. Now we can buy any song for pennies, and if someone can’t afford it, it’s easy to get it free.
When it becomes trivial to pirate scientific papers whilst being very difficult to trace the source of piracy, and at the same time it becomes very easy to search and acquire pirated papers, the tyranny of publishers will be over.

A vision of the future

Let’s imagine what that utopian world would look like by examining a few scenarios:
1. A student/researcher has a library of hundreds or thousands of PDFs and associated metadata in their reference manager. They wish that knowledge was free.
They fire up the Liberator software and hit a button. Their reference manager database is anonymised and liberated into an online, distributed repository.
2. A student/researcher is browsing a journal’s website.
They are running the Liberator browser plugin that grabs every paper linked from every page they visit and anonymously sends it to the open repository network.
3. Anyone wants to read a paper.
They go to one of dozens of websites that let you search the distributed network of papers that have been liberated. They can download the paper and the data, and link out to open peer reviews from a variety of sites to enable them to judge the quality of the research for themselves.
4. A student/researcher wants to liberate their entire subject.
The Liberator connects to the distributed network and gets the list of papers already liberated – the “free list”. When the user connects to the internet at their library, the Liberator compares the free list to what’s available for access via the library. It starts anonymously crawling the publishers’ sites and liberating papers that aren’t free yet.
5. A citizen wants to contribute to freeing all knowledge
They visit a University library that has free public internet access, and deposit a tiny box in a discreet place. The box contains a raspberry pi with a USB wi-fi plug. The raspberry pi is running the Liberator, and starts crawling and setting papers free.
Perhaps they don’t live near a University library. They visit anywhere with public internet and deposit their raspberry pi. It connects to a decentralised database of hacked or donated student library logins and begins crawling, liberating.
Perhaps there’s no public wi-fi near them. They run the Liberator in TOR mode, and it anonymously crawls from the safety of their home using the login database to gain access.
6. Someone (most likely, some consortium of publishers) attacks the network. They use court orders to take search sites offline and have servers shut down.
Within seconds the network has recovered – mirror sites are pre-arranged to launch when others go down. All the data is held distributed around the world and cannot be destroyed without destroying the world’s computers.

The future is now

Sounds rosy, huh? Nobody gets hurt, and the whole of human knowledge becomes free. Publishers can’t stop it: their customers are the Universities. They can’t cut them off without cutting off their own income stream, and the Universities have already paid for all these papers. Millions of people running the Liberator are righteous leeches. They bleed the publishers to death. This allows us to rebuild knowledge archival and distribution for the modern era using open processes.
The cool thing is, this is all technically possible to achieve using tools that already exist, or that could be rapidly developed. I propose the following set of software to make this happen:

  1. The Liberator: A web crawler that scrapes publishers’ websites and submits papers along with all their supplementary files and metadata to the Liberator network. The scraping uses Zotero’s community-maintained translators, which already cover all the major publishers and many minor ones. It doesn’t duplicate effort – it checks whether a paper is already free before liberating it. It can securely update over-the-air to add workarounds when publishers start trying to block the crawler. It can also find databases from all commonly used reference managers and anonymise and liberate their contents.
  2. A torrent tracker with features that allow effective search and display of scientific papers. It produces RSS feeds that contain random subsets of the new papers in the network, so that new papers are evenly spread out around all seeders without anyone having to host all papers. These are minor modifications to existing open source torrent trackers, like Gazelle.
  3. Browser plugins that run Liberator whenever an academic publisher’s website is visited. This is a trivial extension to the Zotero connector plugins.

If you want to help make this happen, go here and start talking (anonymously, if you like).
 


Reply to Richard van Noorden

[Note I have switched laptops and this has caused delay – also I cannot yet do formatting].
Earlier this week I strongly criticised Nature News and Richard van Noorden (/pmr/2014/02/10/natures-recent-news-article-on-text-and-data-mining-was-an-unacceptable-marketing-exercise-i-ask-them-to-renounce-licensing/) for a post about Elsevier’s click-through licences. My concern was that the article was – [I agree not intentionally, and I withdrew the “marketing”] – supportive of Nature’s business interests. Richard has replied. I’ll set the scene first and then reply to specific points.
I look to Nature News as a reliable source of scientific news and comment (unlike, say, the UK’s D**ly M**l). I suspect that many readers, including me, glance at the headlines and the first paragraph and then move on. So I read:
Elsevier opens its papers to text-mining

Researchers welcome easier access for harvesting content, but some spurn tight controls.
and the first paragraph…
Academics: prepare your computers for text-mining. Publishing giant Elsevier says that it has now made it easy for scientists to extract facts and data computationally from its more than 11 million online research papers. Other publishers are likely to follow suit this year, lowering barriers to the computer-based research technique. But some scientists object that even as publishers roll out improved technical infrastructure and allow greater access, they are exerting tight legal controls over the way text-mining is done.
I suspect that most readers would see this as a statement of a fait accompli. It’s going to happen the way the publishers say. Yes, a few people are carping; but the world is moving ahead.
Nature has a vested interest in seeing this happen. For whatever reasons it supports the STM publishers in their intention to offer licences for content mining. Note that this is not the result of a negotiation – it is a unilateral move by the publishers. And it’s totally opposed by all major academic bodies and library organisations as I detailed.
This is not the only case where a publisher’s interests have coincided with a favourable story.
* Science Magazine did a “study” “showing” that Open Access peer-review was flawed.

Who’s Afraid of Peer Review?

A spoof paper concocted by Science reveals little or no scrutiny at many open-access journals. [PMR: Note Science appears to have a significant business interest in keeping the Toll-Access status quo]

* Taylor and Francis “surveyed” 71K readers and reported that they preferred CC-NC licences over CC-BY. [PMR Note: T+F have an apparent business advantage in restricting APC licences to NC]
* and here NPG have an interest in licensing TDM rather than accepting copyright extensions.
My concerns with the piece were that it gave a completely unbalanced view. Richard notes, and I agree, that elsewhere he has reviewed the case for copyright reform, but it was not in the current piece. A casual reader would not go searching for history, but assume that the licence issue was relatively uncontroversial.
Nature News wields great power. It is therefore critical that where it has vested interests they are made clear.
The same story could have been reported very differently (e.g. by Alok Jha or George Monbiot of the Guardian). An organisation critical of TA-STM publishers might have written:
“Elsevier ignore coming copyright reform and create de facto approach to licensing”
“In an attempt to forestall coming legislation which would make content minable by all scientists, Elsevier has rushed through a licence scheme to persuade scientists that they can content-mine their journals. Other publishers seem likely to follow. But our experts showed that the licence was designed to protect the publisher’s business interests rather than assist the researcher – who might unwittingly end up in court.”
Same story – different emphasis. It was critical that NN stayed objective and I don’t think it did.
Detailed comments follow – Richard’s text interleaved with my replies (marked “PMR:”):
Dear Peter,
I believe my article was fair, giving representation to pro and anti sides in this debate.
PMR: Agreed, there were two sides – but the overall effect was highly favourable to Nature.
Let’s dig into the detail: you suggest that the article was ‘biased reporting’ which ‘purports to be news’ and was ‘effectively an attempt … to promote publisher licenses as a benefit to science’. My article does not intend to make a case for or against publisher licenses. It is, quite simply, reporting: explaining what has happened, and how scientists reacted to Elsevier’s new policy (which was, of course, news).
PMR: I should rephrase. I do not question your motivation. FWIW I also listened for an hour to John Bohannon and believed he was sincere. But the overall impression is of a news story which is supportive of Science’s (and here NPG’s) interests.
Far from a bias for publishers’ licenses, the article clearly states the objections that you raise against the license approach. The introduction says that ‘some scientists object that even as publishers roll out improved technical infrastructure and allow greater access, they are exerting tight legal controls over the way text-mining is done’. The final three paragraphs explain precisely the complaints that some researchers have with the way publishers are setting license-controls on text-mining activity, leaving the reader with Ross Mounce’s criticisms.
PMR: “Some scientists” is far too weak. It should be replaced with “major national scientific societies, major funders, and international library organisations are all absolutely opposed to licences”.
On the other hand, for all you might disagree with them, it is a fact that other scientists I spoke to – including Max Haeussler, who has been very critical of Elsevier in the past – were pleased about the API and the click-through license. They told me that this would open up TDM opportunities, albeit under restrictive conditions (conditions that the article explains). I had, as you know, contacted you for your reaction too. You pointed me to your first blog (written before your more detailed analysis, which wasn’t available at the time), and I judged that Ross Mounce had already provided a voice for that view in the article.
PMR: I have explained that some scientists would welcome this – that does not mean it’s acceptable. Have any of the scientists been asked “are you happy to answer in court if you impinge on Elsevier’s business interests?” or “is your library happy with the licence, or may you be disciplined?” This is no more reliable than T+F’s 71K readers.
It is particularly bewildering that you accuse Nature of “failing to report any of the Licenses4Europe discussion”, and ask for “a balanced account of the Licenses4Europe story”.
For as far as I am aware, Nature is the *only* mainstream media venue to have reported the Licenses4Europe issues. In March 2013, I covered the clash between scientists and publishers over licenses, and in June 2013, further reported on the divisions rife in the European Commission TDM discussions. What’s more, two years ago I wrote the first media coverage of Max Haeussler’s struggles to get permission from Elsevier to text-mine for biological sequences.
I didn’t consider that the Licenses4Europe discussion needed to be explained again in this article: for I had already explained the argument that ‘the right to read is the right to mine’, and noted that the European Commission was examining the issue. Of course, all the relevant previous coverage is linked to at the end of the story.
PMR: The problem is that the casual reader will not know the history and will regard the links as superfluous detail.
Where does this discussion of bias and reporting balance leave us? Your critique helps me think carefully about how I’m reporting my stories for our readers. And your campaigning is bringing the issue to wider attention; I’ll be as interested as you are to see how NPG responds to your call for the company to ‘publicly renounce the use of licenses to control TDM’. Your examination of Elsevier’s detailed legal terms is also very useful. So, broadly, I welcome your letter.
PMR: Thank you. And I welcome your critique here. If this gets a different response from NPG over TDM licences (and Nature is well placed to give one), it will have been worthwhile.
Except this: you have conflated your antipathy to NPG’s (and other subscription publishers’) TDM policies, with the incorrect accusations that the reporting in Nature was an attempt to promote publisher licenses, and was somehow ‘marketing … under the guise of news’. I’m pleased that you have already retracted your implication that I was involved in a marketing exercise. I hope that in future you’ll keep separate your critiques of my reporting, from your critiques of NPG policies.
PMR: I have retracted the assertion that this was deliberate. There is however a danger that any large institution becomes corporatist and institutionalist, and I think publishers have to be particularly careful.
Richard.
PMR: I have high regard for almost everyone I have met at NPG – Philip Campbell, Timo Hannay, the New Technology Group and now Digital Science, and the Blogs and ScienceOnline teams. I would not say the same about other TA-STM publishers. But I think Nature – as an organisation – has to be very aware of its roots in the community.
 
