Monthly Archives: February 2012

@ccess for all: Update and Oxford meeting

We now have the Twitter tag @ccess! This is fantastic. Thanks to Tyler Neylon for making this happen.

The progress on and is fantastic. On the latter we are getting daily stories from the #scholarlypoor – people who want to read the scholarly literature and cannot. Read them and see how powerful their stories are – people who leave their job feel a great sense of loss and deprivation, and spending thousands of dollars is not an option.

I'm going to Rhodes House Oxford for : Scientific Evolution, Open Science and the future of publishing. This is a great event and I've been asked to ask a question. I've sent this in – not sure whether the panellists have seen it so I shan't put it here but it's about the #scholarlypoor. The tragedy is that the world is deprived of scholarship and we have to put that right. The first step is to recognise it and cement it in our articles of policy – my approach is .

Then we have to work out how to make it happen. This is where the #scholarlypoor have the power. We – I count myself as part of the #scholarlypoor as the publishers have forbidden me to do the research I want – should mobilise and make our voice heard. If the world trembles when 7000 academics (including me) Boycott Elsevier then how much more the power of the world, feeling the deprivation.

And, yes, unlike the woolliness of most academics this is a hardball fight. The #scholarlypoor have no h-indexes to worry about and their demand is simple (I have been on enough demonstrations to know this by heart!):

  • What do we want?
  • Access.
  • When do we want it?
  • Now

If you can remember this simple chant, join us.

What’s the Real Value of a Scholarly Publication? Part I

I've been invited to a very timely meeting in Oxford next week to discuss the future of scholarship: "Open Science and the Future of Publishing". The question I want to ask is (roughly):

"We the public pay 10 billion USD annually in journal subscription fees [*] and 200 billion USD for research; what value do WE get? And what value do WE lose by closed access?"

[*] Throughout this post I use guesstimates which are probably off by half an order of magnitude either way (i.e. a factor of 3). This is partly because much of the information is secret (and some so secret that you will be sued if you divulge it) and partly because academia and we the public don't yet care enough to find out. I am also removing CC-BY publications from the argument to avoid having to say "except for CC-BY" all the time. It's about 5% of the market, if that. So I'd like your help.

I am also working this up for a (unfortunately virtual) presentation I am giving in Poland next month. I am taking my text from Wikipedia: (This is 6 years old and not disputed so I take it as more-or-less correct. If anyone can fault this, we shall all benefit)

Let me tackle COST and PRICE first.

The COST to the public purse of scholarly publishing is of the order of 10 billion USD. There are also contributions from industrial subscriptions, and from student fees, and 1% from pay-per-view, but the bulk is from taxpayers. In return for this the public get virtually no value or rights. If you the public, you the government, you the NHS want to read a paper you either have to pay again or walk to St Pancras and read it in the British Library premises (you cannot get this online because of publisher restrictions – mad and sad but true. The BL even charges me to read my own CC-BY papers if I'm not at St P.).

This is set by the PRICE of electronic journals. This bears no relation to the COST of production. The cost of production can be very low. It's USD 7 for arXiv (not peer-reviewed) and about 100 USD for Acta Cryst E (a very high-quality peer-reviewed data journal). In an efficient organisation it's inconceivable that the COST of production of a journal article is more than 200 USD. Any higher PRICE comes from the following:

  • The ADDED_VALUE that the publishers assert they add
  • Inefficiencies (often gross) in the publishing system. (For example almost all author manuscripts are retyped from scratch).
  • Profits

Publishers like Nature estimate costs-per-paper at 20,000 USD. That is not related to the cost of production but something else. Perhaps the high rejection rate? The basis of these "costs" is kept highly secret.

The PRICE of pay-per-view articles (about 35 USD for one day's rent) is the only part with real elasticity. The only evidence I have is from my FOI requests to Oxford/Cambridge University presses (they are public organizations, parts of the Universities, so have to reply – if you want publishing facts consider University presses).

CUP:

In 2010, 13,646 articles were purchased as PPV. In 2010, the total number of articles for potential purchase via CJO was 680,000.  Revenues from PPV approximated to 1.3% of Journal subscription revenues in 2010. 

OUP:

 In 2010, 37,157 PPV articles were purchased [OUP do not know how many purchasable articles they publish]  PPV represents around 1.5% of total journal subscription income. 

I take heart from the consistency of the figures (TWO coincident points!) and surmise that other publishers get about 1.5% of their income from pay-per-view. It's possible, but unlikely, that the large profits of other publishers come from pay-per-view but you and I will doubt that. It's clear that the price is far too high and it amazes me that publishers still use these levels which were – I assume – set by the cost of paper in interlibrary loans. I'm no economist, but it's actually stupid to run these prices. If they cut their prices to a fifth – 7 USD – and gained 5 times more custom they'd still make the same income, incur no more costs (really!) and gain a great deal of goodwill. And even if they gained no more readers they'd only have lost about 1% of their income. But they probably know something about a small subset of customers who have to use this service and they don't care about everyone else. Which is also inelastic.
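Here is that arithmetic as a small Python sketch. The 35 USD and 7 USD prices and the 1.5% share are the figures quoted above; normalising subscription income to 100 units, and the function and variable names, are my own assumptions for illustration.

```python
# Back-of-envelope check of the pay-per-view (PPV) argument.
# Assumption: subscription income is normalised to 100 units, so PPV income
# at the current price is about 1.5 units (the OUP figure).

SUBSCRIPTION_INCOME = 100.0
PPV_SHARE = 0.015   # PPV income as a fraction of subscription income
OLD_PRICE = 35.0    # USD per article for one day
NEW_PRICE = 7.0     # USD, one fifth of the old price


def ppv_income(price, demand_multiplier):
    """PPV income at `price` when sales change by `demand_multiplier`."""
    base_income = SUBSCRIPTION_INCOME * PPV_SHARE
    return base_income * (price / OLD_PRICE) * demand_multiplier


current = ppv_income(OLD_PRICE, 1)
same_income = ppv_income(NEW_PRICE, demand_multiplier=5)  # 5 times the custom
worst_case = ppv_income(NEW_PRICE, demand_multiplier=1)   # no new readers

# Share of TOTAL income (subscriptions + PPV) lost in the worst case.
loss_fraction = (current - worst_case) / (SUBSCRIPTION_INCOME + current)

print(f"PPV income now: {current:.2f}, after cut with 5x custom: {same_income:.2f}")
print(f"worst-case loss: {loss_fraction:.1%} of total income")
```

On these assumptions the worst case costs a little over 1% of total income, which is the "only 1%" in the paragraph above.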

If any closed access publisher can give figures here we'd be delighted.

It's also a serious condemnation of the effort to promote scholarship. Only 2% of all articles are ever purchased each year. I imagine the 680,000 includes historical articles, and if we take this as 50 years of publishing (about 13,600 articles a year), then each modern article is purchased about once each year. Which shows that its value to the public is almost zero.
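The same sums in Python, for anyone who wants to vary the guesses. The two counts are from the CUP reply; the 50-year backfile at a constant publishing rate, and the idea that all purchases fall on the current year's articles, are my assumptions.

```python
# Figures from the CUP FOI reply quoted above.
ppv_purchases_2010 = 13_646
purchasable_articles = 680_000

# Assumption: the 680,000 articles span about 50 years of publishing
# at a roughly constant rate.
years_of_backfile = 50

purchase_ratio = ppv_purchases_2010 / purchasable_articles
articles_per_year = purchasable_articles / years_of_backfile

# Assumption: every purchase is of an article from the current year.
purchases_per_new_article = ppv_purchases_2010 / articles_per_year

print(f"purchases per purchasable article: {purchase_ratio:.1%}")
print(f"articles published per year: {articles_per_year:.0f}")
print(f"purchases per current-year article: {purchases_per_new_article:.2f}")
```

If purchases are spread over older articles as well, the figure per modern article is lower still.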

We now need to establish the cost of public (including charity) funded research. I have asked many times without finding authoritative results. So here's a beer-mat calculation, and allow ± half an order of magnitude. I approach it from these directions:

  • Wellcome Trust allow about 2% of a grant to cover publishing. So if scholarly publishing is USD 10 billion, then public research is 500 billion USD
  • The income for Cambridge, Stanford, etc is ca 500 million. Assume 1000 research universities in the world (can anyone do better?) and a power law and we get ca USD 200 billion
  • The NIH is funded at USD 35 billion. It's probably the largest, but add in national funders and you are well over USD 100 billion

Let's use a figure of USD 200 billion (though I am sure it's higher).
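A quick sanity check that the three directions agree with the working figure to within my stated tolerance. This is a sketch only: the numbers are the beer-mat values above in billions of USD, "well over 100" is entered as 100 (a lower bound), and the dictionary keys are my own labels.

```python
import math

# The three beer-mat estimates above, in billions of USD per year.
publishing_cost = 10       # guesstimate of public spend on journals
wellcome_fraction = 0.02   # share of a grant allowed for publishing

estimates = {
    "wellcome_ratio": publishing_cost / wellcome_fraction,  # about 500
    "university_income": 200,
    "funders": 100,        # "well over 100", so treated as a lower bound
}

working_figure = 200
half_order = math.sqrt(10)  # half an order of magnitude, roughly a factor of 3

within = {}
for name, value in estimates.items():
    ratio = max(value, working_figure) / min(value, working_figure)
    within[name] = ratio <= half_order
    print(f"{name}: {value:.0f} bn, factor {ratio:.1f} from working figure")
```

All three land within a factor of about 3 of 200 billion, which is as much as a beer mat can promise.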

I'm now using VALUE in the sense (from Wikipedia):

Value in the most basic sense can be referred to as "Real Value" or "Actual Value." This is the measure of worth that is based purely on the utility derived from the consumption of a product or service. Utility derived value allows products or services to be measured on outcome instead of demand or supply theories that have the inherent ability to be manipulated. Illustration: The real value of a book sold to a student who pays $50.00 at the cash register for the text and who earns no additional income from reading the book is essentially zero. However; the real value of the same text purchased in a thrift shop at a price of $0.25 and provides the reader with an insight that allows him or her to earn $100,000.00 in additional income is $100,000.00 or the extended lifetime value earned by the consumer. This is value calculated by actual measurements of ROI instead of production input and or demand vs. supply. No single unit has a fixed value. Value is intrinsically related to the worth derived by the consumer. [Burke(2005)].

And asking "What VALUE do the public get for their 200 billion dollars?"

and

"what extra VALUE would they get if the research was published openly?"

And again, if you have insights let me know.

@ccess: #scholarlypoor: Craig Dylke, teacher and artist

There's an arrogant assumption among many academics that scholarly publishing is produced by academics (maybe 1% of the population) to be read only by other academics (1% of the population) and that no-one else matters. After all, why would anyone other than a dinosaur scholar be competent to read a paper on dinosaurs? And surely dinosaur papers have no financial benefit to the world.

WRONG – on both counts.

Mike Taylor has done an awesome – truly awesome – job in pulling together our ideas and hope for the @ccess movement – the imperative to make scholarship available for the #scholarlypoor. Those are the people who don't have access to a University library. And access doesn't mean driving to a building, filling out forms and getting a paper copy. It means online access. Immediate and expansive. Because that's the only form of access that's now reasonable for scholarly articles [I deliberately omit books].

Mike's been interviewing the scholarly poor. I've done an interview – just because I'm at a rich university doesn't mean I can use the electronic library as I want to. My research is stalled because the publishers forbid it. Everyone is scholarly poor when it comes to text-, data- and image-mining. But you know all that.

What's tremendous is the stories that are emerging. And I get the impression from Mike that he's got a number yet to be published. So here's someone who passionately wants to read the dinosaur literature. You'll need to read it yourself, best beloved, because I can't show his dinosaur pictures. Here he is teaching, and I'll give some excerpts below:


CD: I try to help connect the science of palaeontology to a larger audience. Palaeo-art lets me do this in a way that combines my childhood obsession with palaeontology and my love of digital art. I've become so interested in the philosophy and methodology of palaeo-art that, together with Peter Bond, I co-founded the community blog ART Evolved where we discuss and encourage palaeo-art of all forms.

But why does Craig need the literature?

When you scientifically reconstruct an animal, every detail of its physical appearance is important. For most prehistoric life, the only place to get details about fossilized remains and informed speculation on what that extinct life might have looked like is in the scientific literature. From my perspective as an artist rather than a researcher, the most useful part of papers is the diagrams and photographs of the fossils.

Craig cares about getting it right. As simple and as important as that.

… there are times when I would love to have it to check "facts" in popular children's books. The number of factual mistakes in these books is sometimes quite alarming. Being on top of the most recent publications can also lead to good discussion topics for my students: news outlets only report a fraction of new science discoveries.

And the problems?

The fees for subscriptions, or for single papers are simply outrageous. Many of my digital art software packages cost less!

Limited access to scientific literature has also created an interesting problem in palaeo-art. Without access to source material, many artists resort to referencing other artists. Then you get artistic "memes" in which organisms are consistently shown with characteristics that we have no actual evidence for. (Since the art is the closest thing we have to photographs, they gain an implied credibility when repeated enough times). This runs completely counter to my science education goal.

What changes would you like to see?

Frankly that answer is simple. Either researchers only publish in free access journals or the publishers get with the times and open access to their content.

I'd also like to see more journals offer unlimited illustrations for authors. On any given subject PLoS papers are almost always the superior source material for me as an artist, as the authors tend to fill them liberally with photos and diagrams of their specimens. Too often I've been disappointed to track down a critical paper on topic from a mainstream journal only to find there are no diagrams or photos, leaving me at square one on my restoration.

As I have already noted, even a fraction of the scholarly literature is valuable. We're fighting to get it all, but until that time we are trying to get as much as possible together for Craig.

And there's no money in dinosaurs, is there? Jurassic Park grossed 900M USD. By depriving the creative #scholarlypoor of the literature we are denying them their full potential.

@ccess is launched!

Today we have launched @ccess – a new site, and more importantly a new community – to make scholarly information REALLY LIBRE available. I'll stress to start with that this means all disciplines and all types of information and means of communication. Because I'm a scientist I'm concentrating on STEM but it covers everything. By LIBRE we mean free to use, re-use, and redistribute for any purpose. It's covered by the Open Knowledge Definition and the actual text of the Budapest Declaration on Open Access 10 years ago.

I've blogged about this before. Any information is better visible than not, but simply "being on the web" isn't good enough for many (I'd say most) modern uses. There are 101 reasons why information must be fully LIBRE and why GRATIS is not good enough. There are 10 million paragraphs on chemical reactions I want to read each year and I must use machines to do this. GRATIS does not work for machines. They can't work out rights or protect me from being sued. And that's the reality. If I use a scientific paper beyond what I am allowed to do I'll be sued and the University of Cambridge will be cut off.

The only way to ensure this is to make sure all the information we want is LIBRE. Free to use, re-use, redistribute for any purpose, commercial as well.

Note that the term "Open Access" is operationally meaningless. The term "fully Open Access" is even worse because it is seriously misused. Some publishers offer "fully open access" and give the reader no rights at all.

The problem is that only about 3-5 percent of current scholarly information is LIBRE. It's actually very difficult to get a figure, because information isn't generally labelled with its rights. Print a typical scholarly pub and the print will often tell you very little about the rights. It may not even give the actual copyright owner – so you don't know whether you can copy it and who will sue you. Some "open access" publishers DO label the material – here's BMC:

All articles are immediately and permanently available online. Unrestricted use, distribution and reproduction in any medium is permitted, provided the article is properly cited. See our open access charter.

But almost all hybrid papers – where you pay substantial money (perhaps 2000 USD) to make the paper "Open Access" – are neither labelled nor LIBRE. Ross Mounce has shown that only 5% of publishers offer LIBRE "open access" – the rest still impose restrictions or severe restrictions on use. And in my simple study of avian malaria in Pubmed only about 3 papers out of 70 were LIBRE at first glance.

So let's say 5% of the current published scholarly output can be reused without thinking and without worrying. Because that's the only guide. If you have to think, then it's effectively not re-usable on a large scale. Machines can't understand lawyers. And they can't interpret information that isn't given.

What can you do with 5%?

More than you might think at first glance. Much more.

Academics often have a narrow mindset that the only reason for publishing a paper is so some other academic can read your paper. That if we don't have access to the precise paper we cannot do anything. Sometimes that's true. But sometimes we just need representative material in that area. Let's say I want to know the conditions for making an ester (a type of chemical) and there are 500,000 esterifications published a year. 5% of that is 25,000 different reports. My machines will certainly find all the mainstream types of reaction. If I want to know how to grow a common cell type, or prepare a specimen, or find the methods used for recognising motifs in genes or … I'll certainly find enough examples. If I want to find images of mosquitoes, or a graph of the average rainfall in W Africa, the LIBRE literature is almost certainly good enough. If I want to analyse the type of language and terms used in malaria articles, the LIBRE literature is more than enough. If I want to find which countries the work is done in, the LIBRE literature is all I need.

So we need to label and liberate LIBRE scholarship. And then persuade people to label their articles properly. And hopefully to persuade them of the immense value of LIBRE over GRATIS.

So the recent heroes of our effort have been

  • Tom Olijhoek and Bart Knols. Here's Tom's report in Malaria World. Malaria is a really good place to start as the concept is well contained and we can find everything through UK/PubMedCentral. They have also helped to create the site. That's a really good place to start.
  • Mike Taylor, sauropodologist. Mike has campaigned tirelessly and burnt midnight oil to create the site which runs in parallel with the @ccess site. He's collecting interviews, including one from me, on why we need LIBRE @ccess.
  • Mark MacGillivray who continues to add fantastic design and power to . Mark's Bibserver uses faceted search in an incredibly powerful manner. The technical details are completely hidden from the user. The technology can interact with the Semantic Web / Linked Open data and is a great community builder

Anyone can be a member of this effort – you just need passion and energy and a need to provide LIBRE resources. And if you have a story about how and why you need LIBRE material and can't get it, then highlight it on the mailing list or help populate the questions on the wiki.


Boycott Elsevier: Does your institution invest in them?

I am supporting the Boycott against Elsevier not only for the reasons given there (exorbitant prices, bundles of unwanted journals, support for SOPA/PIPA/RWA) but also because they exercise monopoly control. This monopoly is supported through restrictive contracts and cripples innovation in scholarship such as text-mining and data-mining, re-use of factual scientific information and many other necessary actions.

Elsevier's market is based on reputation and is fragile. The CostOfKnowledge boycott was sufficiently prominent that it caused the share price to dip. Investors are clearly watching the current concern about our protests and we should be able to transmit our concerns to them. In the UK we are able to ask public bodies questions through Freedom Of Information and I am now suggesting we do this on a wide basis.

It would be very disquieting to find that any University or any public body responsible for libraries actually invested in Elsevier. That would imply a conflict between trying to reduce journal prices and benefitting from having higher ones. I am therefore asking my current University to confirm that they do not invest in Elsevier.

This is easy to do. Visit and type a brief letter (such as the one below). The University is required to respond within 20 working days. Anyone can do this (you don't have to be a UK citizen AFAIK and you don't have to have any connection with the institution).

To: University of Cambridge
Subject: Freedom of Information request - Investments in Elsevier

Dear University of Cambridge,

I would like to know if the University or any of its subsidiary companies have any investment in Elsevier (Reed Elsevier PLC/N.V.), and if so how much.

Yours faithfully,

Peter Murray-Rust

I would urge readers of this blog to copy my action and ask other Universities to confirm that they do not invest in this way.

This type of action will also help to keep the momentum of the boycott.

URGENT: US Citizens MUST Sign RWA petition

I'm amazed and saddened that the community has not massively signed the petition against the RWA. (I haven't because I'm not a US citizen). The petition wanted 25,000 signatures and has only got a much smaller number. It gives the impression that academia doesn't care. If the petition isn't signed, then publishers will say "what a wonderful job we're doing. Academia loves us – they approve of the RWA – The NIH is against the vibrant market economy", etc.

It's simple – sign the petition.

Subject: HR3699, Research Works Act

Rep. Carolyn Maloney has not backed off in her attempt to put forward the interests of Elsevier and other academic publishers.

If you oppose this measure, please sign this petition on the official 'we the people' White House web site. It needs 23,000 signatures before February 22nd and only 1100 so far. Please forward far and wide.

Oppose HR3699, the Research Works Act

HR 3699, the Research Works Act will be detrimental to the free flow of scientific information that was created using Federal funds. It is an attempt to put federally funded scientific information behind pay-walls, and confer the ownership of the information to a private entity. This is an affront to open government and open access to information created using public funds.

This link gets you to the petition:!/petition/oppose-hr3699-research-works-act/vKMhCX9k

Raji Edayathumangalam
Instructor in Neurology, Harvard Medical School
Research Associate, Brigham and Women's Hospital
Visiting Research Scholar, Brandeis University

Do NOT assume that RWA will fail. A failure to fill the petition will set us back. A late rally will have huge impact.

PeterMR and PeterMR's avatar oppose RWA (and so do many publishers)

101 reasons we need @ccess to BOAI-compliant material: Translation

We've started the @ccess resource and community to make more and hopefully all scholarly material fully BOAI- and OKD-compliant. Anyone can use it for any legal purpose and do anything with it without permission or fear of being sued by publishers. There are probably 101 reasons why @ccess is valuable – and most of them I haven't even dreamed of. So one thing @ccess will do is collect examples of why @ccess-compliance is essential. (Note that I shall never use the words Open and Free in a meaningful sense because they aren't precise). So here's an example from the list:

On Mon, Feb 13, 2012 at 6:40 PM, Douglas Carnall wrote:

>> Especially for scientists access to complete articles and data
>> is compulsory, but I guess that for "laymen" illustrative pictures and
>> abstracts would be sufficient.

>I always get nervous when I see this sort of scientist/layman
>distinction, and I think we should work to eradicate such a boundary
>as much as possible.  (I was a layman myself until a few years ago,
>and would have hated to be fed a watered-down version of research
>while an elite priesthood of scientists got the Real Stuff.)

I'd like to reinforce this point. As a translator and editor I very
often deal with unfamiliar topics and need to get up to speed quickly
with the language and jargon typical in a field. It is a major
frustration in my work that the most authoritative work is locked up
behind paywalls. Typically I need to briefly access one key term in a
handful of articles to understand how it is used in the field. As the
prevailing rate for technical translation is around $0.12-0.20/word,
accessing 3 or 4 articles at $30 each to check a single term is
completely unfeasible. But that would be the best way to ensure high
quality. I find paywalls vexing precisely because dumbed down
popularizations are useless to me.

PMR: This is a brilliant example of how people don't realise the different uses to which articles can be put. What percentage of a domain do translators need? For example if we got 10% of all papers is that likely to be enough?
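To put Douglas's numbers side by side, here is a rough Python sketch. The word rate, the article price and the "3 or 4 articles" are his figures from the email above; the variable names are mine.

```python
# Figures from the email above; variable names are illustrative.
rate_low, rate_high = 0.12, 0.20     # USD earned per translated word
article_price = 30                   # USD per paywalled article
articles_low, articles_high = 3, 4   # articles consulted to check one term

cost_low = articles_low * article_price    # cheapest case in USD
cost_high = articles_high * article_price  # dearest case in USD

# Words of paid translation needed just to cover the access fees.
fewest_words = cost_low / rate_high
most_words = cost_high / rate_low

print(f"checking one term costs {cost_low}-{cost_high} USD")
print(f"that is the fee for about {fewest_words:.0f}-{most_words:.0f} translated words")
```

So checking a single term eats the fee for somewhere between a few hundred and a thousand translated words, which is why he calls it completely unfeasible.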

Another similar requirement is my own field of computational linguistics. To train machines to interpret text you need a marked up corpus. For that you absolutely have to have BOAI material – reading free through a paywall is useless. It needs to be redistributable.


DC: The point more generally is that neither the author nor the publisher
can possibly conceive of all the potential ways that a scholarly work
might be useful when it is freely available. If the scholarly
literature could be treated as one vast linguistic corpus, I am sure
that interesting developments in scientific communication,
terminology, and translation would follow, for example.



PMR: So let's collect more examples on the list. What have people wanted to do with scholarly publications and not been able to?

Avian Malaria. Can Bibsoup and @ccess help? Do penguins get malaria?

We're taking MALARIA as our lead project in @ccess. If you haven't read about @ccess, read the previous post. Many people are incredibly frustrated by lack of access to the scholarly literature. I call them the "Scholarly Poor". If you want to read the literature it often costs 35 USD per paper. PER PAPER for ONE DAY. If you work in a University you usually get this "for free". Of course it's not free – it comes out of research grants, student fees (yes, student fees go to support the library), government grants (if applicable), charitable donations. It feels free to the researcher but it costs a lot.

And if you're not in a University it's anything but free. So we thought we'd have a look at what you can get. Although this is literally deadly serious, I'm illustrating this with our #animalgarden Bibsoup team. What's Bibsoup? It's an idea that lets ordinary beings manage their bibliography and grow new functionality. So we have built a Bibsoup for MALARIA. Pubmed showed us how to download their bibliography. This bibliography is OKD-Open, regardless of whether the content referenced by it is or is not Open.

Jim Pitman developed Bibserver software and over the last year Mark MacGillivray has developed it into a major resource. Mark's ingested all the records. Tom Olijhoek and Bart Knols of MalariaWorld have given us keywords to search with ("malaria", "plasmodium", etc.) – that gives 73560 records see

The animals are now very worried about malaria – It seems to be common in Owls. Do penguins get it? #animalgarden is going to use BibSoup to explore the literature. They don't have any money so what will they be able to read? They can read the titles, and they can usually read the abstracts.

But abstracts are acknowledged to lack critical information. They don't have things like:

  • Maps
  • Methodology
  • Caveats
  • Tables
  • Pictures of animals and parasites
  • Graphs

For that you have to have the full text. And even if you have the full text you can't reproduce it unless it's OPEN. OKD-OPEN (free to read is not good enough). So how many articles have free full text and how many have Open content?

I sat down to watch the football while Owl and Penguin examined the bibliography. They limited the search to birds by typing "AVIAN". Maybe they have missed a few ("false negatives"), but it won't affect the conclusions. And only one paper was a "false positive" (nothing to do with malaria - feeding garlic oil to starlings) [made owl feel sick just to read it]. They've got 70 papers in the period 2000-2010. Here's their OPENness classification:

  • 2 OKD-OPEN (the BMC papers, free to read and to re-use)
  • 8 "Free to read" gratis (the mechanism for being free is not given)
  • 5 "author manuscripts" gratis (maybe the version that the author submitted and not the final paper)

That's 15 out of 70. Just over 20%. In 2009-2010 only 1/13 papers was readable without paying.
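The tally can be written down as a few lines of Python. The counts are Owl and Penguin's; the 2 BMC papers are the ones Gulliver claims further down in this post, and the dictionary keys are my own labels.

```python
# Counts from the survey of 70 avian-malaria papers (2000-2010).
total_papers = 70
readable = {
    "okd_open_bmc": 2,       # BMC papers, labelled as re-usable
    "free_to_read": 8,       # gratis, mechanism not stated
    "author_manuscript": 5,  # gratis, possibly not the final version
}

readable_total = sum(readable.values())
readable_share = readable_total / total_papers
libre_share = readable["okd_open_bmc"] / total_papers

print(f"readable without paying: {readable_total}/{total_papers} = {readable_share:.0%}")
print(f"OKD-OPEN: {libre_share:.1%}")
```

Readable is one thing; the share that can actually be re-used is under 3% of the sample.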

What did they find? The first thing is that Bibsoup made it incredibly easy to browse this literature (of course Pubmed has provided the base functionality). It's not easy to find whether you can read a paper. Often it says "Full text online" but it means "Full text IF YOU PAY". It usually depends on the journal, and that's where Bibsoup makes the contribution. Bibsoup will allow us in @ccess to identify the publisher and therefore – to a first approximation – whether the paper is OKD-OPEN.
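As a sketch of what "identify the publisher and therefore, to a first approximation, the rights" could look like: the record fields, the publisher table and the function name below are hypothetical illustrations of the idea, not Bibserver's actual schema or API.

```python
# Hypothetical records and field names, for illustration only.
OKD_OPEN_PUBLISHERS = {"BioMed Central"}  # BMC labels its articles as re-usable


def first_guess_label(record):
    """Guess a rights label from the publisher alone (a first approximation)."""
    if record.get("publisher") in OKD_OPEN_PUBLISHERS:
        return "OKD-OPEN"
    if record.get("free_full_text"):
        return "free-to-read"
    return "unknown"


records = [
    {"title": "Paper A", "publisher": "BioMed Central", "free_full_text": True},
    {"title": "Paper B", "publisher": "Some Society Press", "free_full_text": True},
    {"title": "Paper C", "publisher": "Some Society Press", "free_full_text": False},
]

labels = [first_guess_label(r) for r in records]
for record, label in zip(records, labels):
    print(record["title"], "->", label)
```

A guess from the publisher is only a starting point; the licence statement in each article still has to be read by someone once.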

"My journals are BOAI-Open" shouts Gulliver, the Open-Access Turtle (Gulliver is the green one). That makes it easy – we can immediately label all BMC journals as OKD-OPEN. Unfortunately there are only 2 BMC papers in these 70 papers. The other 13 papers are free-to-read. That's a lot better than nothing, but you can't use them in books, lectures, magazines, etc. You can't use them for text-mining. (Actually you can't use anything on Pubmed or UKPMC for text-mining even if it's OKD-OPEN. That's because the closed-access publishers have required Pubmed to forbid it, even though it's Open.) Using Pubmed for anything automated is almost impossible – the publishers have made sure of that:

Restrictions on Systematic Downloading of Articles

Crawlers and other automated processes may NOT be used to systematically retrieve batches of articles from the PMC web site. Bulk downloading of articles from the main PMC web site, in any way, is prohibited because of copyright restrictions.


Articles that are available through the PMC OAI and FTP services are still protected by copyright but are distributed under a Creative Commons or similar license that generally allows more liberal use than a traditional copyrighted work. Please refer to the license statement in each article for specific terms of use. The license terms are not identical for all the articles.

"What does that mean?" said Gulliver. "It means that in the Open Access subset you STILL cannot use automated methods because the licences might forbid you" said Owl.

"But" said Penguin, "that's what @ccess will do. We only have to read each article once, annotate it, and then EVERYONE will know what the licence is. If we each do a bit, then the work becomes easy. We'll see if Mark can create a button we can click on each record."

And I think Mark can :-)

The sad news is that it looks like Penguins get malaria:

H J W Sturrock and D M Tompkins (2007)
Avian malaria (Plasmodium spp) in yellow-eyed penguins: investigating the cause of high seroprevalence but low observed infection.
New Zealand veterinary journal
view at pubmed

And the even sadder news is that Penguin cannot read the article.

But at least we can reproduce a picture from Gulliver's journal:

Figure 2

Mosquito trapping methods used in this study. A – CDC Light trap hung from dead tree in grassland along Nyong River, Ndibi; B – Net trap placed in grassland along Nyong River; C – Collecting mosquitoes resting in grass and on tree branches by sweep net. Mosquitoes were aspirated out from the sweep net and then placed into holding cages for identification and preservation. D – Ehrenberg bird trap hung in branches of dead tree in along edge of Nyong River grassland.


Owl is not sure that SHE would like to be placed in a cage to be bitten by mosquitoes…




What is the use of @ccess? Do owls get malaria? Is Wikipedia believable? Who’s Alice Hibbert-Ware?

Yesterday I blogged about our new project in Opening scholarship: @ccess. Several people retweeted it, and one asked "What's @ccess for?" – a good prompt for some more information. @ccess is to discover OPEN scholarly information, to label it, and promote it. After that we believe that anything is possible. So I'll use an example.

We're lucky to get interesting birds in our garden and I idly wondered whether birds get malaria. They get influenza, of course, and they are a major host and therefore hazard to human health (the human viewpoint). But malaria? Do birds get bitten by mosquitos? I had no idea. So I went to Wikipedia and in 10 seconds discovered . Yes birds get malaria. From a bird point of view it's very serious:

Hawaii has more extinct birds than anywhere else in the world; just since the 1980s, 10 unique birds have disappeared. Virtually every individual of endemic species below 4000 feet in elevation has been eliminated by the disease [malaria].

And I read on:

since 1995, the percent of malaria-infected Great Tits has risen from 3 percent to 15 percent. In 1999, some 4 percent of Blackcaps — a species once unaffected by avian malaria —were infected. For Tawny Owls in the UK, the incidence had risen from two or three percent to 60%.[1]

And I was gobsmacked. Blackcaps used to be summer visitors only – but now they winter in UK (in our garden). And Owls. I have a special relation with owls in Cambridgeshire as my great-aunt, Alice Hibbert-Ware (who lived in Girton – 5 km from Cambridge), was seminal in persuading the country that little owls should be protected. Here's Girton Bird News:

Once introduced, it spread rapidly and as it spread it fell foul of ever greater numbers of gamekeepers. They accused the Little Owl of every crime in their calendar, […] It was against this near hysterical background that Alice Hibbert-Ware, after an extensive publicity campaign in the press and on BBC radio, was appointed in 1935 by the BTO as principal investigator into the Little Owl's diet. Over the next two years, assisted by 75 helpers in 34 counties, she assembled a mass of data, primarily derived from pains-taking dissection of 2460 Little Owl pellets (the indigestible fur and bones 'sicked up' by birds of prey), from just one of which she extracted the remains of 343 earwigs, and from another 2000 crane-fly ('daddy-long-legs') eggs. This forensic detail both demolished the myths of larders and beetle-luring charnel houses, and swept the ground from under the feet of those who stigmatised the Little Owl as a wholesale destroyer of game-bird chicks. Over the years the bird's black reputation has withered away, due in no small measure to the initial efforts of Alice Hibbert-Ware, and it is now a welcome addition to the fauna of these islands. So, remember Alice when next you rest in the shade under 'her' trees!

I remember her through a photograph given to my father by Eric Hosking, the great bird photographer. It's a gorgeous photograph, with the owl at the entrance to the burrow. Here's a detail showing clearly that the owl is eating a cockroach, not a partridge chick.

So maybe Little owls also get malaria? And that's where the problem starts. Wikipedia gives references:

Garamszegi, László Z (2011). "Climate change increases the risk of malaria in birds". Global Change Biology 17 (5): 1751–1759. doi:10.1111/j.1365-2486.2010.02346.x.

It's from Wiley. So I have to pay. I don't know how much but probably 30-40 USD. And I have to read it by midnight because I only have ONE day. So of course I don't read it.

So I don't know that owls get malaria. And I don't know whether it's restricted to Tawny Owls. I imagine not. So the Girton little owls probably have malaria.

And @ccess? When I read Wikipedia I'd like to know whether the references are worth following. It's a waste of my time to click on links behind Wiley's paywall. I have a legitimate need to follow up this information – it's nothing to do with my day-job in the University of Cambridge, it's because I am a concerned member of the human race.

Birdwatchers are part of the scholarly poor. @ccess aims to collect OPEN information in subdomains – doesn't have to be science, but that's my speciality. It has to be OPEN. The info can then be used for anything. Here are some ideas:

  • Collections of images
  • Guides for health workers and patients
  • Mapping information onto Open maps
  • Tutorials

And hundreds more ideas.

Here's a typical example of a paper on avian malaria:

J Struct Biol. 2008 Jun;162(3):460-7. Epub 2008 Mar 21.

The avian malaria parasite Plasmodium gallinaceum causes marked structural changes on the surface of its host erythrocyte.

Nagao E, Arie T, Dorward DW, Fairhurst RM, Dvorak JA.

Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.

It's got some lovely images of how malaria infects cells, using atomic force, scanning and transmission electron microscopy (an area I used to be involved with). I'd like to put them on this blog. But I can't. The paper is published by Elsevier and costs 31 USD to read. If I take images from that paper Elsevier might sue me. (Not fanciful: Wiley threatened a graduate student for daring to put a scientific image on her blog.) So science is impoverished.

But hey! At the bottom of the paper it says:

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

This is almost gobbledegook to normal humans, but for those of us accustomed to doing battle (sorry, but that's how I feel) with publishers I interpret this to mean:

This is what the authors sent to the journal. The copyright in this does NOT belong to the publisher and they have no rights over it. It's technically the author's pre-publication pre-review manuscript. So-called "Green" Open Access (not a self-evident term to non-specialists).

But that means the authors still hold the copyright? And I would have to ask them for permission?

Normally yes. But the authors here are from the US NIH. And works of the US government are in the public domain. So the images are in the public domain! And here they are – how malaria gets into a cell:

Fig. 1

Typical SEM images contrasting the surface topography of noninfected (a) and P. gallinaceum-infected erythrocytes (b, c). Noninfected erythrocytes have a smooth surface. In contrast, the furrow-like surface structures are seen on infected erythrocytes. Bars in (a) and (b) represent 1 μm, in (c) 200 nm.


If I'm wrong my quarrel is with the NIH, not Elsevier. If the NIH have handed the total copyright of these images to Elsevier then I'll scrub this blog post.

If I'm not wrong, then these images can be aggregated into @ccess and made available for anyone who wants them, for example:

  • Writing a lecture
  • Writing a textbook
  • Educating people infected with malaria to show the science going into the problem
  • Re-used as components in artistic works

And so on.

Now it's possible that I have run foul of Pubmed rules. That I can't even re-use public domain works in Pubmed. If so, Pubmed will tell me. And they'll tell me that THEY don't make the rules – the publishers do.

Let's see.

But in any case there is masses of stuff we can all put into @ccess, that will enhance the information available to the human race. And we all want that, don't we?

NOTE: I took the photograph of the photograph of the little owl. I might have broken copyright as Eric Hosking died 20 years ago. But somehow I think he and his heirs will approve of what I have done.

NOTE: I can't reproduce Alice H-W's report on the Little Owl as she died in 1944 and Wiley wants 30-40 USD for me to read it for ONE day. (Except I have it in my bedroom.)






@ccess for everyone. A new initiative in open Scholarship

We have started a really new exciting venture in making scholarship available to everyone. We're starting from scratch. We're still working out details. And "we" means "you".

About 3 weeks ago things came to a head. Many people are frustrated with the lack of real, 21st-century access to scholarship. To the outputs of funded scholarship (somewhere between 300 BILLION USD and 1000 BILLION USD). And to the feeling of exclusion that everyone who isn't a powerful academic feels. The inability to contribute. The feeling that scientific research is a spectator sport for the lucky few who are in rich universities. Some of us swapped emails, initially from a sense of frustration.

But what emerged after about a week was the sense that we had exciting new opportunities to change the world in a bottom-up manner. We are increasingly empowered by the public technology of the web, and we are building on top of it. So we are creating a project, a philosophy, a toolkit, and collections of content. It's happening under the aegis of the Open Knowledge Foundation (OKFN) and its mailing lists, as a new project (open-access) that overlaps with open-bibliography and open-science.

When we use "open" we are committed to the Open Knowledge Definition:

"A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike."

This applies to several aspects of Open Scholarship – Open Access, Open Bibliography (BOAI-compliant), Open Citations and Open Data (OABCD) as a start. If we have OKD-conformant information then, for the first time, we can start to see the power of machines and humans working together. We can use automation without having to seek permission.

We're currently calling it @ccess.

@ccess for all.

Why this strange formation? Because it's simple, memorable, and searchable on the web. It avoids the overloading of "Open". "Open Access" and more generally "Open Foo" is not an effective label for OKD-compliant information. By using a clearly defined string we can label information as OKD-Open, as BOAI-open. (Unfortunately even after 10 years of BOAI there is no simple automatic way of telling that a piece of information is re-usable without asking permission).

So what information can we AUTOMATICALLY label as @ccess – OKD-open? Not very much yet, but we expect that this will grow rapidly. Here's what we can do:

  • Anything specifically labelled with CC-BY, CC0, PDDL licences.
  • Datasets in CKAN labelled as Open
  • Articles from BOAI-compliant publishers (the main ones being BMC, PLoS, EGU, and a few others)
  • Data from bioscience databases (e.g. genomes, protein structures) (Bioscientists don't normally use licences but adhere to the Bermuda principles)

Here are some things that, by default, are not AUTOMATICALLY recognisable as OKD-compliant

  • Depositions in Institutional repositories. Almost no content is labelled usefully for machines
  • Self-archived manuscripts including arXiv
  • Bibliographic collections
  • Contents of Pubmed (except as above)
  • Hybrid publications (95% is NOT OKD-compliant)

So a major problem is that we don't know what is actually OKD-Open and what we can use for modern automated scholarship. @ccess aims to change that.
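The two lists above amount to a simple rule, which can be sketched in a few lines of Python. This is only an illustration of the idea, not how Bibserver or Bibsoup actually work: the record fields (`licence`, `publisher`, `source`, `is_open`) and the function names are hypothetical, and the bioscience databases are left out because they carry no licence string to test.

```python
# Hypothetical sketch: field names and helpers are assumptions for illustration,
# not part of Bibserver, Bibsoup or CKAN.

OPEN_LICENCES = {"cc-by", "cc0", "pddl"}    # licences named in the list above
BOAI_PUBLISHERS = {"bmc", "plos", "egu"}    # the main BOAI-compliant publishers named above


def is_automatically_open(record):
    """Return True only if the record carries a machine-readable sign of OKD-openness."""
    licence = (record.get("licence") or "").strip().lower()
    publisher = (record.get("publisher") or "").strip().lower()
    if licence in OPEN_LICENCES:
        return True
    if publisher in BOAI_PUBLISHERS:
        return True
    # A dataset that its (assumed) CKAN entry explicitly flags as open
    if record.get("source") == "ckan" and record.get("is_open") is True:
        return True
    return False


def label(record):
    """Return a copy of the record with an access label attached."""
    labelled = dict(record)
    if is_automatically_open(record):
        labelled["access_label"] = "@ccess"
    else:
        # Not provably open by machine: a human has to read and annotate it
        labelled["access_label"] = "needs human check"
    return labelled


def open_share(records):
    """Fraction of records that can be labelled automatically."""
    if not records:
        return 0.0
    return sum(is_automatically_open(r) for r in records) / len(records)


if __name__ == "__main__":
    sample = [
        {"title": "A", "licence": "CC-BY"},
        {"title": "B", "publisher": "Wiley"},
        {"title": "C", "source": "arxiv"},
        {"title": "D", "publisher": "PLoS"},
    ]
    for rec in sample:
        print(rec["title"], label(rec)["access_label"])
    print(open_share(sample))
```

Note that anything the rule cannot recognise is labelled "needs human check" rather than "closed": a repository deposit or arXiv manuscript may well be open, but nothing in the record tells a machine so.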

We're going to build collections of OKD-Open material and label it as such. To show that it's useful and a new approach to scholarship. Open to everyone, not just academics. Because @ccess is bidirectional – it's about building our principles and community so that we have a say in modern scholarship.

We're using our new bibliographic tools – Bibserver and Bibsoup – as an efficient means of collecting the information and labelling it. We're starting with disease as there are already active communities who want to start using the tools. Our first project is based on MALARIA. This idea has been brewing for a year or more – Open Research Reports – but we've had to wait while we developed the technology. We've now got this, and we can collect and, very soon, label and annotate the information.

We are very grateful to Tom Olijhoek, Bart Knols and MalariaWorld for acting as the centre of this. Our first task is to find what Open information there is. This will require considerable human effort. Even Pubmed isn't able to label the documents which are OKD-Open.

We'll be posting more about this – so far we have 70,000 references for Malaria keywords ("Malaria", "Plasmodium", etc.). Mark MacGillivray has ingested them into a Bibsoup and we'll be working on them. The first activity will be to find out how many are OKD-Open. I'm guessing about 3%, or roughly 2,100 references. That's the sad face of access to scholarly information. That's the amount that we can legitimately text-mine, re-index, use graphics from, etc.

But as people see the value of this they'll want more. And that's an important driver for making more information OKD-Open and labelling it.