Why Open Access metrics are necessary

My recent activity on Open Access practice is motivated by making sure that things are done properly rather than a desire to bash publishers or campaign for lower prices. I’m not an expert here – I don’t know what the cost of OA is and I don’t know what a reasonable price is – but I am fairly sure that most funders are not getting what they think they are in OA.
I was contacted yesterday by someone who runs a voluntary disease group. (They may post in a few days – until then I shan’t say who.) They wrote “So, your blog entry was extremely useful here and I wanted to thank you.” Well, thanks in return. And I was pointed to the following blog and comment, which show that people need this information:

Plog: Paul Wicks blog

Paul Wicks blog = plog. I’m a postdoc at King’s College London, a research psychologist by training. I’m also involved in the National Research Staff Association, run a magazine called GRAD Britain for PhD students, and work for a “Web 2.0” company.

  • Is Publisher-lead “open access” a swindle?

    Date:
    Saturday, 14 Jul 2007 – 13:53 GMT
    I was alerted to this post by a patient advocate I know via PatientsLikeMe. The short version is that some of the articles have clearly been paid for to be open-access ($900 USD), but when you look at a recent copy of the journal you are prompted to enter a user-name and password as if you were paying for the article like normal, and in fact the buttons to pay for it remain there.
    It’s a similar case for a journal run by the American Chemical Society.
    Is this part of a wider problem? Petermr’s blog would certainly suggest so. He’s very involved in the open-access movement and is even starting to grade the different publishers on their clarity and accessibility.
    This is particularly relevant for me at the moment as I just got something published in a Blackwell journal and was considering paying a £1,300 fee to make it open access. In actual fact I think I can archive the pre-review version for free with my institutional archive, which is handy as I don’t exactly have £1,300 stuffed into my desk drawers in £20 bills…
    Anyway. The current state of affairs seems to be this: publishers are worried about OA and have cobbled together business models that support generating revenue in other ways than the typical subscriber model. However, they don’t appear to have put much thought into the publishing model.

PMR: and 1 comment:

Jennifer Rohn said:
The two dedicated open-access publishers (BioMed Central and Public Library of Science) don’t have these problems. People who want to ensure their articles are truly going to be open access, published by companies who have put real thought into the publishing as well as business model, might want to look there.

PMR: Thanks, Jennifer:

I am a post-doctoral cell biologist at University College London, having just returned to science after a four-year sabbatical as a journal editor. In my spare time, I am also a freelance science writer, editor and journalist; novelist; biotechnology consultant and the founder and editor of LabLit.com magazine. You can find me at most London sci/art/culture/lit events!

PMR: I hope that as we in the Blue Obelisk do our studies it will become objectively clear which author-pays publishers are truly open access and which are not really trying. We’ll let the facts speak.

Posted in open issues, Uncategorized | 2 Comments

Has anything changed in three years?

Last week I had a long conversation with a representative of UK local government with responsibility for Brussels. They were interested in Open Access and I spent about an hour explaining it from the bottom up. It can take that long to get across the “why”, rather than just the “what”. And we discussed the “how” as well – what should be done. I won’t pre-empt any public info on that. As part of the background I pointed them at the UK Select Committee’s report into Open Access – which I was surprised to see now has its third anniversary. You might think that it’s boring – it’s not. The conversation is recorded verbatim and the chair – Ian Gibson, who was an academic before becoming a politician – is not mealy-mouthed (and often entertaining) when talking to commercial publishers. Yet it’s fair [PMR’s emphasis]:

Here is the press release on the UK inquiry into scientific publication. It represents a significant endorsement of open access.
The UK’s Select Committee on Science and Technology final report on “Scientific Publications: Free for all?” is now available.
Links to the summary (http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39903.htm)
and the whole report
(http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39902.htm).
SCIENCE AND TECHNOLOGY COMMITTEE HOUSE OF COMMONS
PRESS NOTICE
PUBLICATION OF REPORT SCIENTIFIC PUBLICATIONS: FREE FOR ALL?
The Science and Technology Committee today publishes its Tenth Report of Session 2003-04, Scientific Publications: Free for all? (HC 399-I).
The Committee concludes that the current model for scientific publishing is unsatisfactory. An increase in the volume of research output, rising prices and static library budgets mean that libraries are struggling to purchase subscriptions to all the scientific journals needed by their users.
The Report recommends that all UK higher education institutions establish institutional repositories on which their published output can be stored and from which it can be read, free of charge, online. It also recommends that Research Councils and other Government funders mandate their funded researchers to deposit a copy of all of their articles in this way.
The Committee concludes that the creation of institutional repositories is an important first step towards a more radical change in the way that scientific papers are published. Early indications suggest that the author-pays publishing model could be viable and the Committee remains unconvinced by many of the arguments mounted against it.
Nonetheless, this Report concludes that further experimentation is necessary, particularly to establish the impact that a change of publishing models would have on learned societies and in respect of the “free rider” problem. In order to encourage such experimentation the Report recommends that the Research Councils each establish a fund to which their funded researchers can apply should they wish to pay to publish.
The Report criticises the UK Government for failing to respond to issues surrounding scientific publications in a coherent manner. The Committee is not convinced that it would be ready to deal with any changes to the publishing model and calls for the formulation of a strategy as a matter of urgency.
The preservation of digital material is an expensive process that poses a significant technical challenge. The Report recommends that the British Library receives sufficient funding to enable it to carry out this work. Government needs to start work on new regulations for the legal deposit of non-print publications immediately.
The market for scientific publications is international. The UK cannot act alone. For this reason the Committee recommends that the UK Government act as a proponent for change on the international stage and lead by example. This will ultimately benefit researchers across the globe.
Chairman of the Committee, Dr Ian Gibson, said “Publishers are feathering their nests with big profits whilst scientific journals are becoming less and less affordable. Government has its head in the sand: it’s about time that it landed in the in-tray of the Ministers in question. Instead of bashing all the alternatives, commercial publishers should be asked to justify the current publishing process they use. The Open Access movement needs to iron out the teething problems with the author-pays model. It’s public money that oils the cogs of the publishing machine and we want to make sure that it’s well spent.”
3. The Committee took evidence from Blackwell Publishing, John Wiley & Sons, Nature Publishing Group and Reed Elsevier on 1 March 2004; Oxford University Press, the Institute of Physics Publishing, the Association of Learned and Professional Society Publishers, BioMed Central, Public Library of Science and Axiope on 8 March 2004; the British Library, the Joint Information Systems Committee, Cambridge University Library, the University of Hertfordshire and a panel of academics on 21 April 2004; and the Department of Trade and Industry/the Office of Science and Technology, the Higher Education Funding Council for England and Research Councils UK on 5 May 2004.

PMR: The follow-up – which is disappointing – is chronicled in Peter Suber’s blog, Stevan Harnad’s list and elsewhere. There is a strong publisher lobby (they hire PR firms to discredit OA) and the UK government (in the form of Lord Sainsbury, Minister for Science) was not convinced and suggested a “level playing field”. This charming term, redolent of summer cricket pitches, is a typical British approach to doing nothing and muddling through.
I gather that the UK’s policy for Open Access in Europe is still to have a “level playing field”. I suggest we move the goalposts.

Posted in open issues

"Open Access" at libertas academica

As I have mentioned, a group of Blue Obelisk volunteers are surveying the practice of “open access” in chemistry. We’ve created a wiki and will be exposing the work as we do it – URL follows when we have tidied it. We are partitioning the work into chunks for each author and I volunteered to be the first – I got Analytical Chemistry Insights, published by libertas academica.
It is clear that determining what “OA compliance” means is more difficult than we originally thought. In many cases it is determined by the publisher, sometimes by the journal and sometimes by the individual article. libertas academica (la) describes itself as “A leading publisher of Open Access journals”.
[Note: I have been disappointed with the support for “open access” in closed-access publishers and been critical of their presentation, language, logic, consistency and much more. I shall try to apply equal rigour to “open access” publishers – i.e. not pull any punches.]
I have never encountered la and have no preconceptions as to whether they espouse OA as fully as I would like. So here I take you through a (typical?) journey through their pages.
The terminology may or may not be important. They do not describe themselves as an Open Access publisher, but as a publisher of Open Access journals. The difference matters – a publisher may publish both Open and Closed Access journals. And a Closed Access journal may yet contain Open Access articles (cf. Springer Open Choice, see recent posts). The reader may find it difficult to work out what is going on. Sometimes they have to refer back from an article to the issue masthead, sometimes to the journal masthead and sometimes to the publisher.
So this is my analysis. The journal home page looks like:
==============================================================

Search

Latest articles

Polymeric Nanoparticles, Nanospheres and Nanocapsules, for Cutaneous Application

============
So I go to About us and find:
==================================================================
Who we are
Libertas Academica is a family-run business based in the city of Auckland, New Zealand. It was established in late 2004. The name of the company, roughly translated, means “freedom to scholars”.
What we do
We are primarily publishers of open access journals in the scientific, technical, and medical areas. Further information on what we do is available here.
Copyright © 2006 Libertas Academica Ltd. All rights reserved
==================================================================
Under Services we navigate to a page on open access and you need to read this carefully (with my comments):
==================================================================

Open Access Journals

Open access journals and us

Libertas Academica is primarily a publisher of open access (“OA”) journals. OA journals are freely available to readers through the world-wide web without copyright or licensing restrictions or fees.

PMR: Note the term “OA journal”. This term is widely used but is not required by the BBB declarations. In general use it means a journal in which all the articles are necessarily Open Access (e.g. DOAJ uses the following definition):

LA: Open Access Journal:
We define open access journals as journals that use a funding model that does not charge readers or their institutions for access. From the BOAI definition [1] of “open access” we take the right of users to “read, download, copy, distribute, print, search, or link to the full texts of these articles” as mandatory for a journal to be included in the directory.
[1] http://www.earlham.edu/~peters/fos/boaifaq.htm#openaccess

LA: What is OA?

OA removes the price and permission barriers from free access to scientific research:

  • No subscriptions, licencing or pay-per-view fees;
  • No copyright restrictions.

Other OA publishers may apply slightly different terms. The Budapest Open Access Initiative explains this:

There are many degrees and kinds of wider and easier access to this literature. By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

The Bethesda and Berlin statements also comment on this point. For a work to be OA the copyright holder must consent to let readers:

copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship…

Collectively these three constitute the core definition of OA. However, OA journals are also required to provide immediate full-text access to published work rather than just abstracts or article metadata.

PMR: So far, so good.

[…]

The primary difference between OA publishing and non-OA publishing is that the costs associated with publishing the journal are paid by the authors rather than the readers and hence do not act as barriers to access.

PMR: I disagree. There are several hybrid “OA” offerings where costs are paid by the authors but which fail to remove permission barriers. This will be a concern later on.

LA: OA publishing and copyright

There are two key aspects to how copyright relates to OA publishing.

The copyright holders (the authors) consent to unrestricted reading, downloading, copyrighting, sharing, storing, printing, searching and linking to the full text of the work. LA’s licence prevents misattribution and selective reuse to prevent plagiarism, misrepresentation and questionable scholarship.

Where an author has re-used content not in the public domain in an OA work, the consent of the copyright holder must be given.

Therefore, we can say that OA publishing is not comparable to peer-to-peer file sharing for science, and OA publishing is always voluntary.

In its conventional form OA publishing is also royalty-free. Authors effectively give their work to the world without expectation of payment. We believe that in the future it may be possible to publish text books in a manner resembling OA but with royalty payments made to the authors.

[…]
Beneficiaries of OA publishing
[…]

Readers gain barrier-free access to research material without library-imposed restrictions. OA published material is accessible from anywhere that a connection to the Internet is available via high-availability research tools such as Google and Entrez Pubmed, and this material is also freely applied to current and future indexing, mining, summarising, translating, querying, linking, recommending, alerting and aggregation tools, and other forms of data processing and analysis.

Teachers and their students gain free and equal access to content and negates the need for permissions to reproduce. Sharing material is simplified and free.

[…]

All citizens gain by having access to peer-reviewed research which is generally not offered by public libraries. Access to such material provides citizens with a counter-point to questionable statements made in less credible sources. It also gives them access to research funded by their taxes. Under conventional publishing, publishers are effectively given exclusive rights to profit from research paid for by tax payers.

 

The only restriction placed on the use of our articles is that they may not be exploited for commercial purposes. Please see the copyright page for further information on this.

PMR: The “copyright page” has no link. In other cases in the la pages where there is a link, it gives 404 Not Found. But I managed to find:

Information for Authors

Statement of copyright
This applies to all articles published by Libertas Academica unless expressly stated otherwise.
Anyone is free:

  • To copy, distribute, and display the work;
  • To make derivative works;
  • To make non-commercial use of the work;

Under the following conditions:

  • The original author must be given credit;
  • For any reuse or distribution, it must be made clear to others what the license terms of this work are;
  • Any of these conditions can be waived if the authors give permission.

Statutory fair use and other rights are in no way affected by the above.
This is revision 1.0 of the statement of copyright, published on 22 December 2004.
Statement of rights of the publisher
Without prejudice to the rights set out in the statement of copyright, the publisher reserves the right to:

  • Produce printed versions of journals for subscribers without restricting open access to those journals.
  • Generally make commercial use of journals without restricting open access to those journals.

This is revision 1.0 of the statement of rights of the publisher, published on 14 December 2004.

PMR: This appears to be a statement of a licence. “Licence” is used elsewhere without explicit dereferencing. Note that the publisher reserves the exclusive right to use the work for commercial purposes. The “non-commercial” restriction is common in “open access” licences. Commercial use is not explicitly (or IMO implicitly) restricted by the BBB declarations. Therefore I hold this licence to be incompatible with BBB.
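
The comparison above can be written down as a small check. This is only a sketch under stated assumptions: the permission labels (`copy`, `distribute`, `display`, `derivatives`, `commercial_use`) are invented for illustration, the "required" set encodes the reading of BBB argued here (commercial use is not restricted), and none of it is a legal analysis.

```python
from dataclasses import dataclass

# Hypothetical labels for the reuse rights discussed in the text.
# The required set reflects the reading of BBB argued above, not a legal ruling.
BBB_REQUIRED = frozenset(
    {"copy", "distribute", "display", "derivatives", "commercial_use"}
)


@dataclass(frozen=True)
class Licence:
    name: str
    permitted: frozenset  # rights granted to any reader without asking


def missing_for_bbb(licence: Licence) -> list:
    """Return the rights this reading of BBB expects but the licence withholds."""
    return sorted(BBB_REQUIRED - licence.permitted)


# Libertas Academica statement of copyright, revision 1.0, as quoted above:
# copy, distribute, display, derivative works, non-commercial use only.
LA_STATEMENT = Licence(
    "Libertas Academica statement of copyright 1.0",
    frozenset({"copy", "distribute", "display", "derivatives"}),
)

# CC-BY permits all of these uses, subject to attribution.
CC_BY = Licence(
    "CC-BY",
    frozenset({"copy", "distribute", "display", "derivatives", "commercial_use"}),
)

if __name__ == "__main__":
    for licence in (LA_STATEMENT, CC_BY):
        gaps = missing_for_bbb(licence)
        verdict = "compatible" if not gaps else "missing: " + ", ".join(gaps)
        print(f"{licence.name}: {verdict}")
```

A survey of many publishers could record one such entry per publisher, journal or article, which is the level at which the terms are actually stated.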

 

Now let’s look at the practice: an article in Analytical Chemistry Insights.

  • The TOC says nothing about Open Access (although the masthead has “a leading publisher of OA journals”). It carries the rubric: Copyright © 2006 Libertas Academica Ltd. All rights reserved
  • The abstract (of the first article) says nothing about Open Access (although it carries the LA description). It carries the footer: Copyright © 2006 Libertas Academica Ltd. All rights reserved
  • The article says nothing about Open Access and does not have a masthead. It carries the rubric: Correspondence: Cheng Bai, Ph. D., Tel: (478) 329-0770; Fax: (478) 956-2929; Email: cbai2001@yahoo.com
    Please note that this article may not be used for commercial purposes. For further information please refer to the copyright statement at http://www.la-press.com/copyright.htm [PMR: this link does not resolve but I take it to be the Information for Authors above].

So the publisher holds the copyright to the abstract and there is no explicit copyright on the paper. If the link resolved I doubt it would clarify the position for an average reader.
In summary, this is confused. The publisher does not regard the removal of permission barriers as an essential part of OA, although they don’t copyright the article (only the abstract). They clearly understand the BBB declarations but choose to interpret them differently from me (and I suspect most OA experts).
Recommendations to publisher

  • Make sure that copyright holders are clearly identified
  • Do not assert copyright on the abstract
  • Create a clear licence for use and re-use. This licence should indicate that the document and meta-documents are Open Access.
  • Attach the licence or its address to every document (TOC, abstract, article, supplemental data)
  • Choose a CC license unless there are clear reasons not to.
  • Choose CC-BY. Be brave.
Posted in chemistry, open issues | 3 Comments

No more hamburgers!?

From HUBLOG (who I think is Alf Eaton from Nature) we have:

EPUB and Adobe Digital Editions
Bill McCoy from Adobe came in to give us a talk yesterday, the main part of which was a demo of Adobe Digital Editions and an overview of the document packaging standard EPUB (aka IDPF OPS/OCF).
Digital Editions is alpha-quality software that I really wish was written in XUL and used the Gecko rendering engine and was extensible using XPI plugins, but isn’t (as far as I know). It’s going to be cross-platform, with a Linux version planned for later this year, which is good. It’s also full of quite cryptic error messages and likes to hang, particularly when XML documents aren’t completely valid. Its main use, though, is as a demonstration of EPUB, which is wonderful.
An EPUB file is something I’ve been looking for for ages: a zip file containing…

  • A mimetype file that describes what kind of package this is (so it’s not dependent on the file extension), and a META-INF/container.xml file, that describes what kind of file is used to describe the main content, and where to find that root description file.
  • An OEBPS folder that contains the main content, which is comprised of
    • A .opf file, which provides metadata for the document, a manifest of all the associated files, and an ordered list of the documents to display.
    • A .ncx file, which describes the table of contents.
    • .html and .css files, which are XHTML and CSS files the same as you’d use for the web and comprise the actual documents and associated stylesheets for rendering.

As is Adobe’s way with these things, there appears at first glance to be slightly too many, possibly redundant metadata files. If that’s necessary to support a wide variety of packaging and document formats though, then that’s fine – I’m glad that they’ve chosen to use open standards so that anyone can easily create and read these packages.
What Digital Editions does very nicely with these files, which web browsers don’t – but I suspect could, with a bit of Javascript – is to reflow content into multiple, paged columns depending on the screen width and font size. This makes it much easier to read documents on the screen as there’s less scrolling, and unlike PDFs you don’t have to scroll up and down to read each page. (Also unlike PDFs, it’s easy to extract the content from these XHTML files, so no more hamburgers. [PMR: my re-use of the metaphor that PDF->XML is O(hamburger->cow)] Adobe will support DRM in Digital Editions where necessary, so I imagine there may be some way to encrypt the content of documents, but Bill at least seems to have a good stance on not pushing DRM where it’s inappropriate).
However Digital Editions ends up, I really hope it lets you apply user stylesheets and user scripts to documents, otherwise we’re still better off viewing them in a web browser (which may well get native, or at least plugin support for EPUB before too long).
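
As a rough illustration of the package layout HUBLOG describes, the sketch below builds a toy archive in memory and then reads it back the way a reader application might: check the mimetype file, then ask META-INF/container.xml where the .opf package file lives. The file names under OEBPS and the placeholder contents are assumptions for the example; the mimetype string and the container namespace are the ones defined by the EPUB container format.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = f"""<?xml version="1.0"?>
<container version="1.0" xmlns="{CONTAINER_NS}">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_demo_epub() -> bytes:
    """Build a toy EPUB-shaped zip in memory (placeholder content only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Hypothetical file names for the package, TOC and one document.
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/toc.ncx", "<ncx/>")
        zf.writestr("OEBPS/chapter1.html", "<html/>")
    return buf.getvalue()


def describe_epub(data: bytes) -> dict:
    """Report the declared mimetype and the package file named by container.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
        return {
            "mimetype": mimetype,
            "package_file": rootfile.get("full-path"),
            "members": zf.namelist(),
        }


if __name__ == "__main__":
    info = describe_epub(build_demo_epub())
    print(info["mimetype"])
    print(info["package_file"])
    print(info["members"])
```

Because the content is ordinary XHTML inside a zip, extracting it needs nothing beyond a standard library, which is the point of the "no more hamburgers" remark.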

PMR: The Nature folks are among the most advanced techie people in science publishing so their views are worth listening to. And if HUBLOG isn’t them, then I apologize. But I’m not excited by this news.
I see this from a perspective of 15 years of the broken browser, a history of bloated systems, manufacturer non-compliance, abandoned systems, and the failure of the W3C to carry any influence. Examples:
SVG. An open standard for graphics. We’ve spent a lot of time developing SVG for CML. We used to be able to do some really fun things that advanced chemistry. Adobe produced a lovely plugin (admittedly only for IE, but given the browser midden what else could be expected). OK – I had to put on my pages “only works with IE”. That’s one reason why I use Windows on my laptop – it’s the easiest way of getting SVG going. So where are Adobe now in SVG? They’ve abandoned the plugin.
Compound documents. The W3C has had an activity for years on how we package HTML files over the web. Where has it got to? So here we have a manufacturer offering a proprietary solution to compound documents.
So my problem is that Web technology is now dominated by large organisations who think in terms of large teams of programmers – at XTech I was interested in XForms in the browser – and got a reply that it was now quite practicable: “we cut our development team down from 10 programmers over 5 years to 5 programmers over 2 years”. I may have got the numbers wrong but the scale is roughly right. And for developing a client-side molecular browser…?
“is to reflow content into multiple, paged columns depending on the screen width and font size”. Is this what a molecular biologist wants when reading a sequence? I don’t know of one scientist who asked a publisher for multiple column PDF files. Yet this is what is forced on us. And here we have a product with built-in DRM and encryption. Is the scientific community crying out for DRM (== digital rights management, technology to stop you reading things)?
So, if anyone out there is listening, I want a browser that supports SVG out of the box. I want open standards, not proprietary tools which embrace, extend and drive-me-up-the-wall-when-they-disappear. I want an OPEN compound document format. Like SWORD. Like ORE.
Oh – and have I mentioned that it would be nice to send chemistry over the web in XML? Not gifs, not PDFs, not whatever-the-next-glitzy-proprietary is. Just CML.

Posted in Uncategorized | 4 Comments

Open Data genetics and Open Data astronomy

The world is exploding with Open Data posts…
From the Data Strategy blog:

I started writing this post on open data astronomy some time ago, and damn… I got scooped by Read/WriteWeb today with their article on Galaxy Zoo and other “distributed brain” projects. Galaxy Zoo, like Stardust@Home and Clickworkers, asks volunteers over the Web to label astronomy features (galaxies, moon craters, etc.) on images. These data help astronomers do better analysis. I actually had referenced Clickworkers for my PhD thesis under the Open Mind Initiative.
Also see my previous post on open data genetics.

PMR: which reads:

Open data genetics

I had just read about the Personal Genome Project (PGP) a couple days ago, and it’s a really interesting open data project. According to its Wikipedia entry:

The project will publish the genotype (the full DNA sequence of all 46 chromosomes) of the volunteers, along with extensive information about their phenotype: medical records, various measurements, MRI images, etc. All data will be freely available over the Internet, so that researchers can test various hypotheses about the relationship between genotype and phenotype.

The published data will include identifiable information such as the volunteers’ name. The reason for doing so is that they can’t guarantee anonymity anyways when one’s genotype and phenotype are already open. In an interview in Technology Review, the project’s founder, Harvard University’s George Church, said:

We and others have raised concerns about the difficulty of maintaining anonymity [in medical records]. You promise subjects you will make the information anonymous, but it’s becoming increasingly easy to re-identify an individual. This project will hopefully raise consciousness on what we need to do to encourage insurance companies and government and employers to make this safer. This has already been done in some countries, so it’s just a matter of policy.

The first volunteers will be tenured human geneticists, who best understand the risk and benefits of this project. Harvard Medical School’s Institutional Review Board had given the project permission to start, and it sounds like they will review its progress before the project will recruit a broader set of volunteers.

So we have seen “Open Data Foo” as a useful and accurate term to define data-driven science in a discipline Foo. Obviously the detail will depend on the discipline – astronomy and genetics are very different – but the key features are:

  • The data must be absolutely Open (see Open Data on Wikipedia) – no licence restrictions, no need to ask permissions, no restrictions on what can be done with the data, no “non-commercial” restriction.
  • The science is data-driven (i.e. no foreseeable requirement to collect more experimental data). Obviously Science is not predictable and it may turn out that to answer a question a real-world experiment is needed. But that’s Science.
  • The experiment and its interpretation take place in full public view. Ideally anyone can take part, though too many people can be difficult to control and management may be important. Genetics may require more organization because people are involved.

I’ve got very excited by this idea. It’s a wonderful way of communicating science. Volunteers can come from anywhere (and we’ve found this in the Blue Obelisk – not everyone is a chemical hacker). They’ll find out that science is hard, unforgiving, often gives no “results”.
So why don’t we offer our own CrystalEye (http://wwmm.ch.cam.ac.uk/crystaleye) as an Open Data Crystallography project? There’s a lot that anyone can do – you don’t have to be a crystallographer – hackers, visualizers, statisticians, RDFers, etc. – all could make an impact. I’ll blog about this in a day or so.

Posted in data, open issues

More "potential reproducibility"

I’ve just blogged about a group (Open Data is critical for Reproducible Research) which wants to publish reproducible science. Quite by chance I then read Bill Hooker reporting another effort to create reproducibility:
BillH: Openness is spreading, one researcher at a time: Jeremiah Faith, a Boston U graduate student in bioinformatics, has put his lab notes online:

Open Notebook Science […] is a term coined by Jean-Claude Bradley. The idea is simply that the heart of every person’s research – their lab notebook – should be open to the world. Since most of our scientific work is funded by tax payers who expect their money to be well-spent, it’s interesting that openness isn’t required. Science typically builds on the body of available knowledge – the more knowledge available the faster science goes. It’s striking when you visit other labs in person; you see all of their unpublished work, and you know that most of their results and data won’t be available to the bulk of the scientific community until a year after each particular scientific project is finished. By the time papers are in print, it’s old news to the insiders. More striking is when you visit labs whose work you’ve thought about replicating and expanding on. It’s not too uncommon to find that only one person in the entire lab is able to get the technique to work, and even for him the technique only works on Wednesdays. This type of information would be useful to know before you embark on a useless three months trying to adapt their method. But scientific publications are covered in a thick coat of high-gloss finish, making these unacknowledged difficulties hard to detect.
Lab notebooks on the other hand are flat black. As long as people keep them regularly updated, they contain the good, the bad, and the completely nonsensical results.
Today I test the waters of Open Notebook Science.
The latest version of my lab notebook is now automatically posted on J’s Lab Notebook Page each night. I’ve been using an electronic lab notebook for two years now, so there’s quite a bit of data in there – good and bad (300+ pages).

BillH: This is simply fantastic. One of the things that Open Science advocates most sorely lack is concrete examples. Doing research in public, instead of in secret, is a new and somewhat unnerving idea for most scientists; early adopters like Jeremiah are essential to take the edge off that unfamiliarity.
(It’s also, to be honest, just plain fun to snoop around in someone else’s lab notes! I was amused to note that Jeremiah talks to and about himself in his notebook, the same way I do – “if I weren’t so stupid I’d…”, “next time load the control first, doofus”, etc. I wonder if everyone does that?)

PMR: This is simply fantastic. An interesting thing is that this is zero-cost innovation. J-C coined the term “Open Notebook Science” after he had posted to the Open Data entry on Wikipedia and he and I had discussed possible terms. So after J-C blogged about it and set up his own site, Jeremiah used the idea to help think about, and promote, his own desire to communicate.
This is the power of the blogosphere – a good, well-fashioned idea can spread. Young scientists care about reproducibility. They are the ones that suffer from made-up, fuzzy, under-reported material. Do the lab heads and professors encourage reproducibility? I expect that most do, but the pressure to publish in high-impact journals is too great to be ignored.
Jeremiah and J-C are gambling their scientific future on publishing their work openly. Some “high-impact” publishers would have a knee-jerk reaction and refuse to consider manuscripts because the work had already been published. This would be rubbish. Any responsible science publisher should take very seriously the requirement to publish good science. Science that is created in an Open environment adds an extra dimension to “good”. It isn’t necessarily better, but it has less chance of being worse.
So, publishers, give us something to suggest that you are excited by these new developments. That you want to change science practice for the better. I’m guessing that Open Science (sensu J-C, not RSC) will help to promote the work better. And isn’t that what science publishing is about? Communication, not hiding.
And J-C and JF: I can’t promise anything, but I would bet that by gambling on Open Science you will be doing yourselves a lot of good. The world is waiting for this and will welcome you.

Posted in Uncategorized | 2 Comments

Open Data is critical for Reproducible Research

I have been contacted by a group in Lausanne working in the audiovisual area who want to publish their work so it can be reproduced. Here’s their outline and links to protocols:

Reproducible Research

In our lab, we try to make our research reproducible. This means that all the results from a paper can be reproduced with the code and data available online. A more detailed motivation why we believe this is important can be found here. We also give a detailed description of the procedure to follow when making your publication reproducible.

Reproducible papers from our lab

News

Join us in a discussion about the use of reproducible research and how to make it work on our reproducible research forum!
For the latest news on reproducible research, please have a look at our RR Blog!
We have organized a special session at ICASSP 2007 about reproducible research (co-organized with Mauro Barni and Fernando Perez-Gonzalez). We had great talks, an interested audience, and some good discussions!

Motivation

After a colleague asked something about a paper you wrote, you spend a considerable amount of time finding back the right program files you used in that paper. Not to talk about the time to get back to the set of parameters used to produce that nice result.
Because this type of situations sounded all too familiar to many people of the lab, we are now trying to make our research reproducible. Most of the ideas about reproducible research come from Jon Claerbout and his research group at Stanford University. We believe reproducible can be helpful in many ways:

  • It will help us in the first place, to reproduce figures in the revisions of a paper, to create earlier results again in a later stage of our research, etc.
  • Other people who want to do research in the field can really start from the current state of the art, instead of spending months trying to figure out what was exactly done in a certain paper. It is much easier to take up someone else’s work if documented code is also available.
  • It highly simplifies the task of comparing a new method to existing methods. Results can be compared more easily, and one is also sure that the implementation is the correct one.

This may all sound very trivial, and in discussions with colleagues, there was a general agreement that this is how research should be performed. However, in practice, only few examples are available today. Making articles reproducible indeed requires a certain investment in time. However, we think that it is worth the investment. The interest is hard to quantify, but from download statistics and Google rankings, we can see that it really pays off!

How to make a paper reproducible?

Of course, it all starts with a good description of the theory, algorithm, or experiments in the paper. A block diagram or a pseudo-code description can do miracles! Once this is done, make a web page containing the following information:

  1. Title
  2. Authors (with links to the authors’ websites)
  3. Abstract
  4. Full reference of your paper, with current publication status, and a PDF of your paper
  5. All the code to reproduce all the results, images and tables. Make sure all the code is well documented, and that there is a readme file explaining how to execute it
  6. All the data (images, measurements, etc) to reproduce all the results, images and tables. Add a readme file explaining what the data represent
  7. A list of configurations on which you tested your code (software version, platform)
  8. An e-mail address that people can use for comments and remarks (and to report bugs)

Depending on the field in which you work, it can also be interesting to add the following (optional) information to the web page:

  1. Images (add their captions, so that people know what Figure xx is about)
  2. References (with abstracts)

For every link to a file, add its size between brackets. This allows people to skip large downloads if they are on a slow connection.
For examples, see the list of reproducible papers above. Note that we are currently working on an automated setup using EPrints to simplify this process. Keep an eye on this webpage!
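
As a rough illustration of the procedure quoted above, the sketch below checks a page description against the eight required items and formats each file link with its size in brackets. The field names, the dictionary layout and the 1024-byte units are assumptions made for this example; they are not part of the Lausanne lab's procedure or of its planned EPrints setup.

```python
# Hypothetical field names standing in for the eight required items.
REQUIRED_ITEMS = [
    "title", "authors", "abstract", "reference",
    "code", "data", "configurations", "contact_email",
]


def missing_items(page):
    """Return the required checklist items that are absent or empty in `page`."""
    return [key for key in REQUIRED_ITEMS if not page.get(key)]


def format_size(num_bytes):
    """Render a byte count as a short human-readable string (1024-byte units)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            if unit == "B":
                return f"{int(size)} B"
            return f"{size:.1f} {unit}"
        size /= 1024


def link_text(name, num_bytes):
    """Format a download link label with the file size between brackets."""
    return f"{name} [{format_size(num_bytes)}]"


if __name__ == "__main__":
    page = {"title": "Example paper", "authors": ["A. Author"]}
    print("Missing:", ", ".join(missing_items(page)))
    print(link_text("code.zip", 1536))
```

A check like this could run before a page is published, so that a missing readme or contact address is caught early rather than by a frustrated reader.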

Other reproducible research

Reproducible electronic documents: Jon Claerbout and his colleagues at the Stanford Exploration Project initiated (to our knowledge) the discussions about reproducible research.
Wavelab: David Donoho and his colleagues at the Stanford Statistics Department developed Matlab code to reproduce their results on wavelets.
Sweave Demo: a demo of Sweave, a package to do literate programming and good documentation using the statistical software R, by Charlie Geyer.
Reproducible Neurophysiological Data Analysis: a page by Christophe Pouzat on reproducible research in neurophysiology using R and Sweave.

This is wonderful to hear. I’ve been blogging about Jean-Claude’s Useful Chemistry, where he is trying to expose chemistry as it is done, and about our own new BlueObelisk community project on analysing the Openness of data in chemistry. Realistically most groups will want to publish data and software retrospectively, and we need to get into the habit. However it’s a challenge. Even in an ideal world where the publishers were actively trying to help publish reproducible science it would be very difficult. With a large number of influential publishers working against the Open publication of data it’s harder than that – but not impossible.
So how to go forward? I think a useful way will be to create metrics for the reproducibility of the science in a given paper. A “reproducibility” score – rather like a “readability” or “accessibility” score. Obviously this is harder in laboratory subjects and even harder in observational ones, so let’s start with software, data and informatics.
In general bioinformatics experiments would have a high reproducibility score. The data are openly available in EBI, NCBI and elsewhere. Most of the programs are open source. It may not yet be possible for a reviewer and reader to “push this button to repeat study”, but it’s not inconceivable.
Computational linguistics is also a good discipline for reproducibility. You are required to make your corpus available, and your annotation scheme, and your software should be open.
Chemoinformatics is awful. You can use a set of molecules without specifying what they are in detail, use a commercial program (probably with irreproducible versioning) that most people can’t afford, use a non-portable machine-learning algorithm, and fail to deposit your protocol for selecting data points. As a result there is no check on anything other than the trustability of the humans involved.
So I suggest we use the term “potentially reproducible” to describe work which, in principle, a third party “reader” could reproduce with only the data and software described in the paper. We call anything else “irreproducible”, even if we believe it. I’d welcome guidance on this. Then we could create a “PRS” – “potentially reproducible score”.
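
To make the idea concrete, here is one way a PRS might be computed, purely as a sketch. It treats each thing a third party needs as a yes/no criterion and reports the fraction satisfied. The criteria names are taken loosely from the chemoinformatics complaints above, and both the names and the equal weighting are assumptions for illustration; nothing here is an agreed definition.

```python
# Hypothetical yes/no criteria; a real scheme would need community agreement.
CRITERIA = (
    "data_fully_specified",          # e.g. the exact molecules used
    "software_openly_available",     # readers can obtain the program
    "software_version_recorded",     # the version used is stated
    "algorithm_portable",            # the method runs outside the authors' setup
    "selection_protocol_deposited",  # how data points were chosen
)


def potentially_reproducible_score(paper):
    """Fraction of criteria met; `paper` maps criterion name -> bool."""
    met = sum(1 for criterion in CRITERIA if paper.get(criterion, False))
    return met / len(CRITERIA)


def is_potentially_reproducible(paper):
    """True only if every criterion is met; anything else counts as irreproducible."""
    return potentially_reproducible_score(paper) == 1.0


if __name__ == "__main__":
    paper = {"data_fully_specified": True, "software_version_recorded": True}
    print(potentially_reproducible_score(paper))
    print(is_potentially_reproducible(paper))
```

The all-or-nothing test in `is_potentially_reproducible` mirrors the suggestion that anything short of full reproducibility from the paper's own data and software is called “irreproducible”; the fractional score simply shows how far a paper is from that bar.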
I think that we would find many domains where people would value this. I know many people who have tried to reproduce garbage science. It can ruin their careers.
Before criticising publishers I invite them to take an active view on this. But it is difficult to reproduce an experiment which only a few hundred people can read, where the data are copyrighted and where you have to subscribe to an expensive databank.

Posted in Uncategorized | 1 Comment

Survey of "free to read" Chemistry in Wiley Publications

The Blue Obelisk community is undertaking a survey of access to chemical Open Data through Open Access, Fuzzy Open Access, and closed Access publications. Today I looked at Wiley’s offering. (Note: although Blackwell and Wiley are now one company, I think I only looked at the Wiley journals.)
If you are bursting to know how many “free to read” chemistry articles Wiley has published in 2007, skip to the end. But it’s worth reading their material first.
Wiley do not – AFAIK – have any completely Open Access chemistry journals. Instead they offer a sort-of-hybrid scheme called “Funded Access”. Here’s the home page for the scheme:

Funded access on Wiley InterScience

For those authors of primary research articles whose funding agency requires deposit of an article in an archive, Wiley offers the option of funded access. With this option, the author pays a fee to ensure that the article is made available to non-subscribers upon publication via Wiley InterScience, as well as delivered to the funding agency’s preferred archive when applicable. This access option is available only to authors of primary research articles.

The funded access option will be offered only to those authors whose articles have been accepted for publication, and only at the point when the article is accepted, to ensure that the funded access option has no influence on the peer review and acceptance process.
Authors who order the funded access option for an article should make sure that the Article Alerts setting is turned on for the Track My Articles page within the Author Resources section. The alerts setting can be accessed in Track My Articles section under the My Profile page.
For more information about funded access: Track My Articles screen under My Profile

Journals participating in funded access are [PMR I have selected only chemistry]:

Funded Access FAQs [PMR: only certain selected]

I have written for a journal that is not included in the list of participating journals. Can I order funded access?
This will be evaluated on a case-by-case basis in the initial phases of this experimental program. Please note that some journals do not offer this type of access as a matter of policy.
Will funded access articles be marked in WIS?
Yes. Funded access articles will be identified by means of an icon
Does the funded access program cover all funding agency archives, or only selected ones?
Wiley’s funded access program covers all funding agencies. It does not cover “institutional repositories”, or content repositories used for commercial purposes
How is the funded access program related to the NIH’s public acces policy?
Wiley’s funded access is a separate program, and is not related to the NIH’s public access policy. NIH grantees are eligible for Wiley’s funded access program.
For an article covered by funded access, can a copy of the article be deposited in the author’s institutional repository? Will Wiley provide a PDF to the author?
No, the funded access program does not provide for PDFs to be sent to authors, or for posting of articles in institutional repositories. As with all of our policies, these will be

[PMR: this is an interesting concept – the author writes a manuscript and pays the publisher to publish it, and the publisher then refuses to give a copy to the author. A PDF is only a stream of bits. It costs zero.]
Unlike the ACS, which I believe provides “Free Access” to any journal if the author pays, Wiley deliberately forbids “Funded Access” to Angewandte Chemie (its flagship chemistry journal) and to most of the other “high impact journals”. I doubt many chemists would rank the journals above in the front rank of chemistry.
Anyway I trawled manually through the 6 journals above, looking for “Funded Access” papers in chemistry. For comparison I found an issue of Proteomics which, with considerable excitement, pronounced:

First PROTEOMICS Funded Access Paper – for FREE!

Proteomics 2006, 6, 6400–6404

So I looked through all of the TOCs for all of the issues for all the 6 journals, and tried to spot a FUNDED ACCESS icon: Here are the results in increasing numbers of FUNDED ACCESS papers:

I may have miscounted somewhere so these figures are approximate but the sum of FUNDED ACCESS papers in chemistry published in Wiley journals appears, to a casual inspection, to be fairly close to ZERO.
But the scheme has only been going a year and I would expect the number to at least double next year.

Posted in chemistry, open issues, Uncategorized | Leave a comment

Thanks to Journal Info

I have just discovered Journal Info from the University of Lund which lists ca 18,000 journals. From the FAQ:

Real Answers to real questions about Journal Info.

What is the purpose of the service?
The purpose is to provide an aid for the researcher in the selection of journal for publication. The publication market has continuously grown more and more complex. It is important to weigh in facts like scope and quality, but more recently also information about reader availability and library cost. The Lund University Libraries have made an attempt to merge all there items into one tool, giving the researcher the power to make informed choices.
 
How many journals are a part of the service?
The service currently covers about 18,000 journals. It is discussed to add additional journals later this year.
 
What is the reason to the X’s and tick marks? What do they mean?
Some of the information is rather hard to interpret. Is a price high or low? Is a impact score reasonable or of concern? In an attempt to assist the researcher, we gather all the data for each general subject and give a green tick to the 50% best and a red cross to the 50% worse. We hope this can be a useful feature.

This is going to be very useful in our study of scientific journals and publishers. I would like to ask questions like “Does Wiley publish any chemistry journals which support hybrid OA?” [Hybrid OA is where the author of a paper pays the publisher so that non-subscribers can read the paper, whereas they cannot read the other papers in the journal]. JInfo lists all the journals that provide hybrid OA. That would be very useful if we could search the info.
JInfo also makes all this information available under CC-BY-NC. Excellent. Is it possible to download the whole database?
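
The tick-and-cross scheme described in the FAQ amounts to a median split within each subject. A minimal sketch of that idea follows; the journal names and prices are made up, and the handling of values that sit exactly on the median (they get a tick here) is my assumption, since the FAQ does not say how ties are resolved or how Journal Info actually computes the marks.

```python
from statistics import median


def tick_or_cross(values, higher_is_better=True):
    """Mark each journal in one subject as 'tick' (better half) or 'cross'.

    `values` maps journal name -> number (e.g. price or impact score).
    Values equal to the median receive a tick (an arbitrary choice).
    """
    cut = median(values.values())
    marks = {}
    for journal, value in values.items():
        good = value >= cut if higher_is_better else value <= cut
        marks[journal] = "tick" if good else "cross"
    return marks


if __name__ == "__main__":
    # Hypothetical subscription prices for four journals in one subject.
    prices = {"Journal A": 100, "Journal B": 200, "Journal C": 300, "Journal D": 400}
    # For price, lower is better, so the cheaper half gets the tick.
    print(tick_or_cross(prices, higher_is_better=False))
```

If the whole database could be downloaded, a split like this could be recomputed for any subset of journals, for instance chemistry journals from a single publisher.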

Posted in open issues, Uncategorized | Leave a comment

US citizens: please lobby for House vote on OA mandate next Tuesday

Peter Suber blogs: House vote on OA mandate next Tuesday  (Open Access News)

Yesterday when I posted the good news that the House Appropriations Committee had approved an OA mandate for the NIH, I didn’t have the exact language of the bill. Now I do:

Sec. 221: The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

The new appropriations bill is scheduled to come up for a vote by the full House next Tuesday, July 17, and publishers are sure to lobby hard to delete this language. If you are a US citizen and support public access for publicly-funded research, it’s critically important to contact your representative and ask him/her to support this bill or at least to oppose any efforts to amend or strike the language on the OA mandate for the NIH. Contact your rep now, before you forget, and spread the word.  [My emphasis.]

This is very good news. The NIH has always tried to pursue Open Access and Open Data. For example the DTP branch of the National Cancer Institute published the NCI database of ca 250,000 compounds and data which, for many years, was the only source of Open chemistry data. The NIH has always had a battle to get this philosophy through the US political process, so do not assume it will happen unless you lend your effort.
This is, of course, not the last battle. We have to raise awareness of the importance of scientific data (Open Data) and so, as soon as this is approved, start campaigning that data must be made open alongside scientific publications. Without data a publication is a pale shadow of what it should be.

Posted in chemistry, data, open issues | Leave a comment