Molecules? Does "Open Access" help or hinder Open Science?

"Open Access" is often taken to imply certain rights. In fact it is more frequently a fuzzy term whose precise interpretation is unclear and sometimes even counterproductive to Open Science. (I accept this is a provocative statement, so read on...:-).

Molbank, published by Molecular Diversity Preservation International, is one of the oldest of a handful of Open Access journals in chemistry. Although its longevity is a remarkable accomplishment in itself, there is much more to Molbank than meets the eye. Just below the surface is a feature so revolutionary, yet simple, that chemistry publishers years from now will wonder why they didn't implement it sooner.

A Molbank article consists of a short monograph on a single compound, or possibly two. This may strike some scientists as a strange way to publish results, and it is unusual. On the other hand, this system offers vast potential to capture useful, but "unpublishable" findings that would otherwise be lost. Back when scientists actually read hardcopy journals, such a system would never have been feasible. Today, with hard drive space measured in terabytes, fiber-optic cables crisscrossing the planet, Internet connectivity for almost everyone, and servers that can be had for virtually nothing, this system not only looks perfectly feasible, but preferable in many ways to the status quo.

Here's the revolutionary part: each article that Molbank publishes is accompanied by a publicly-available, machine-readable file encoding the structure of the article's subject molecule. That's it. There's nothing tricky or high-tech about it. In fact, the practice is about as low-tech as you could imagine. The file format in which structures are encoded, molfile, dates back at least fifteen years, and nearly every piece of chemistry software - both end-user and developer tools - can handle it. What makes Molbank's practice revolutionary is that not a single chemistry journal, Open Access or subscription-based, currently does this.
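To make "machine-readable" concrete, here is a minimal sketch of reading such a file. It assumes the common V2000 molfile layout (three header lines, a counts line, then fixed-width atom and bond blocks). The water molecule is a hand-written sample, not a Molbank record, and the function name parse_molfile is purely illustrative.

```python
# Hand-written V2000 sample (water); not taken from any journal.
SAMPLE_MOLFILE = "\n".join([
    "water",                    # header line 1: molecule name
    "  hypothetical-example",   # header line 2: program / timestamp
    "",                         # header line 3: free-text comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 molfile string.

    atoms: list of (symbol, x, y, z)
    bonds: list of (first_atom, second_atom, bond_order), atoms numbered from 1
    """
    lines = text.splitlines()
    counts = lines[3]
    n_atoms = int(counts[0:3])   # columns 1-3 of the counts line
    n_bonds = int(counts[3:6])   # columns 4-6 of the counts line

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    return atoms, bonds


atoms, bonds = parse_molfile(SAMPLE_MOLFILE)
print([symbol for symbol, _, _, _ in atoms])
print(bonds)
```

A few dozen lines with no chemistry library are enough to recover the atoms and the connectivity, which is the sense in which the practice is low-tech.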

Why does the simple inclusion of a publicly-available molfile encoding molecular structures in a paper matter so much? This is where the other two entities of the trinity named in this article's title come into play: Open Source and Open Data. By providing a mechanism for a computer to decipher the chemistry in a paper, Molbank has opened the door to a host of highly-productive integration activities that nobody outside of Chemical Abstracts Service has even been able to contemplate, let alone prepare for.

This article is the first in a series aimed at exploring the wide-open space that Molbank has created. Rather than arguing my point with words, I'll actually build working demonstrations of what is now easily within reach. At the same time, I'll document my work on this blog. I'm not sure where all of this will end up, but I do hope to shine some light on a vital, although currently obscure, component of the Open Access debate.

Rich is absolutely right about the potential value. My concern is that the "Open Access" claimed here is actually counterproductive to Open Science and what he and I want to do. I hope that the Open Access community can address this.
Molbank is potentially a new and valuable extension towards the idea of publishing data as Rich describes. It's similar to journals like Acta Crystallographica E which publish a single crystal structure per article with full associated data.
Molbank was founded with, I believe, a grant to develop Open Access. But the papers themselves, although openly accessible, are copyright MDPI:
  • Copyright of published papers. We will typically insert the following note at the end of the paper: © 200... by MDPI. Reproduction is permitted for noncommercial purposes. For alternate arrangements concerning copyright please contact the Editor-in-Chief.

and it has some form of "differential Open Access":

  • Important additional information: All thematic special issues will be fully Open Access with publishing fees paid by authors. Open Access (unlimited access by readers) increases publicity and promotes more frequent citations as indicated by several studies. More information is available at

and from the copyright transfer form:

The copyright to this article is hereby transferred to MDPI, effective if and when the article is accepted for publication. The copyright transfer covers the exclusive right to reproduce and distribute the article, including reprints, translations, photographic reproductions, microform, electronic form (offline, online) or any other reproductions of similar nature. In the case of a Work prepared under US Government contract, the US Government may reproduce, royalty-free, all or portions of the Work, for official US Government purposes only, if the US government contract so requires. The author warrants that his contribution is original and that he has full power to make this grant. The author signs for and accepts responsibility for releasing this material on behalf of any and all Coauthors.
The undersigned author, as corresponding co-author of the Work, states that all co-authors have been made aware that this manuscript has been submitted to this journal, that they have or will be provided with a (electronic) copy of the manuscript, that they have consented to be co-authors of the manuscript and to transfer the copyright.

In my view this is absolutely NOT open access according to the Budapest Open Access Initiative which reads (with my italics):
The literature that should be freely accessible online is that which scholars give to the world without expectation of payment. Primarily, this category encompasses their peer-reviewed journal articles, but it also includes any unreviewed preprints that they might wish to put online for comment or to alert colleagues to important research findings. There are many degrees and kinds of wider and easier access to this literature. By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
The rubric from MDPI is clear: it is NOT BOAI-compliant. (I corresponded with the Editor some years ago but didn't get a substantive reply on this issue.) Specifically:
  • BOAI permits commercial re-use; MDPI does not.
  • BOAI permits non-exclusive copying; MDPI does not.
  • BOAI permits automatic crawling of data; MDPI gives no explicit permission.
  • BOAI acknowledges the value of copyright to the authors; MDPI requires the authors to surrender this.
By contrast, journals such as the Beilstein Journal of Organic Chemistry explicitly state:

Brief summary of what Open Access means for the reader:

Articles with this logo are immediately and permanently available online. Unrestricted use, distribution and reproduction in any medium is permitted, provided the article is properly cited. See our open access charter.

Anyone is free:

  • to copy, distribute, and display the work;
  • to make derivative works;
  • to make commercial use of the work;

Under the following conditions: Attribution

  • the original author must be given credit;
  • for any reuse or distribution, it must be made clear to others what the license terms of this work are;
  • any of these conditions can be waived if the authors give permission.

Statutory fair use and other rights are in no way affected by the above.

Without an EXPLICIT machine-readable statement of the sort above, "Open Access" is effectively useless for Open Science. Remember that we increasingly want to use machines to trawl sites. If I knew I had permission I would set our robots over the whole of MDPI tomorrow. (I am probably allowed to extract all the molecular files as they are (IMO) "data" unless the grotesque sui generis database restriction applies.)
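As a rough sketch of what "machine-readable" could mean for a robot, the code below looks for a license link in a page before deciding whether to reuse it. It assumes the publisher marks its license with the rel="license" link convention used by Creative Commons. It also assumes, for illustration only, that nothing but a plain Attribution license counts as permitting unrestricted reuse. The class and function names and the sample pages are all hypothetical.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect the href of every <a> or <link> element marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.license_urls.append(href)


# Assumption: only plain CC Attribution is treated as unrestricted reuse.
# The trailing slash keeps "by-nc", "by-nd" and "by-sa" from matching.
PERMISSIVE_PREFIXES = (
    "http://creativecommons.org/licenses/by/",
    "https://creativecommons.org/licenses/by/",
)


def find_license_urls(html):
    finder = LicenseLinkFinder()
    finder.feed(html)
    return finder.license_urls


def robot_may_reuse(html):
    """True only if the page explicitly declares a permissive license."""
    return any(url.startswith(PERMISSIVE_PREFIXES)
               for url in find_license_urls(html))


# Hypothetical pages, for illustration only.
EXPLICIT_PAGE = ('<p>Article text.</p>'
                 '<a rel="license" '
                 'href="http://creativecommons.org/licenses/by/2.0/">CC BY</a>')
NONCOMMERCIAL_PAGE = ('<a rel="license" '
                      'href="http://creativecommons.org/licenses/by-nc/2.5/">'
                      'CC BY-NC</a>')
FUZZY_PAGE = '<p>This journal is Open Access!</p>'

for page in (EXPLICIT_PAGE, NONCOMMERCIAL_PAGE, FUZZY_PAGE):
    print(find_license_urls(page), robot_may_reuse(page))
```

The third page is the fuzzy case: a human may read "Open Access" generously, but a cautious robot finds nothing it can act on and has to stay away.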

Open Science cannot make effective use of:

  • author self-archiving. Much self-archiving - whether on websites or repositories - will not be accompanied by licenses of the sort above.
  • journals that do not assign copyright to the authors AND do not explicitly allow crawling of the publisher's site AND do not provide machine-readable licenses. How many hybrid journals do that?

I would recommend the use of the phrase

"Open Access(BOAI)"

If publishers adopted something like that it would solve my problems. It's simple. However, I guess that an increasing number of publishers are likely to let fuzz and FUD drift around their sites, especially those who have been dragged unwillingly into the "a few authors pay so we are Open Access" position. We hear encouraging figures about the growth of Open Access journals....

... but how many of these are explicitly BOAI-compliant?

  1. rich apodaca says:

    Hi Peter,

    A very thoughtful article, and you raise valid points. The terminology is rather loose. This reminds me of the "free as in speech" vs. "free as in beer" debate in Open Source. Molbank is "free as in beer" for the moment.

    So is PubChem:

    Does PubChem hinder the spread of Open Data with the murky chain of title and copyright status of its contents? Maybe. But as I wrote in the article above, I'm never going to refuse free beer. And especially during a drought.

    Regarding the last part of your article on the right to crawl a site. Few Web sites explicitly allow this, yet they are crawled anyway. A site owner can indicate their desire *not* to be crawled with a robots.txt file.

    MDPI's site has no robots.txt file that I could find. It is therefore, by convention, open to all robots.
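    The convention can be sketched with Python's standard urllib.robotparser module. This is only an illustration: the robots.txt contents, robot name, and URLs below are hypothetical, and nothing is fetched from any real site.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; an empty file yields no rules at all.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents and URLs, for illustration only.
NO_RULES = ""
SOME_RULES = "User-agent: *\nDisallow: /private/\n"

PAPER_URL = "http://example.org/papers/m001.mol"
PRIVATE_URL = "http://example.org/private/draft.mol"

print(may_crawl(NO_RULES, "example-robot", PAPER_URL))
print(may_crawl(SOME_RULES, "example-robot", PAPER_URL))
print(may_crawl(SOME_RULES, "example-robot", PRIVATE_URL))
```

    With no rules, everything is permitted by default. A Disallow line is the only way the site owner says no, and here only the last request is refused.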

    Even better, MDPI maintains an explicit policy permitting the mirroring of their content:

    It does seem, from the date on their policy on Mirroring, that MDPI has been doing open access the way they do it since before the BOAI was even written. In my mind, that does give them a certain amount of freedom in their definitions.

    I'm also intrigued by BJOC's current stance on Open Access. If memory serves, 6-12 months ago I carefully examined their Open Access policy. It was pretty clearly stated that any attempt to create a database of articles that would compete with the BJOC site itself would not be acceptable use. I can't find any trace of that language today, so I would see this as a positive development.

    What I'd most like to see from an Open Access publisher is a legally-binding license to their content. A policy page is simply too fluid to do the job. A statement from the BOAI with every statement prefaced by "should" also won't work. Ideally, I'd know that now and forever more I can do x,y, and z with the content of a journal.

    An Open Access License. Does anybody do this? If not, why not?


  2. Bill says:

    Rich, as far as I've been able to tell, nobody but PLoS, BioMed Central and Hindawi really do this -- they use Creative Commons licenses, or something very similar. But they don't provide open data in the sense that Peter means it, and if they did it is not clear to what extent that data would be covered by their existing licenses. Once you move away from things that can be preserved in a pdf file (which, since you know Peter, you'll know isn't much!), licensing seems to get very murky very fast. I give a vastly inadequate introduction to the problem here, and some references in the comments of that post.

  3. Chris Rusbridge says:

    I'm confused! Peter says "BOAI permits commercial re-use; MDPI does not." But his article is licensed under a Creative Commons Attribution Non-commercial licence, so it too fails the BOAI test! And I'm required to submit to that licence in adding my comments (I'm happy to, by the way).

    There are a couple of issues here. One is that sites often have stupid licence terms that say things they do not mean; I suspect that the site might change its terms if it was asked carefully and often enough. As a digression but illustration, I have an email somewhere from many years ago, in which I wrote to Elsevier about the copyright notice on their web site, which then read (my emphasis) "All rights reserved. No part of this service may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the publisher." Asking for written permission before downloading a web page (which you had to download before you could read the notice) was a bit, well, duhhh! I asked for that written permission, which never arrived, and it took several months for the notice to get changed.

    Another issue is that just because the default licence does not allow commercial re-use, does not mean that commercial re-use is forbidden! It just means that commercial re-use may be subject to a separate licence.

    Maybe the real bugbear is that they still ask for a copyright transfer. I really object to this, and just don't do it. It is (I understand) uncommon practice in book publishing anyway.

  4. pm286 says:


    I’m confused! Peter says “BOAI permits commercial re-use; MDPI does not.” But his article is licensed under a Creative Commons Attribution Non-commercial licence, so it too fails the BOAI test! And I’m required to submit to that licence in adding my comments (I’m happy to, by the way).

    Thanks Chris. You are correct about our license - it's a first draft and I am always open to persuasion. We discussed this in the context that (a) if we put no license then we could not re-use any of the comments and (b) if we used a CC-commercial license then contributors might not feel they could contribute. You're the first to raise this. If - generally - contributors feel a laxer license would be OK, we'll talk about it. Are you effectively saying that only a Creative Commons Attribution + Derivatives + Commercial licence fits BOAI? If so, that's useful.

    This is all so important I shall blog it fully...

  6. David Goodman says:

    excerpted and adapted from my Dec 18 posting on the SPARC-OA list:

    As an example, relevant to this very posting, I quote from this blog:
    "You own the copyright in your comments: but you also agree to license your comments under the Creative Commons Attribution-NonCommercial license."
    When you act as a publisher, you do not license commercial use.

    David Goodman,

