Scientists should NEVER use CC-NC. This explains why.

There is a really important article at (Hagedorn G et al)

[NOTE the OKF has a clear indication of the problems of CC-NC. They should add a link to Hagedorn. See my earlier blog post ].

So, you aren't interested in Biodiversity Journals? Never read Zookeys? (I didn't know it existed). But in 1 day about 1200 people have accessed this article. Yet another proof that WHAT you publish matters, not WHERE. And hopefully this blog will send a few more that way.

I can't summarise all of it. The authors give a very detailed and, I assume, competent analysis of Copyright applied to scientific content (data, articles, software) and its licensability under Creative Commons. Note that "This work is published Under a Creative Commons Licence", which so many people glibly use, is almost useless. It really means "This work is copyrighted [unless it's CC0] and to find out whether you have any rights you will have to look at the licence". So please, always specify WHICH CC licence you use.

The one you choose matters, because it applies the rule of LAW to your documents. If someone does something with them that is incompatible with the licence, they have broken copyright law. For example, combining CC-NC-SA material with CC-BY-SA material in a single derived work is impossible without breaking the law, because each ShareAlike licence requires the result to carry that same licence.
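To make that incompatibility concrete, here is a minimal Python sketch of the reasoning. It is a deliberately simplified model and not legal advice: the function names `parse` and `can_combine` are my own invention, it ignores licence versions, jurisdictions and any officially declared compatible licences, and it only looks at the four CC elements (BY, NC, SA, ND).

```python
# Simplified, hypothetical model of whether two CC-licensed works can be
# remixed into one derived work. Not legal advice; versions are ignored.

ELEMENTS = {"BY", "NC", "SA", "ND"}


def parse(name):
    """Turn a name like 'CC-BY-NC-SA' into the set of its licence elements.

    'CC0' (or any name with no recognised element) becomes the empty set.
    """
    return frozenset(t for t in name.upper().split("-") if t in ELEMENTS)


def can_combine(a, b):
    """Return True if works under licences a and b can be merged (in this model)."""
    ea, eb = parse(a), parse(b)

    # NoDerivatives forbids adaptations, so no remix is possible at all.
    if "ND" in ea or "ND" in eb:
        return False

    # Two ShareAlike works: each demands its own licence on the result,
    # so they can only be combined if the licences are identical.
    if "SA" in ea and "SA" in eb:
        return ea == eb

    # One ShareAlike work: the result must carry that licence, and it must
    # preserve every restriction of the other work (e.g. NC).
    if "SA" in ea or "SA" in eb:
        result, other = (ea, eb) if "SA" in ea else (eb, ea)
        return other <= result

    # No ShareAlike and no NoDerivatives: the restrictions can simply be stacked.
    return True


if __name__ == "__main__":
    for pair in [("CC-NC-SA", "CC-BY-SA"),
                 ("CC-BY", "CC-BY-NC-SA"),
                 ("CC-BY-NC", "CC-BY-SA"),
                 ("CC0", "CC-BY-SA")]:
        print(pair, can_combine(*pair))
```

In this model the CC-NC-SA plus CC-BY-SA pair from the paragraph above comes out as not combinable, while CC-BY material can be folded into a CC-BY-NC-SA work.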

There are so many misconceptions about NC. Many people think it's about showing that you want people to share your motivation. Motivation is irrelevant. The only thing that matters is whether a court decides that the licensee's use breaks the formal non-commercial terms of the licence. There's little case law, but the Hagedorn paper argues that being a non-profit doesn't mean non-commercial. Recovering costs can be seen as commercial. And so on.

We came across this when we wished to distribute a corpus of 42 papers used in training OSCAR3. The corpus was made available by the Royal Society of Chemistry. It was used (with contributions from elsewhere) to tune the performance of OSCAR3 to chemistry journals. Because training with a corpus is a key part of computational linguistics we wished to distribute the corpus (it's probably less than 0.1% of the RSC's published material – it would hardly affect their sales). After several years they agreed, on the basis that the corpus would be licenced as CC-NC. I pointed out very clearly that CC-NC would mean we couldn't redistribute the corpus as a training resource (and that this was essential since others would wish to recalibrate OSCAR). Yes, they understood the implications. No, they wouldn't change. They realised the problems it would cause downstream. So we cannot redistribute the corpus with OSCAR3. The science of textmining suffers again.

Why? If I understood correctly (and they can correct me if I have got it wrong) it was to prevent their competitors using the corpus. (The competitors include other learned societies.)

I thought that learned societies existed to promote their discipline. To work to increase quality. To help generate communal resources for the better understanding and practice of the science. And chemistry really badly needs communal resources – it's fifteen years behind bioscience because of its restrictive practices. But I'm wrong. Competition against other learned societies is more important than promoting the quality of science.

Meanwhile Creative Commons is rethinking NC. They realise that it causes major problems. There are several plans (see Hagedorn paper):

Creative Commons is aware of the problems with NC licenses. Within the context of the upcoming version 4.0 of Creative Commons licenses (Peters 2011), it considers various options of reform (Linksvayer 2011b; Dobusch 2011):

• hiding the NC option from the license chooser in the future, thus formally retiring the NC condition

• dropping the BY-NC-SA and BY-NC-ND variant, leaving BY-NC the only non-commercial option

• rebranding NC licenses as something other than CC; perhaps moving to a "" domain as a bold statement

• clarifying the definition of NC

I'd support some of these (in combination) but not the last. Because while it is still available many people will use it on the basis that it's the honourable thing to do (I made this mistake on this blog). And others will use it deliberately to stop the full dissemination of content.


22 Responses to Scientists should NEVER use CC-NC. This explains why.

  1. Richard Kidd says:

    Peter, I don't have the time right now for the extensive reply this post requires, but for now I would like to register that I believe it contains significant factual errors of our discussions, and wholly incorrect inferences about our motivations, both in the specifics of the project and the wider motivations of the society.

    • pm286 says:

      I will be very happy to post any reply in full.
      The RSC itself appears to take little part in the debate about Openness of information. It posted a factual report ( ) on the Hargreaves report (Digital Opportunity 2011, which urged that factual scientific information should be Open and that the UK would benefit). The RSC did not comment on this despite the fact that a large part of modern chemistry is now digital.

      If the RSC had any public interest in the re-use of digital information it should be encouraging discussion and helping to form a useful consensus or analysis of irreconcilable views. As it does not, we can only guess at its motives and we may guess wrong. Blogs like this provide one of the few useful discussion places and I am happy to stand corrected if required.

  2. commented

    "I thought that learned societies existed to promote their discipline. To work to increase quality. To help generate communal resources for the better understanding and practice of the science. And chemistry really badly needs communal resources – it’s fifteen years behind bioscience because of its restrictive practices. But I’m wrong. Competition against other learned societies is more important than promoting the quality of science. "

    You know me for ChemSpider of course and I do now work for RSC as a result of them acquiring ChemSpider. As a learned society I can say that RSC does exist to promote our discipline. We DO work hard to increase quality. You know how hard we at ChemSpider have worked to improve quality. It's hard and we've only partially succeeded in our own narrow area of focus. But I am surrounded by colleagues who work hard on this every day. NOT just ChemSpider but on quality improvement processes for publishing, data-sharing, accessibility, and on and on.

    In terms of generating communal resources for the better understanding and practice of the science that is what we do with ChemSpider, with ChemSpider SyntheticPages, with our presence on projects such as OpenPHACTS (Open Pharmacological Space). It is exactly what we do. I am paid by my employer, and a learned society, to deliver, deliver, deliver chemistry-based communal resources. And, I judge, we are doing rather well.

    Regarding "Competition against other learned societies is more important than promoting the quality of science." I can comment that I don't get up in the morning to compete against the efforts of any other learned society and their efforts. BUT, we do operate in the same space, we do overlap in some of our efforts, some of our vision and some of our execution. As a result we can be seen to be in competition. Competition is NOT more important than promoting the quality of science. But it does of course exist, is extremely healthy, is motivating and, in many cases, very enjoyable.

  3. Richard Kidd says:

    The RSC's public interest in "re-use of digital information" in chemistry? Apart from promoting semantics and open standards into the publication process (not saying we’re there yet), OSCAR use, ChemSpider, ChemSpider SyntheticPages, Open PHACTS, and supporting (i.e. funding) continued InChI developments (with other InChI Trust members, associates and supporters), linking to Utopia, building RDF to help OreChem? And supporting whatever JISC projects we’ve been asked to, supplying a whole bunch of data for use in TREC CHEM text mining, participation in the Pistoia SESL project, being permissive regarding CrystalEye? All that is pretty much public.

    A very useful meeting I went to a couple of weeks ago between pharma companies and publishers tried to clarify legal and data integration issues over data mining – it’s a large step forward, not a solution yet, but I believe that process will deliver more clarity over the issues of text mining of academic content. What I'm trying to get over is that practical progress is being made in many areas - by many people.

    And finally – most annoyingly - the factual bits. Boring for everyone else, but here goes...
    * several years of asking? The projects we were involved in had formally ended (SciBorg, then ChETA), then after further projects without RSC involvement we were asked to release this training data as CC0. A solution was agreed within 10 days.
    * I have AGREED for the training set of sentences for OSCAR to be released as CC-BY (emails dated 24/26-11-2010 - a year ago!), with acceptance from your group. The corpus of papers as a whole will remain NC (as agreed with co-researcher in the SciBorg project) and is not quite the same thing as the training set. We haven’t stood in the way of OSCAR development.
    * In negotiations I explicitly offered that any use of our NC material in training OSCAR, in a commercial environment or otherwise, would be completely acceptable. This was rejected.

    I am disappointed (to put it VERY mildly) that having bent over backwards to enable continued OSCAR development in the way you asked, this has been ignored and publicly misrepresented as the opposite of what I believe was agreed. This makes the extrapolated speculation on RSC motives somewhat nonsensical.

    I’ll write more about my position on NC elsewhere - I fear its removal will make less data available to the public, rather than more - but just to note again for the devoted reader, its use was not for the reasons Peter ascribed in the blog post.


    • pm286 says:

      Thanks Richard

    • I really look forward to more on "I fear its removal will make less data available to the public, rather than more" -- this is, in part, exactly the discussion that needs to happen during the CC licenses 4.0 process.

      In part, because the question isn't only whether particular changes CC could make would be thought to lead to more or less data (and other works) available to the public, but how much social welfare is obtained by the availability of said data and freedoms (or lack thereof) to use the same.

      Also a clarification -- the option to use NC terms can't really be removed. Even licenses CC has retired remain on the web, just not promoted, and with warnings. If there was some way for CC to fully remove its NC licenses, entities that wanted to could create custom licenses, or use new general licenses from some other entity, or old licenses -- many "open content" licenses were written and some used a bit in the late 1990s and early 2000s before CC took off -- most of those were not what we'd consider fully open, containing NC and other restrictions.

      What CC can do, given its prominence in this sphere, is more clearly differentiate between licenses that when used unambiguously contribute to a commons, and those that contribute to something less, eg where some entities aren't welcome and some are more equal than others. (Not that limited permissions aren't better than nothing -- if scientists should never use NC, they should never ever use unmitigated copyright, right?) I consider a useful way to think of many of the things CC *could* do in 4.0 as mechanisms for increasing the range of and differentiation among the public licensing options offered. Branding/license naming, legal definitions, and implementation in tools such as the CC license chooser, explanatory materials, domain-specific recommendations, all can do this. The question is which knobs to turn, and how far, for optimal social welfare. :)

  4. Pingback: Brainfood: Early farmers, Ecological restoration, IPRs, Soil bacterial diversity, Perenniality, Carrot diversity, Earthworm mapping

  5. Pingback: Unilever Centre for Molecular Informatics, Cambridge - “Open Access” and “Non-Commercial” – yet again. Can any publisher justify CC-NC for PAID content? « petermr's blog

  6. Andrew Dalke says:

    'Note that “This work is published Under a Creative Commons Licence” – which so many people glibly use is almost useless.'

    Agreed! It reminds me that the "CML Schema is distributed under a Creative Commons license, allowing redistribution but NOT derivative works." I'm glad your viewpoint has changed. We last talked about this two years ago; have you decided which specific CC license to use for CML Schema? I don't see it along with the software.

    You write elsewhere, earlier this year, against gratis works, saying: "With gratis material you cannot as of right: Create a derivative work – this curtails innovation". I'm glad that you now agree with me that disallowing derivative works on a specification curtails innovation. Will you be removing that clause from the CML Schema license to allow, for example, data-mining of the text, generation of parser code from the Schema parser generators, and modified versions of CML?

    • pm286 says:

      Thanks Andrew,
      Let me first say publicly that the CML schema is Open and forkable. It has been distributed under the Artistic 2.0 licence since 2006. This licence allows derivative works but requires the derived work to record its ancestry and to be distributed under a different name.

      The primary problem is licence maintenance. Some of the exposed software is on servers where licence information has not always been copied to every file. Ultimately this is my responsibility but I am relatively stretched for resources, and there are servers that are difficult for me to access. If you would like to volunteer to clean up the licences on the CML sites I would be delighted and will work to give you access.

      I have not used "CML Schema is distributed under a Creative Commons license, allowing redistribution but NOT derivative works.” for many years. It is possible that it still resides in old files - again if you find them I am happy to give you permission to access them. If there were a magic button that would refresh licences on all the sites I would press it.

      • Andrew Dalke says:

        I'm looking at the repository at with a last edit of that license from 5 years ago. The Wikipedia page on CML claims that that is the primary repository, and I see you made a commit 13 months ago, so I assume it's still the active repository. Where is the primary repository? As it stands, Jumbo is under the Artistic License but CML is not. Where is the documentation which says otherwise?

        In our email exchange of 2 years ago you said you would update the codebase to say "Artistic 2.0" because the current statement does not specify which Artistic Version you mean, and 1.0 is incompatible with the GPL. Your "glib" written use of the underspecified phrase "Artistic license" in 1996 is of course quite similar to how others say "This work is published Under a Creative Commons License" without specifying just which license they mean.

        If you, with your long experience in dealing with copyright issues and licenses, find it difficult to fully specify the licenses of the software you write, and difficult to maintain those licenses over time even with access to the repository, then you can well understand why so many people, who have little experience in these matters, find it very confusing!

        • pm286 says:

          I agree with you that licensing is difficult! I will blog about this.

          There are a variety of sites where CML is found. In some of these the CML schema is codistributed with code, in others it stands alone. I can think of at least 3 places which variously use GPL, LGPL, Apache and Artistic 2.0. Some of these licensing decisions were outside my control. As you know we had a considerable discussion on the Blue Obelisk list about the licensing of CML in InChI where I was under pressure to change the CML licence from LGPL to BSD because a software company had requested the InChI group to do this. I have not been able to get the consent of my co-author on this (I haven't tried very hard). And formally therefore I can't migrate the code - I would have to get someone to rewrite it without looking at the code.

          There is a technical problem as to where the licence resides. If it's inside each file there is a major problem of maintenance. If it's on a site (as technically it is for Sourceforge) then it covers the Schema - which I regard as declarative code.

          The current schemas have been developed by Joe Townsend and are on It appears that the licence information is not attached. It should be and I have to try to (a) get the password to the site and permission to change it (b) find resources to do it. Again, I make the standard Open Source offer that if you find a bug - and this is a bug - then you are welcome to join the project and fix it.

          I agree that I have made mistakes in the past and have tried to fix them as I have time. As an example I started this blog as CC-NC and then about 1-2 years in was persuaded to change. The CC-BY licence was attached to the WordPress skin. When the WordPress software was upgraded (I am not allowed to have access) the licence got dropped off. I didn't notice this and I cannot technically restore it myself. I have added a simple statement to the "about" on the site.

          So the licences attached to CML code and schema need considerable tidying up. These are bugs in a complex software system, not a deliberate act. I have made my intention clear and if you wish to help implement it you are very welcome.

          The situation with commercial publishing transactions is quite different. It is their primary business to understand licences. It is clear that for many of them it is a deliberate act to use NC. Moreover, unlike projects such as CML which depend on voluntary contributions, the publishers are charging authors 3000 USD for each NC licence. Give me 3000 USD and I'll get someone to tidy up CML.

          • Andrew Dalke says:

            My point is that your own philosophical views on the acceptable form of licenses has changed over the last few years, and I don't think you are aware of the change. That is, you now say that two practices you did in CML were incorrect; a too-vague license and a derivation prevention clause which you now believe "curtails innovation." I look forward to your essay on the topic.

            You can call them bugs if you want, but they aren't. How do I send you a patch? I can't. If I change the LICENSE.txt file without explicit confirmation from you, then I strongly doubt it has legality. How do I get commit rights to sites where you don't have commit rights? The primary "bug" is that there's no unambiguous license to the CML spec, such that someone can start using it without having to contact you for explicit permission.

            The reciprocity offer "if you find a bug .. fix it" has not aged well. Few people believe it's true. Searching now, page after page of projects say things like "If you find a bug or misbehavior, please let us know, so that we can fix it." Not, "if you find a bug, send us the fix." Your view places a high barrier to entry and implies a moral obligation that I personally dislike. Pointing out a bug *is* getting involved, and I don't want the extra responsibility of commit rights.

            Are you actually saying that unless I'm willing to spend, what, a day on straightening things up for you, then it's not going to change? Do you presume that I have more free time for this than you do?

            I don't know how the CML code in InChI is at all relevant to this discussion. The CML schema that I'm talking about is different code and under a different license than the LGPL CML parser in InChI. In addition, you misremember OpenEye's request. In the email you posted to the OB list, they asked for either the BSD license *or* a waiver for the LGPL requirement "to enable the end user to be able to replace and rebuild the executable with an alternate version of the LGPL code", because they ship .a files and statically linked binaries, with no mechanism to rebuild.

          • pm286 says:

            I have changed my views regularly and have commented on them from time to time. This should not be a surprise. For example this blog started as CC-NC and I then changed and explained why.

            I shall address the tidying of the licences.

  7. Richard Kidd says:


    I acknowledge that in a subsequent post you add
    "(I hold my hand up and admit I got some details wrong); and for misinterpreting their motives. I may well have misrepresented them, but since I am not aware of major policy statements from RSC on Open Access, Licensing of Content, and their reaction to the Hargreaves report on intellectual property and copyright I have to make my own judgments."

    I acknowledge the bit in parentheses, but - in your own words, it's (mumble). Corrections should be specific, and at the very least appear in the post to which they apply, rather than six posts up the page. You often update the body of posts, but not in this case, so it remains with its major premise based on an inaccurate anecdote.

    I reiterate, in case no-one gets my point, that because we agreed to release what OSCAR needed for the training, the rest of the speculation about RSC and our motives, which was based on the erroneous report, collapses. OA, and Hargreaves, is irrelevant to what was readily agreed a year ago to non-OA content.

  8. Pingback: Now Elsevier starts a PLoS ONE clone « Sauropod Vertebra Picture of the Week

  9. Pingback: Commercial Exploitation of Content and the Instagram Story « UK Web Focus
