Monthly Archives: April 2008

What is strongOA?

In previous posts (see links in Why weakOA and strongOA are so important) I have welcomed the Suber-Harnad approach to OA, labelling objects either as "strong OA" or "weak OA". In this post I want to explore what strongOA is. I believe this is possible and relatively simple. I hope that all OA advocates will be able to agree on an operational procedure that will simply and absolutely determine whether something is strongOA.

A useful starting point is the Wikipedia "definition". I have copied this verbatim and added two suggested clarifications:

Open access (OA) is free, immediate, permanent, full-text, online access, for any user, web-wide, to digital scientific and scholarly material,[1] primarily research articles published in peer-reviewed journals [PMR: and academic theses]. OA means that any user, anywhere, who has access to the Internet, may link, read, download, store, print-off, use, and data-mine the digital content of that article [PMR: without needing to consult authors, publishers, or hosting sites]. An OA article usually has limited copyright and licensing restrictions.

PMR: The Budapest declaration includes the definition:

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

For background let's assume some axioms:

  • we must have an operational procedure for determining the strongOAness of an object. Without such a procedure we argue endlessly over distinctions that need not take up our energies. The only people who will keep arguing are those who wish to muddy the OA waters, including those who wish to rent weakOA objects for the sale price of strongOA.
  • we can explore the discussion in the arena of scholarly and research publications. There will be an overlap with certain digital objects (data sets, computer code) but we'll omit discussion here. The most important artefacts are research articles (normally peer-reviewed) and theses (of all types, undergraduate, masters, doctoral) published through a University or scholarly organisation.
  • that there are overriding statements of intent which also contain definitions. These include the BBB declarations [above] and the Open Knowledge Definition (which is part of the basis of Open Data and Science Commons). I believe these all describe strongOA (and it would be difficult to dumb them down without breaking my idea of strongOA). It is how to translate these definitions into practice that I address here.

I see the following challenges for strongOA.

  • The logical consequences of strongOA are extensive. I believe it is possible for anyone to download a complete journal, repackage it and resell it without the publisher's or author's permission. They must, of course, preserve the provenance (authorship) but that's all that is formally required. Just as people re-use and resell my computer code (as in Bioclipse) they can do the same with my articles and - theoretically - the whole OA content of, say, PLoS or BMC. In practice I think that would be slightly questionable and that's where community norms come in - it's useful to say "you may do this but we'd rather you didn't". I generally enforce this by adding the bit-rot-curse to my code. So there will be a culture change as publishers adopt strongOA - there will be mistakes - and we should help them adjust.
  • There may be a tendency to blur the boundary. "This article is strongOA as long as it is for non-commercial use". No. It is either strongOA which requires the permission for commercial use or it's not strongOA. We have to agree on this.
  • We have to police strongOA. One of the plus points of Open Source and the weak points of OA (up to now) has been the policing. If you say something is strongOA and it isn't someone should take you to task (gently if it's a mistake). If we don't do this then the bright shiny present that the Suber-Harnad terminology has created will tarnish. Fuzzy practice begets fuzzy thinking.
  • We have to be able to know (not just guess) the strongOA status of an object at all times. This is critical. I shall continue to stress this. It's not good enough to say "I am emailing this document from repository X which is classified as an Open Access repository, so you can do anything you like with it". The document/artefact has to announce that it's strongOA. Nothing else will do because provenance by association gets lost. The only way that I know of doing this is by embedding a licence or reference to a licence in the document. Typical licences include CC-BY, the GNU Free Documentation Licence, and Science Commons/Open Knowledge (meta)licences such as PDDL or CC0. The licence can be asserted either by embedding RDF in the XML/HTML or by adding an approved icon from the organisations above (a minimal example follows this list).

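As a concrete illustration, here is roughly how such an assertion can be embedded as RDF in an article's XML, in the style of Creative Commons' ccREL (the article URI below is a placeholder, not a real identifier):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:cc="http://creativecommons.org/ns#">
      <!-- the rdf:about URI is a placeholder for the article's real URI -->
      <rdf:Description rdf:about="http://example.org/article/123">
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
      </rdf:Description>
    </rdf:RDF>

In plain HTML the equivalent can be as simple as a link carrying rel="license" that points at the licence URI.
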
To summarise at this point:

strongOA requires a clear borderline defined by a licence (or licence reference) embedded in the document and policed by the scholarly community.

This discussion has been about what strongOA is, not whether it's a good thing or how to achieve it. It ought to be something that responsible publishers have a view on as well as authors, funders, repositarians, human readers and machine users.

Why weakOA and strongOA are so important

Yesterday Peter Suber and Stevan Harnad announced (Strong and weak OA) a critically important step forward in OA - that the terms "weak OA" and "strong OA" should be used to describe various approaches, philosophies, practices. I reported this (Weak and Strong OA) and promised to elaborate from my perspective.

Until yesterday the label "OA" was too fuzzy to allow precise definition of practice. This had serious practical consequences:

  • an author or funder paying for "OA" might be getting less than they expected.
  • a reader or user (human or machine) might not know what they could and could not do with an "OA" article. "OA" did not guarantee rights of re-use and it was possible that a reader could do something in good faith that would earn them a lawyer's letter from the publisher (or worse).
  • many of us (funders, librarians, authors, readers) wasted huge amounts of time trying to make clear what could and could not be done. Generally this led to erring on the side of extreme caution (==paralysis) and was a godsend for those trying to inject FUD into the system.

I accept Peter and Stevan's observation that most OA is not strong OA and that there is a place for weak OA. I support that view. I shall of course campaign for strong OA, but now it is entirely clear (as I intend to show) what it is that I am campaigning for.

More later, but until then we should all practise trying to catalogue digital objects into three categories: nonOA, weakOA, strongOA. I think the results may be surprising.

Why we need semantic authoring tools in chemistry - 3

The type of problem highlighted in my recent post is a very serious one, so rather than giving the answer I want to help you discover it for yourself. Hopefully you will then have a wow! or aha! or buggerthat! moment that will help orient you to the importance of semantic tools. Persevere and you will see why I rant against PDF, and why weak OA does not normally provide high-quality semantic documents.

You need to know very little chemistry and I'll explain it all below. But first, the essence of the problem (it relates to methyl chloromethyl ether - you can look it up on WP but that's not necessary to solve the problem).

In essence the chemical formula as given:

CH3OCH2CI

is completely incompatible with the molecular mass as given:

Molecular mass: 80.5

For those who have forgotten high-school chemistry all you need to know is:

  • Elements are defined by an unambiguous symbol. Thus "H" means hydrogen, "C" means carbon, "O" means oxygen. You can look up all the information in Wikipedia.
  • The count of each element is one, unless subscripted. Elements can be repeated. Thus CH3OH is read as one carbon, three hydrogens, one oxygen and another hydrogen. Adding them up gives one carbon, four hydrogens and one oxygen.
  • to get the molecular mass you look up the atomic masses of each element in the Wikipedia entry (or on the Blue Obelisk site) and multiply by the count. The example above (methanol) goes: 1 carbon @ 12 = 12; 4 hydrogens @ 1 = 4; 1 oxygen @ 16 = 16. Add them together and the answer is 32 (you can check in Wikipedia). Note that you should round the atomic masses to the nearest 0.5 (my teaser is not a problem of decimal points). A small script after this list shows the arithmetic.

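Here is a small script that applies the rules above - a minimal sketch that handles only flat formulae (an element symbol is a capital letter plus an optional lower-case letter, followed by an optional count). Try it on the puzzle formula yourself:

    import re

    # atomic masses rounded to the nearest 0.5, as suggested above
    ATOMIC_MASS = {"H": 1, "C": 12, "O": 16, "Cl": 35.5, "I": 127}

    def molecular_mass(formula):
        """Sum the atomic masses for a simple formula such as CH3OH."""
        total = 0.0
        for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
            total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
        return total

    print(molecular_mass("CH3OH"))  # methanol: 12 + 3*1 + 16 + 1 = 32.0
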
If you do this for the puzzle compound you should discover the problem.

And you'll see why it bears on PDF, OA, and all the rest.

If we had semantic chemical tools where the information was checked as it was entered this COULDN'T happen. Now for that we need something like a chemical plugin for Word.

Is there a good fairy out there?

Weak and Strong OA

Peter Suber (and Stevan Harnad) have just published a very important announcement about the definition of various types of OA. I've known about it for some days and have been waiting till it's public. I'll copy it in full and then comment.

The term "open access" is now widely used in at least two senses.  For some, "OA" literature is digital, online, and free of charge.  It removes price barriers but not permission barriers.  For others, "OA" literature is digital, online, free of charge, and free of unnecessary copyright and licensing restrictions.  It removes both price barriers and permission barriers.  It allows reuse rights which exceed fair use.

There are two good reasons why our central term became ambiguous.  Most of our success stories deliver OA in the first sense, while the major public statements from Budapest, Bethesda, and Berlin (together, the BBB definition of OA) describe OA in the second sense.

As you know, Stevan Harnad and I have differed about which sense of the term to prefer --he favoring the first and I the second.  What you may not know is that he and I agree on nearly all questions of substance and strategy, and that these differences were mostly about the label.  While it may seem that we were at an impasse about the label, we have in fact agreed on a solution which may please everyone.  At least it pleases us.

We have agreed to use the term "weak OA" for the removal of price barriers alone and "strong OA" for the removal of both price and permission barriers.  To me, the new terms are a distinct improvement upon the previous state of ambiguity because they label one of those species weak and the other strong.  To Stevan, the new terms are an improvement because they make clear that weak OA is still a kind of OA.

On this new terminology, the BBB definition describes one kind of strong OA.  A typical funder or university mandate provides weak OA.  Many OA journals provide strong OA, but many others provide weak OA.

Stevan and I agree that weak OA is a necessary but not sufficient condition of strong OA.  We agree that weak OA is often attainable in circumstances when strong OA is not attainable.  We agree that weak OA should not be delayed until we can achieve strong OA. We agree that strong OA is a desirable goal above and beyond weak OA.  We agree that the desirability of strong OA is a reason to keep working after attaining weak OA, but not a reason to disparage the difficulties or the significance of weak OA.  We agree that the BBB definition of OA does not need to be revised.

We agree that there is more than one kind of permission barrier to remove, and therefore that there is more than one kind or degree of strong OA.

We agree that the green/gold distinction refers to venues (repositories and journals), not rights.  Green OA can be strong or weak, but is usually weak.  Gold OA can be strong or weak, but is also usually weak.

I've often wanted short, clear terms for what I'm now going to call weak and strong OA.  But I also wanted a third term.  In my blog and newsletter I often need a term which means "weak or strong OA, we don't know which yet".  For example, a press release may announce a new free online journal, digital library, or database, without making clear what kind of reuse rights it allows.  Or a new journal will launch which makes its articles freely available but says nothing at all about its access policy.  I will simply call them "OA".  I'll specify that they are strong or weak OA only after I learn enough to do so.

Stevan and I agree in regretting the current, confusing ambiguity of the term, and we agree that the weak/strong terminology turns this ambiguity to advantage by attaching labels to the two most common uses in circulation.  I find the new terms an especially promising solution because they dispel confusion without requiring us to buck the tide of usage, which would be futile, or revise the BBB definition, which would be undesirable.

Postscript.  Stevan and I were going to write up separate accounts of this agreement and blog them simultaneously.  But when he saw my draft, he decided to blog it verbatim without writing his own.  That's agreement!

PMR: This is an enormous advance. I shall write several posts on different aspects but here I will simply congratulate P and S on their agreement. From now on it is clear that the OA movement is united, coherent and points in a single direction. But the actual mechanism is also important as we move from the political aspect of OA to include the strictly operational.

Similar movements - e.g. Open Source - have had their prophets and differences of orientation. But there again we see the unity as greater than the differences - differences which by now are accepted rather than divisive.

It's also worth pointing out that OA is changing. Obviously the numbers keep increasing, the awareness increases and closed access is increasingly less defensible in many cases. CC-* is now much more prominent than a few years ago. The new OA terminology helps us understand these changes and work out where to go next.

I believe that the Open Knowledge Definition applies to, and can be used to define, strong OA. There is therefore a series of Strong Opens (Source, Access, Data, Knowledge), all of which require the removal of permission barriers. There are minor differences but those arise from the natures of the endeavours, not from the fundamental knowledge rights. In our own area it's reflected by the Blue Obelisk's Open Data, Open Source and Open Standards.

Lots more later. See some of you in London.

Why PDF is a Hamburger

In a recent comment Chris Rusbridge asks:
April 29th, 2008 at 4:47 pm

I’ve been thinking about a blog post related to your hamburger rants. But the more I try to think it through, the murkier it gets. Is the problem that PDF cannot store the semantic information? I’m not sure, but I’m beginning to suspect maybe not, ie PDF can. Is the problem that the tools that build the PDFs don’t encode the semantic information? Probably. Is the semantic information available in the publisher’s file from which the PDF is built? Possibly to probably, depending on the publisher and their DTD/schema. Is the semantic information available in the author’s file? Probably not to possibly, depending on author tools (I’m not sure what chemists use to write these days; Word would presumably be dire in this respect unless there is a chemistry plug-in; LaTeX can get great results in math and CS, but I’m not sure how semantic, as opposed to display-oriented, the markup is). And even if this were to all happen, does chemistry have the agreed vocabulary, cf the Gene Ontology in bio-sciences, to make the information truly “semantic”? And…

PMR: Thank you Chris. It's a good time to revisit this. There are several aspects.

(From Wikipedia) The PDF combines three technologies:

  • A sub-set of the PostScript page description programming language, for generating the layout and graphics.
  • A font-embedding/replacement system to allow fonts to travel with the documents.
  • A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
  • and...

One of the major problems with PDF accessibility is that PDF documents have three distinct views, which, depending on the document's creation, can be inconsistent with each other. The three views are (i) the physical view, (ii) the tags view, and (iii) the content view. The physical view is displayed and printed (what most people consider a PDF document). The tags view is what screen readers read (useful for people with poor eyesight). The content view is displayed when the document is re-flowed to Acrobat (useful for people with mobility disability). For a PDF document to be accessible, the three views must be consistent with each other.

... so why is this a problem?

First let me dispose of the "PDF is only bad if it's authored with tools from a Moscow sweat-shop. Good PDF is fit for any purpose" argument. PDF is concerned with positioning objects on the page for sighted humans to read. Yes, there are the two other views but they are often inconsistent or impenetrable. Because most of us are sighted the problem does not grate, but for those others it can be very difficult. Let's assume I have the text string "PDF". In a normal ASCII document (including HTML and Word) the "P" comes first, then the "D", then the "F". In PDF it's allowable (and we have found it!) to have the following instructions in the following order:

  1. position the "F" at coordinate (100,200)
  2. position the "D" at coordinate (90.3, 200)
  3. position the "P" at coordinate (81.2, 200)
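
In raw PDF content-stream operators that might look something like the fragment below - a hand-written sketch, not the output of any real tool (BT/ET bracket a text object, Tf selects a font, Td sets the position, Tj shows the string):

    BT /F1 12 Tf 100 200 Td (F) Tj ET
    BT /F1 12 Tf 90.3 200 Td (D) Tj ET
    BT /F1 12 Tf 81.2 200 Td (P) Tj ET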

When drawn on screen the F would be placed first, then the D, then the P. The final result would read naturally, but a speech synthesizer would read out "F", "D", "P" in stream order. I believe that the US government was sufficiently concerned about accessibility that it required Adobe to alter the PDF software so that the characters would be read aloud in the right order. This is the Eric Morecambe syndrome (his response to "Andre Preview" telling him he was playing all the wrong notes):

I am playing all the right notes, but not necessarily in the right order.
Eric Morecambe

This spills over into all common syntactic constructs. Run-of-the-mill PDF has no concept of a paragraph, a line end or other common constructs. This gets worse with technical documents - you cannot tell where the diagrams or tables are or even if they are diagrams and tables. HTML got it 90% right - it has concepts such as "img", "table", "p". PDF generally does not.

To reiterate: PDF is a cheap and reliable way of transporting a printed page from one site to another and an inexpensive way of storing pages without paper. Beyond that it gets much less valuable very rapidly.

There's a general problem with semantic information. If I write "the casus belli is very important" the emphasis (italics) tells me that the words carry semantic information. It doesn't tell me what this information is. You have to guess. Often we cannot guess, or we guess wrong. This type of semantics is also very fragile - if the phrase is cut-and-pasted you'll probably lose the italics in most systems. If, however, you use HTML and write class="latin" and class="authorEmphasis", the semantics are made explicit and are preserved. So HTML can, with care, carry semantics. PDF generally cannot.
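
For example (a minimal sketch - the class names are mine, invented for illustration):

    <p>The <span class="latin authorEmphasis">casus belli</span> is very important.</p>

A stylesheet can render class="latin" in italics, while the class name itself survives cut-and-paste and can be processed by machine.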

To answer your other points rapidly (I will come back to them in more detail): I used to think Word was dire. Word2007 has changed that. It can be used as an XML authoring tool. Not always easily, but it preserves basic semantics. And as for a chemical plugin to Word...

...I've run out of time :-)

Open Knowledge in London

The Open Knowledge Foundation is holding the First Open Knowledge London Meetup on Wednesday 30th April:

The first Open Knowledge London meetup will take place this Wednesday at the London Knowledge Lab. The meetup should be great opportunity for informal discussion of open knowledge projects and issues. If you’d like to participate or present, please add details to the wiki page!

  • When: Wednesday 30th April, 19:00-21:00
  • Where: London Knowledge Lab, 23-29 Emerald Street, WC1N 3QS.
  • Wiki: http://okfn.org/wiki/LocalGroups/LondonGroup

PMR: I intend to be there (haven't yet checked my diary). One of the values of blogging is that JudithMR knows what part of the world I am or will be in. I have to make a sacrifice - as a lifelong supporter of Liverpool FC I shall not be able to watch the match.

'Some people believe football is a matter of life and death. I'm very disappointed with that attitude. I can assure you it is much, much more important than that.'

However Open Knowledge is also a matter of life and death and takes precedence. It is part of what we need to save the planet.

Why we need semantic chemical authoring - 2

We're in the process of aggregating a repository of common chemicals (somewhere in the range 1000-10000 entries) and we are taking data from various publicly available web sites. Typical sources are Wikipedia, any aggregator with Open Data policies, and MSDS sheets (chemical safety information). One such site is INCHEM (Chemical Safety Information from Intergovernmental Organizations), which lists about 1500 materials (most are chemical compounds though some are mixtures).

The information on the web is HTML pages and we wrote a scraper to extract the information from each. I'd planned to show a screenshot but WordPress has stopped me uploading any images, so you'll have to visit the link. In any case you wouldn't be able to see the point from a screenshot. Scraping is not fun - the HTML is as bad as almost any other HTML. It needed a 2-pass process - first into HTMLTidy and then analysis by XML tools. From this we extract the most important information and turn it into CML - names, formulae, connection tables, properties, etc.
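
The pipeline looks roughly like this - a minimal sketch assuming the pytidylib, requests and lxml packages; the URL and the XPath are placeholders, not INCHEM's real address or markup:

    import requests
    from lxml import etree
    from tidylib import tidy_document  # pass 1: HTMLTidy

    # placeholder URL - substitute a real card from the site being scraped
    html = requests.get("http://example.org/icsc/card0237.htm").text
    xhtml, _errors = tidy_document(html, {"output_xhtml": 1, "numeric_entities": 1})

    # pass 2: the page is now well-formed XML, so standard XML tools apply
    root = etree.fromstring(xhtml.encode("utf-8"))
    ns = {"h": "http://www.w3.org/1999/xhtml"}
    cells = [t.strip() for t in root.xpath("//h:td//text()", namespaces=ns) if t.strip()]
    # ... map the cells to names, formulae, properties and emit CML ...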

We wanted to see if the aggregation and consistency checking could be done by machine, using RDF. This is surprisingly hard as none of the sites contains all the information we need and many have large sparse patches. There is also the subtle problem of identifying the platonic nature of each chemical - what should we actually use as the entry for, say, alumin(i)um chloride? Or should there be more than one?

We've got the data in. There are a large number of simple but niggly lexical problems, such as the degree symbol for temperature (totally inconsistent within and between documents). And then the semantics - how do you record a boiling point as "between 120 and 130 at 20 mm Hg"? (CML can do this, but it takes work to do the conversion.)
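
Even that single phrase takes real code to capture. A minimal sketch for just this one pattern (I assume the temperatures are Celsius; real MSDS text is far less regular than one regex suggests):

    import re

    PATTERN = re.compile(
        r"between\s+(?P<lo>[\d.]+)\s+and\s+(?P<hi>[\d.]+)\s+at\s+(?P<p>[\d.]+)\s*mm\s*Hg")

    def parse_bp(text):
        """Extract a boiling-point range and its pressure condition."""
        m = PATTERN.search(text)
        if m is None:
            return None
        return {"bp_min_C": float(m.group("lo")),
                "bp_max_C": float(m.group("hi")),
                "pressure_mmHg": float(m.group("p"))}

    print(parse_bp("between 120 and 130 at 20 mm Hg"))
    # {'bp_min_C': 120.0, 'bp_max_C': 130.0, 'pressure_mmHg': 20.0}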

And the sites have errors. Here's a rather subtle one which the average human would miss (we needed a machine to find it). You'll have to go to the page for chloromethylmethylether - I daren't try to transcribe it into WordPress. The error is in the displayed page (no need to scroll down).

If we had semantic authoring tools this wouldn't happen. I'll be blogging soon (I hope) about our activity in this area.

UPDATE: My best go at scraping the bit of the page with the error. It's now semi-semantic (HTML) so you should be able to track the error down. You only have to know a little bit of chemistry...

Dimethylchloro ether
Chloromethoxymethane
CAS #: 107-30-2
Formula: CH3OCH2CI
RTECS #: KN6650000
Molecular mass: 80.5

TANSTAAFL: Openness is not a Free Lunch

In a reply to a recent post Rich Apodaca made the point that Open Access (Open Data) will require business models:

Rich Apodaca Says:
April 28th, 2008 at 1:38 am

By identifying and executing the right business model, the idea of control will become much less important. For example, you’ll find few complaints about Google essentially controlling the online search market; the vast majority of users are delighted to be able to search with the service whenever they want - and to have Google index their site.

This only happened because Google found the right business model and executed on it.

Maybe open access pricing and business models bring out nonproductive arguments because those putting them forward (and responding) are stuck in old patterns of thinking, or too heavily dependent on the current system. Scholars and publishers likely both share responsibility here.

My guess is that the open access scientific publication system that ends up working will start out by horrifying most of today’s scholars and being ridiculed or ignored by today’s publishers. But there will be a few niche groups for whom the truly disruptive open access innovation in scientific publishing will be a godsend.

Developing a workable open access business model starts by identifying who these groups are and how solving their problem can solve other important problems. It continues with finding a price and medium of exchange (perhaps not even money) that the market will find tolerable for awhile.

How can this issue be anything other than central to making open access work?

PMR: I agree generally with this - it's often characterised by TANSTAAFL ("There Ain't No Such Thing As A Free Lunch"). I think most of the major innovators (certainly the funders) realise this - that's why they are prepared to develop funder-pays approaches for Open Access.

Data is/are a particular problem. Data are more expensive than manuscripts. It's virtually cost-free to download and read and copy and transmit a standard PDF or HTML, or any other document whose sole endpoint is to be read by humans. The creation of a reading human is, of course, not cost-free - the investment in the average human is large - but it's not generally borne by higher education or scientific research (YMMV). But data are complex, and we are only at the start of learning what we can do with them. Open Data is not an end, but without it there is no beginning.

Data are normally produced for a particular purpose, and reusing them for another costs money. I'll exemplify this by taking CrystalEye data - about 120,000 crystal structures and 1 million molecular fragments - which were aggregated, transformed and validated by Nick Day as part of his thesis. (BTW Nick is writing up - it's a tribute to his work that CrystalEye runs without attention for months on end.) The primary purpose of CrystalEye was to allow Nick to test the validity of QM calculations in high-throughput mode. It turned out that the collection might be useful so we have posted it as Open Data. To add to its value we have made it browsable by journal and article, searchable by cell dimensions, searchable by chemical substructure and searchable by bond-length. This is a fair range of what the casual visitor might wish to have available. Andrew Walkingshaw has transformed it into RDF and built a SPARQL endpoint with the help of Talis. It has a Jmol applet and 2D diagrams, and links back to the papers. So there is a lot of functionality associated with it.
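
To give a flavour of what the SPARQL endpoint makes possible, here is a hypothetical query - the vocabulary (the example.org prefix and the property name) is invented for illustration and is not the actual CrystalEye RDF schema:

    PREFIX eg: <http://example.org/crystaleye#>
    SELECT ?structure ?a
    WHERE {
      ?structure eg:cellLengthA ?a .
      FILTER (?a > 10.0)
    }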

This has come under some criticism to the effect that we haven't really made it Openly available. For example Antony Williams (Chemspider blog) writes (Acting as a Community Member to Help Open Access Authors and Publishers):

"This [interaction with MDPI] is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth."

PMR: I assume this relates to CrystalEye - I don't know of any other case. Antony and I have had several discussions about CrystalEye - basically he would like to import it into his database (which is completely acceptable) but it's not in the format he wants (multi-entry files in MDL's SDF format, whereas CrystalEye is in CML and RDF).

This type of problem arises everywhere in the data world. For example the problem of converting between map coordinates (especially in 3D) can be enormous. As Rich says, it costs money. There is generally no escape from the cost, but certain approaches, such as using standards like XML and RDF, can dramatically lower it. Nevertheless there is a cost. Jim Downing made this investment by creating an Atom feed mechanism so that CrystalEye could be systematically downloaded, but I don't think Chemspider has used this.
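
Walking such a feed is straightforward in principle - a sketch assuming the feedparser and requests packages; the feed URL and the rel="next" paging convention are placeholders rather than the actual CrystalEye arrangement:

    import feedparser
    import requests

    feed_url = "http://example.org/crystaleye/feed/atom"  # placeholder URL
    while feed_url:
        feed = feedparser.parse(feed_url)
        for entry in feed.entries:
            data = requests.get(entry.link).content  # fetch the entry's document
            # ... convert or store as needed ...
        # follow the rel="next" link to the next page of the archive, if any
        nexts = [l.href for l in feed.feed.get("links", []) if l.get("rel") == "next"]
        feed_url = nexts[0] if nexts else None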

The real point is that Chemspider wishes to use the data for a purpose different from the one for which they were created. That's fine. But as Rich says it costs money. It's unrealistic to expect us to carry out the conversion for a commercial company for free. We'd be happy to consider a mutually acceptable business proposition, and it could probably be done by hiring a summer student.

I continue to stress that CrystalEye is completely Open. If you want it enough and can make the investment then all the mechanisms are available. There's a downloader and converters and they are all Open (though it may cost money to integrate them).

FWIW we are continuing to explore the ways in which CrystalEye is made available. We're being funded by Microsoft as part of the OREChem project and the results could represent some of the ways in which Web technology is influencing scientific disciplines. We'd recommend that those interested in mashups and re-use in chemistry take a close look at RDF/SPARQL/CML/ORE as those are going to be standard in other fields.

TANSTAAFL...

Molbank adopts CC-BY licence

I am delighted to report that Molbank is moving towards a CC-BY licence for its content... In a comment on Antony Williams' blog (Acting as a Community Member to Help Open Access Authors and Publishers):

1. Dietrich Rordorf says:
   April 27th, 2008 at 11:41 am

   We are aware that our current MDPI copyright statement is not in line with the BBB definitions on open access. We are currently smoothly moving to a CC By Attribution License v3.0. Marine Drugs (http://www.mdpi.org/marinedrugs/) has already been published under that license since January 2008. IJMS (http://www.mdpi.org/ijms/) and other MDPI journals will start publishing under this license in the May respectively June 2008 issues. All previous content published by MDPI will be released under the CC By license within a couple of months on our new publication platform (now under testing). So this discussion about MDPI and open access will soon be part of history.

PMR: This is great news. (I had missed the news for Marine Drugs.) Much kudos to Molbank. (IIRC it was actually founded with help from a grant from the Soros foundation.)

This means that we can have exciting new derivative works from the content - as Chemspider is showing. Since OSCAR is able to extract some of the content (e.g. peakLists) it will be useful to see what precision/recall we get.

The Control Fallacy: Freedom isn't about prices, but about rights

From John Wilbanks, The Control Fallacy: Why OA Out-Innovates the Alternative, Nature Precedings, a preprint deposited April 25, 2008.

Abstract: This article examines the relationship between Open Access to the scholarly literature and innovation. It traces the ideas of “end to end” network principles in the Internet and the World Wide Web and applies them to the scholarly biomedical literature. And the article argues for the importance of relieving not just price barriers but permission barriers.

PMR: John (Science Commons) covers most of the main ground arguing the benefits of Open Access and Open Data. Science Commons goes beyond Open Access to cover data and other forms of Knowledge:

Lost in too much of the debate over Open Access (OA) is the relationship between access, control, and innovation.

Too often, the OA discussion is one of radical polarization. Much of this comes, in my opinion, from the focus of the debate on economics and business models. While the money side of this is clearly vital - peer review needs to be paid for, after all - it's also the issue that often leads to the least constructive debate.

PMR: Yes. Control has often been neglected in comparison to access. I can remember a meeting ca 5 years ago at Cambridge on Open Access - almost all the arguments were about journal prices. I think I surprised a number of attendees by arguing that it was about control of our information.

John also promotes an analogy with renting...

Now, those publishers are happy to rent access to the knowledge heritage. Rent is the key word here, though. When the scientific publishing industry went online, they stopped selling journals to people and started renting them. If you've ever rented an apartment, you know that rentals come with a lot fewer rights than ownership. In this context, the users lost a slew of rights - remember, you can legally resell a physical copy of a book or CD, but you can't legally forward a PDF from the newest issue of Science. You don't have the right to share things like journal articles when you rent them.

PMR: The key point. It is surprising how few authors realise what they give away.

Many of the controls that a publisher can impose are built on top of that copyright. So even if you have rented access to the full text of articles, the license agreements you've signed with the owners frequently make it illegal to use software to index and mine the literature. Elsevier's copyright rental agreements are a good example - make sure to read through to page 5.

This control culture is not the result of bad people making evil decisions. It's simply an antique system. It made sense when it started, and it actually made sense until the Internet came along and changed everything. But the control culture is a powerful drag on innovation when you're in a networked reality.

PMR: We need to make this simple point repeatedly. Control leads to loss of innovation.

Complexity challenges coherence. Complexity overwhelms consistency. Quality control can only scale as the people scale, and in closed systems, all of those people must somehow be paid by the same paymaster. Closed systems and cultures of control simply don't work as well as open systems in complex, rapidly shifting environments.

[...]

That's why access is so vital. That's why it's vital to support the publishers that go OA, and the traditional publishers who are taking bold steps to foster innovation and knowledge creation. That's why it's important to focus on access and rights and not price - because giving knowledge away for free but without the rights to make it useful doesn't make the grade. Freedom here isn't about prices, but about rights. (PMR emphasis)