petermr's blog

CRIG winners

Posted on April 16, 2008 by pm286

I’m delighted to congratulate the winners of the CRIG (Common Repository Interfaces Group) competition at OR08. This was an innovative piece of funding – instead of giving a small grant to a group to do a small piece of work, JISC announced a prize for the best on-the-spot development in this subject, to be presented at OR08. Teams of developers would spend 1-2 evenings at OR08 creating prototypes instead of spending the time in the bar. (Or combining these activities.) It is surprising and exciting how much can be done in a day or so. Modern tools help, and of course the Open architecture means that people can borrow ideas and technology from elsewhere.
There were about 20 teams and Jim and I entered. All teams got T-shirts. Unfortunately I was grounded in Amsterdam for a day and missed the spot – so we withdrew. Here are the winners…
ECS developers win $5000 repository challenge

The challenge winners

Developers from ECS, Southampton, and Oxford University won a $5000 challenge competition which took place at the OR08 Open Repositories international conference.
Dave Tarrant, Tim Brody (Southampton) and Ben O’Steen (Oxford), beat a large field of contenders, including finalists from the USA and Australia, by demonstrating that digital data can be moved easily between storage sites running different software while remaining accessible to users (watch video). This approach has important implications for data management and preservation on the Web.
Repository sites have become a global phenomenon in higher education and research as a growing number of institutions collect digital information and make it accessible on the Web. There are now over 1000 repositories worldwide.
However, with the growth of institutional repositories alongside subject-based repositories, and in cases where multiple authors of a paper belong to different institutions, it is important to be able to share and copy content between repositories.
Meanwhile the repository space has become characterised by many types of repository software – DSpace, EPrints and Fedora are the most widely used open source repository software – containing many different types of content, including texts, multimedia and interactive teaching materials. So although sharing content and making it widely available (interoperability) has always been a driver for repository development, actually moving content on a large scale between repositories and providing access from all sources is not easy.
The OR08 challenge, set by the Common Repository Interfaces Group (CRIG), had just one rule for the competition: the prototype created had to utilise two different ‘repository’ platforms.
The winning demonstrator showed data being copied simply from an EPrints repository to a Fedora repository, and then moved back in the other direction. What was striking is that EPrints and Fedora are seen as being quite different in the way they handle data, so the approach used is likely to be just as useful with other repository software.
This data transfer was achieved using an emerging framework known as Object Reuse and Exchange (ORE), a topic that attracted one of the highest attendances at OR08. ORE is yet to appear in beta form, but specifications are being developed that allow distributed repositories to exchange information about their digital contents.
According to Dave Tarrant, ‘Interoperability is the innovation. We think it is a bad idea to reinvent the wheel so with the availability and support for ORE growing, this provides a very suitable technology to provide interoperability between repositories.’
The winning team are past and present members of the JISC Preserv 2 project that is investigating the provision of preservation services for institutional repositories, and will take this work forward in the project.

PMR: It looks like magic. I will have to find out details from Jim. If it really is magic then we can expect to see a quantum leap in the power of distributed semantic information.
(I don’t think we would have won even if we’d been present. In fact we certainly wouldn’t. But I’ve kept the T-shirt and there is OR09, when we expect to be unstoppable.)
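The press release gives no technical detail, so this is only a rough illustration of the ORE idea, not the winning team's code. In ORE, a source repository publishes a Resource Map that describes an Aggregation and lists the resources it aggregates; a target repository running different software then only has to read that list to know what to copy. The sketch writes such a map as N-Triples using the ORE term names (ResourceMap, Aggregation, describes, aggregates); every repository URI is hypothetical, and the simple parser assumes URIs contain no spaces.

```python
# Sketch of an OAI-ORE-style Resource Map serialised as N-Triples.
# Assumption: all repository URIs used with this are hypothetical examples;
# only the ORE term names come from the ORE vocabulary.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def resource_map(rem_uri: str, aggregation_uri: str, parts: list[str]) -> str:
    """Source side: describe an aggregation and list the resources it holds."""
    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    lines += [triple(aggregation_uri, ORE + "aggregates", p) for p in parts]
    return "\n".join(lines)


def aggregated_parts(ntriples: str) -> list[str]:
    """Target side: recover the URIs to copy, whatever software wrote the map."""
    wanted = f"<{ORE}aggregates>"
    parts = []
    for line in ntriples.splitlines():
        # Each line is "<s> <p> <o> ."; strip the terminator, then split in three.
        subject, predicate, obj = line.rstrip(" .").split(" ", 2)
        if predicate == wanted:
            parts.append(obj[1:-1])  # drop the angle brackets
    return parts


rem = resource_map(
    "http://eprints.example.org/42/rem",
    "http://eprints.example.org/42",
    ["http://eprints.example.org/42/paper.pdf", "http://eprints.example.org/42/data.csv"],
)
print(aggregated_parts(rem))
```

In this sketch the receiving side depends only on the published map, never on the sender's internal storage, which is the sense in which an EPrints-to-Fedora copy and its reverse can share one mechanism.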

Posted in Uncategorized | 2 Comments

Another call for Open Data

Posted on April 16, 2008 by pm286

It is encouraging to see the increasing use of “Open Data” and the suggestion that we should use CKAN (OKF):

Fiona Bradley, A database of data, Semantic Library, April 16, 2008. (via Peter Suber)

One of the sites recommended by Read/Write Web is CKAN, which is backed by the Open Knowledge Foundation that counts someone who has worked in the library sector amongst their leadership. Are these the types of groups more of us should be involved in to have a role in information access on a larger scale?

Last week there was a flurry of comments around a post by Bret Taylor, We need a Wikipedia for data. Taylor describes a model for a wiki that would aggregate common data in one database that could be cross-searched. Great idea.
One interesting thing about the types of datasets he mentions is that they are all copyrighted – stations own TV schedules, exchanges own market data (the free stuff is usually 20 minutes delayed) and a variety of companies own publishing rights over telephone numbers. This is the data that could be really useful if it was truly free, but given the amount of updating required, I wonder who would do so without a business or legislative imperative.
But that issue is perhaps beside the point. There are many, many incredible datasets out there, everything from Census data to older market information to astronomy. Reading the comments and suggestions on Taylor’s post and Read/Write Web’s post about the topic revealed dozens of sites to find these resources.
I did feel that, looking through the list, libraries may have missed an opportunity. We have been recommending and linking to various datasets on our websites for years, but there is a huge potential to go beyond this and build something collaboratively and use it as an input for different libraries. Many libraries now take in Open Access Journal records to their catalogues and search engines via DOAJ but there is no reason to not do something similar for Open Data.
Certainly, it is an issue that few of these datasets can talk to each other – but perhaps the move towards a more standards-based Semantic Web will encourage standardisation and interoperability, at least within, for example, individual government departments so that Census records can be analysed against education records.

PMR: Open Data is very much about the power of networks. The value of a piece of information is proportional to the number of other sources it can be linked to. Fiona is right that raw data may not be easily linkable, but that problem is much smaller if we convert it into RDF. RDF removes the syntactic problem immediately (we don’t need to worry whether it’s comma-separated values, etc.). And many tools expect the vocabulary to be fluid – for example, Wikipedia uses at least “birthdate” and “dateofbirth” in its infoboxes. Even simple lexical tools can help bring these together.

Of course if you have data where there is a known, Open, format (such as FOAF, protein sequences, etc.) use it. But it’s better to carry out very lightweight markup with RDF than not to deposit the data at all. And don’t underestimate the cleverness of the search engines.
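A minimal sketch of that lightweight route, assuming plain CSV input: each row becomes a handful of N-Triples statements, and a small synonym table (a hypothetical stand-in for the “simple lexical tools” mentioned above) maps “birthdate” and “dateofbirth” onto one property. The `http://example.org/` namespace and the property names are placeholders; real data should reuse an existing vocabulary such as FOAF where one fits.

```python
import csv
import io

# Hypothetical namespace for illustration only.
BASE = "http://example.org/"

# A very small "lexical tool": variant column names mapped to one property.
SYNONYMS = {
    "birthdate": "dateOfBirth",
    "dateofbirth": "dateOfBirth",
    "date_of_birth": "dateOfBirth",
}


def normalise(field: str) -> str:
    """Lower-case, drop spaces, then apply the synonym table if it matches."""
    key = field.strip().lower().replace(" ", "")
    return SYNONYMS.get(key, key)


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, id_field: str = "id") -> list[str]:
    """Turn each CSV row into N-Triples statements about one subject URI."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}item/{row[id_field]}>"
        for field, value in row.items():
            # Skip surplus unnamed cells, the key column and empty cells.
            if field is None or field == id_field or not value:
                continue
            predicate = f"<{BASE}prop/{normalise(field)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Two sources that spell the same column differently end up with one property.
for source in ("id,name,birthdate\n1,Ada Lovelace,1815-12-10\n",
               "id,name,dateofbirth\n2,Alan Turing,1912-06-23\n"):
    for line in csv_to_ntriples(source):
        print(line)
```

The point of the design is that the syntactic step (CSV to triples) and the vocabulary step (the synonym table) are separate, so the table can grow as new spellings turn up without touching the converter.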


Open Access Week – kudos to the Wellcome Trust

Posted on April 12, 2008 by pm286

The Wellcome Trust has led the revolution towards making research articles both free (priceFree) and Open/libre (permissionFree). Here is Robert Kiley on Stevan Harnad’s blog (link):

At the Wellcome Trust we also believe that “fair use is not enough” if the benefits of text and data-mining – with its promise of discovering new knowledge – are to be fully realised. Consequently, as a condition of paying an open access fee, the Trust requires publishers to licence these articles such that they may be freely copied, distributed, displayed, performed and modified into derivative works by any user. Publishers may impose conditions on users in relation to attribution (i.e. users must attribute the work in the manner specified by the author or licensor) and commercial use (i.e. specify that the work must not be used for commercial purposes). All publishers which offer a “Wellcome compliant” OA option – which include Elsevier, Wiley-Blackwell, Springer, OUP etc. – now include this licence information in the XML they deposit in PMC. Some publishers (e.g. Springer, OUP) use the CC-BY-NC, and others (e.g. Elsevier, T&F, Society for Endocrinology) have defined their own licences, but again they explicitly allow text-mining and the creation of derivative works. These articles are also made available through PMC’s OAI interface, and as such can be downloaded and exposed to text and data-mining services. Conscious that this licence only extends to “gold” OA articles, the Trust is continuing to work with publishers to explore the possibility of developing a similar licence for author manuscripts.
Regards Robert Kiley Head of e-Strategy Wellcome Library 183, Euston Road, London. NW1 2BE Tel: 020 7611 8338; Fax: 020 7611 8703; mailto:r.kiley@wellcome.ac.uk Library Web site: http://library.wellcome.ac.uk
PMR: Again this is admirably clear. I would urge all mandaters to specifically use "text- and data-mining" in their language.

PMR: It’s a good opportunity to congratulate the Wellcome Trust on having made the position clear – no fudges. This is extremely useful when other funders are deciding what to do. Simple answer: look at what Wellcome have done.
[Addendum. I am now clearer about why I was (and still am) confused about the Gold and Green labels. In critiquing RobertK (On Paying Publishers Extra For Extra Usage Rights), Stevan Harnad writes:

The Wellcome Trust, the world’s first research funder to mandate OA, has not only mandated Green OA self-archiving, but has also made funds available to authors to pay their publishers to make their articles Gold OA, in order to make them not just price-barrier-free (Green OA) but also permissions-barrier-free.

PMR: I read this as meaning that Stevan makes a close association between “Green OA” and “price-barrier-free” and similarly between “permissions-barrier-free” and “Gold OA”. Peter Suber and Chris Rusbridge (on this blog – A better interpretation of “green” and “gold”) take the view that Green and Gold do not relate to barriers but to the mechanism of deposit of articles. There is clearly fuzziness about the terminology, whichever strategies it is politically better to aim for, and we should remove it.
I believe that the terms self-archiving, price-free and permission-free are clear. I believe “Green” and “Gold” are confusing (not only in their usage but because there is a different but related scheme for classifying OA journals). There is no logical connection between Green OA and the precise nature of barriers, nor is there a logical connection between self-archiving and barriers.
Creative Commons did us the service of providing orthogonal axes for licences (e.g. NC, SA, ND). I think we need these here. “self-archived, priceFree, permissionForbidden” is clear. “Green OA” is not.]
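The three orthogonal axes can be sketched as a tiny data structure. This illustrates the labelling idea only and is not an existing standard: the axis values follow the labels used in this post, while `journal-published` and `priceBarrier` are hypothetical labels added to complete the axes.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SELF_ARCHIVED = "self-archived"
    JOURNAL = "journal-published"  # hypothetical label for the journal route


class Price(Enum):
    FREE = "priceFree"
    BARRIER = "priceBarrier"  # hypothetical label


class Permission(Enum):
    FREE = "permissionFree"
    FORBIDDEN = "permissionForbidden"


@dataclass(frozen=True)
class AccessStatus:
    """Three independent axes, in the spirit of the CC NC/SA/ND flags."""
    route: Route
    price: Price
    permission: Permission

    def label(self) -> str:
        return ", ".join(a.value for a in (self.route, self.price, self.permission))

    def reusable(self) -> bool:
        # Re-use needs free access and free permission; the route is irrelevant.
        return self.price is Price.FREE and self.permission is Permission.FREE


status = AccessStatus(Route.SELF_ARCHIVED, Price.FREE, Permission.FORBIDDEN)
print(status.label())  # self-archived, priceFree, permissionForbidden
```

The route never enters `reusable()`: whether an article arrived by self-archiving or through a journal does not change what a reader may do with it.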


Open Access Week – thank you MRC

Posted on April 12, 2008 by pm286

Peter Suber alerts us to the MRC’s new mandate on the publication of their funded research Revision to OA mandate at MRC (read it).
The key paragraph is simple:

[MRC] If an open access fee has been paid MRC requires authors and publishers to licence research papers such that they may be freely copied and re-used for purposes such as text and data mining, provided that such uses are fully attributed. This is also encouraged where no fee had been paid.

PS Comment. I praised the agreement at the time and I stand by my assessment: “When a funder pays a publisher to make an article OA, the publisher should remove permission barriers as well as price barriers. But too often publishers have only removed price barriers. This agreement to remove a key set of permission barriers is an important step forward that will help users get their work done (both human and machine users), help funders get full value for their investment, and help all players live up to the full BBB definition of OA.” Kudos to the MRC for finally reflecting the terms of the agreement on its own web site.

PMR: This is so helpful. First, the language can be understood by ordinary scientists. You don’t need to know about Green and Gold. There are no fudges. You can use the material for whatever purpose you like. The MRC took one sentence to state it: “text and data mining”. Explicit.
Besides immediately releasing (modulo some embargo) their own research for text- and data-mining they also set the minimum bar for others.

Howard Hughes – do you require the removal of permission barriers, so allowing text- and data-mining? Last time I looked this was a fuzzy hybrid mess, but I think the answer was “well um err probably not”.
PMC – do you require the removal of permission barriers? Answer clearly no at present.
UKPMC – almost certainly same as PMC
Wellcome. I’m not sure. I thought they had removed permission barriers but now I’m worried.
CancerResearch UK?

So it’s simple, absolute and attainable. Klaus Graf put it in 31 ANSI characters:

MAKE ALL RESEARCH RESULTS CC-BY


UKSG – final thoughts – the future is in your hands

Posted on April 11, 2008 by pm286

I didn’t have time to blog during UKSG so here are a few random bits.
I had the general feeling of a community knowing that inevitable unspecified change was going to happen but with no sense that they could do anything about it. It will happen, and the more that you act now, the more you will be in control. I feel, for example, that the societies and universities (possibly through their presses) had the chance to change things in the mid-90’s, but there were no experiments so we’ll never know.
I spoke about Open Scientific Data and several people thanked me later – they hadn’t realised the importance of the problem (there were few scientists there). Several said it had helped them realise that they should now fully commit to Open Access – I did not draw a strong distinction between access and data. I got one formal question – sorry, I can’t remember the name – but I think it was from an officer of the IOP: “If we move to Open Access, won’t societies lose their lifeblood?” My reply was to highlight the new business models introduced by the Wellcome Trust and SCOAP3 (the High Energy Physics community publishing consortium). Essentially these are “funder-pays” models and I promoted these as examples of how a community could take control of its publishing and also provide a useful business model for publishers. My questioner seemed genuinely encouraged by the response.
For me the important point about funder-pays is that it is the community in control and not the publishers. There is still competition in the system – the funders and the publishers will group and regroup according to business interests and quality of service. The current “library-pays” model gives all the power to the publishers – they know very well how to divide and conquer individual libraries with “special bundles” of journals – and this has to and will come to an end. Now the funders are in control and they can negotiate a fair price with the publishers. There will be a great deal of squealing, but that’s part of the fun of capitalism.
I believe that this actually gives more power to the societies, if they are brave. Funders need responsible societies, who in many cases are ultimately the judges of what is good science. Certainly they set the canons of acceptable practice in many cases. I think of the International Union of Crystallography, which for many years (decades) has worked on the details of what makes a responsible crystallographic experiment and dataset. I know that it has worked with organisations like JISC on Open Access, and its Acta Crystallographica Section E is now a funder-pays (or institution-pays) Open Access CC-BY journal. Cost per article: 150 USD. Note that this is without the financial clout of Wellcome or CERN. So it can be done. There is no single model – some disciplines require more complex reviewing than others (including screening for crackpotism). Some are data-rich. Some have to offset commercial interests such as the pharma industry.
But the message is clear. You can change your community to Open Access publishing. There are enough examples to give inspiration. Read Peter Suber’s blog. It’s got pointers on how to prepare for Open Access.
And if you don’t prepare to change your life, others will do it for you. Without asking.


Clarification on ACS/NIH policy

Posted on April 10, 2008 by pm286

Rich Apodaca posted the definitive version of the ACS position on the NIH mandate:

ACS and the NIH Public Access Policy: Clarification at Last

An alert Depth-First reader pointed me to the new ACS policy for authors receiving NIH funding. The details are contained in a document outlining two ways authors can choose to comply with the new law requiring recipients of NIH funds to deposit a copy of their peer-reviewed manuscripts into PubMed Central. The choices are:

Publish the article under ACS Author Choice by paying a fee. The ACS will then automatically deposit the article on behalf of the author. [1]

Publish the article using the standard procedure, but with the ACS granting authors the right (and responsibility) to deposit their manuscripts in compliance with the NIH Public Access Policy.

Under Option 2, copyright remains with the ACS – authors are simply granted an exception to enable them to comply with federal law. This means, among other things, that ACS retains the right to prevent third parties (including authors themselves) from creating derivative works of deposited manuscripts, and from redistributing them.

PMR: Thanks.
[1] ACS Author Choice states:

The ACS AuthorChoice option establishes a fee-based mechanism for individual authors or their research funding agencies to sponsor the open availability of their articles on the Web at the time of online publication. Under this policy, the ACS as copyright holder will enable unrestricted Web access to a contributing author’s publication from the Society’s website, in exchange for a fixed payment from the sponsoring author. ACS AuthorChoice also enables such authors to post electronic copies of published articles on their own personal websites and institutional repositories for non-commercial scholarly purposes.

PMR: My understanding is that the AuthorChoice payment (which I believe for non-members is of the order of 2000-3000 USD) gives relatively few additional rights over Option 2:

it allows authors to have a copy of the manuscript on their own website and IR (but not to allow re-use)
it gives the article full visibility on the ACS website (i.e. in context of the TOC) and – as I read it – to be posted in PMC by the ACS

Whether this is worth thousands of dollars authors can now decide.
It would be interesting to see (though probably never revealed) whether exposure on the ACS site or on PMC (which will become the primary site for research bioscientists to visit) gets most traffic.
Remember that Entrez and PubMed have some great indexing tools and these will continue to improve (I hope to post more on my personal involvement here later), whereas ACS has the power of Chemical Abstracts to search its material. CAS has many years of experience but requires an expensive subscription (many, many thousands of dollars per year). PubMed and PubChem are free.
It will be interesting to see the comparative use of these sites over the next few years. A reasonably fair test of Open Access.
Unless of course the publishers forbid the indexing of chemistry in Pubmed Central.


A Wikipedia for data…

Posted on April 10, 2008 by pm286

From Rufus Pollock: Open Data Going Mainstream?

RP: Bret Taylor’s recent post entitled “We Need a Wikipedia for Data” has been garnering a lot of attention around the blogosphere. [PMR – there are zillions of useful comments] While his suggestions are not particularly novel, the post and the attention it has garnered, is, I think, indicative of the growing interests in the issues of (open) data and its importance for the development of related services and products.
While generally in agreement with Bret’s arguments, there are a few differences that are worth raising. First Bret appears to favour some kind of centralized repository that everyone can read from and write to:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use.

As readers of this blog [OKFN] will know, we’re sceptical of this ‘one ring to rule them all’ approach. In this regard, it is also important to distinguish finding material, parsing it, and plugging it together, issues that got rather run together in the surrounding discussion. As I wrote in a comment to Bret’s post:

There seem to be several distinct issues you (and your commenters) are concerned with:
1. Discoverability of datasets. For this you want a registry of some kind and this is exactly what the Comprehensive Knowledge Archive Network (CKAN) is designed to do. …
2. ‘Developing’ data particularly using many contributors and a versioning (wiki-like) model. This seems a general problem and one which I wrote about in this post on the collaborative development of data back in February last year. Since then various projects have launched or developed which attempt to address this issue, even if only partially (e.g. Freebase, Swivel, Numbrary, http://www.openeconomics.net …). This then leads into:
3. Componentizing data so that one can easily plug different datasets together rather than having to aggregate data together in one big place (crudely: ‘One Ring to Rule them All’ vs. ‘Small Pieces, Loosely Joined’). After all it seems unlikely that any one organization, however large, can hold ‘all the data’, and in any case doing so would negate the benefits of having ‘many minds’ working on a problem. It is our hope that CKAN would start to facilitate the kind of packaging that one frequently observes in software but is, as yet, fairly rare for knowledge (data/content/…). More on this can be found in this blog post on componentization plus the slides from our presentation at XTech.

RP: To conclude, I [RP] definitely agree about the importance of having more open data and making it easier to find and use though I’m hoping that it will take a more decentralized and componentized form than simply a ‘wikipedia’ for data. More important though than any details is the fact that this kind of interest from a wider audience indicates that issues of data openness and production are going mainstream — something we as a community should strongly welcome.
PMR: It’s great to see the impetus behind this. I suspect that some fields (and they may surprise us) will generate huge momentum rapidly. That will help others where progress is slower.


A better interpretation of "green" and "gold"

Posted on April 10, 2008 by pm286

In my last post I had the presumption to lecture my readership on what “green” and “gold” access mean. Hubris strikes – I got it wrong. I comment on the comments and then continue with why I think “green” is not enough:

Chris Rusbridge Says:
April 10th, 2008 at 10:45 am
I don’t think [PMR’s definition] is right at all. Wikipedia says:
“In OA self-archiving (also known as the “green” road to OA [6] [7]), authors publish in a subscription journal, but in addition make their articles freely accessible online, usually by depositing them in either an institutional repository[8] (such as the Okayama University Digital Information Repository[9]) or in a central repository[10] (such as PubMed Central)…
“…In OA publishing (also known as the “gold” road to OA [14]) authors publish in open access journals that make their articles freely accessible online immediately upon publication. Examples of OA publishers[15] are BioMed Central and the Public Library of Science.”

PMR: I agree with this and will use it in the future

CR: In both cases (green AND gold) the permissions set the terms of what you can do. OA journals do not necessarily have licences that allow data mining.

PMR: Also agreed. In many cases OA journals have no explicit permissions at all. In these cases, and where I have the time, I engage with the editors to help them clarify the position. Sometimes they realise that they do actually wish to announce permissionFree re-use.

I’m also not certain that a widely distributed set of repositories (the green road) is particularly resistant to data mining. OAI access should tell you which repositories have data of interest, and your robots can go there.

PMR But they will not know whether they are allowed to mine the data. OAI does not mean Open Access. It means Open Archives Initiative and the Open says nothing about permissions. It is extremely rare (in my experience) that material in OAI repositories carries an explicit statement about re-use. It’s possible to extract Green material from an OAI repository, re-use it, and be sued by a publisher.

Perhaps the real problem is that (a) licences offered are not those you need for the task (whether green or gold), and (b) those licences are rarely expressed in machine-readable form, even though Creative Commons have encodings to allow this. If licences were so expressed, then you could let your robots wander at will, and mine what they are allowed to!

PMR: I agree with this sentiment but in practice it is unlikely that there will be universal machine-readable licences in OAI repositories any time soon. So in practice roaming the OAI repositories is no use if I wish to re-use and redistribute the material.
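If repositories did expose licences in their metadata, a harvester could filter on them. A sketch, assuming the licence appears as a Creative Commons URI in the `dc:rights` element of an `oai_dc` record (many repositories put free text there, or nothing at all, which is exactly the problem described above). The sample response is hand-written and trimmed, the identifiers are hypothetical, and the allow-list of licence URIs is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: only CC-BY licence URIs count as permitting re-use here
# (".../licenses/by-nc/..." does not match this prefix).
REUSABLE_LICENCES = ("http://creativecommons.org/licenses/by/",)


def minable_identifiers(list_records_xml: str) -> list[str]:
    """Return OAI identifiers whose dc:rights names an allowed licence URI."""
    root = ET.fromstring(list_records_xml)
    found = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        rights = [(r.text or "").strip() for r in record.iterfind(".//dc:rights", NS)]
        # No machine-readable licence means we cannot assume permission to mine.
        if any(r.startswith(REUSABLE_LICENCES) for r in rights):
            found.append(identifier)
    return found


# Hand-written, trimmed ListRecords response with hypothetical identifiers.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with an explicit licence</dc:title>
          <dc:rights>http://creativecommons.org/licenses/by/3.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with no rights statement</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(minable_identifiers(SAMPLE))  # ['oai:repo.example.org:1']
```

The second record is dropped not because it is forbidden but because nothing says it is allowed, which is the typical case in today's repositories.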

Klaus Graf Says:
April 10th, 2008 at 1:49 pm
I think it was not a good idea of Harnad’s to choose the same colors as in the “road” metaphor. The last comment shows it is indeed confusing.
* Green road: Self-Archiving in Repositories
* Golden road: OA Journals
* Green OA: cost-free Access (PMR in an earlier post: FREE access)
* Gold OA: Access without Permission Barriers (preferably CC-BY) – (PMR: OPEN access)
These are independent aspects. Most golden road journals (in DOAJ) are access-green, and CC-BY contents in green road IRs are access-golden.

PMR: Klaus seems to use the terms Green OA and Gold OA in the way I did and also seems to differentiate between Colour-road (how something got there) from Colour-OA (what you can do with it). This seems to conflict with ChrisR and PeterS.

Peter Suber Says:
April 10th, 2008 at 3:46 pm
Hi Peter: Chris is right. There are two distinctions here and we shouldn’t mix them up. One distinction is between green and gold OA, or between OA through repositories and OA through journals. The other is between removing price barriers alone and removing both price and permission barriers. I think you meant to say that removing price barriers is not enough – and I agree with that 100%. But green OA *can* be enough.
Some green OA removes both price and permission barriers, and some gold OA does as well. But also note the converse. Just as some (perhaps most) green OA doesn’t remove permission barriers, some (perhaps most) gold OA doesn’t either. When we work for the removal of permission barriers, we are working to improve both green and gold OA.

PMR: I accept this definition as coming from the fountain of Open truth. Now for the implications (and see if I have learnt OA-101):

“some green OA removes both price and permission barriers”. This means that authors publish in a subscription journal (i.e. you can only read it if you pay) but the journal allows the author to self-archive the article and release it under a license where anyone can read it for free and anyone can redistribute it without permission. I think this happens when authors shout loud enough, or for special issues, and it also happens in disciplines like computer science where everyone republishes their articles with or without permission. But in general it isn’t common and it is of very little practical use (if only because of the difficulty of discovery). It’s of no use for data-mining unless (highly unlikely) the author actually attaches CC-BY or similar.
“some gold OA does as well”. In my experience – which is limited as I am a chemist and there are essentially no examples – all major Gold OA removes permission barriers. I’m thinking of BMC and PLoS and OUP. They all have CC-BY. There are some journals who have CC-NC and I have argued the case with some but in general this is a minor concern. So which major Gold OA journals forbid re-use? (We should exclude the awful hybrid journals which take money off authors for less than permissionFree). If an author has paid money for OA, which journals forbid their readers to re-use the article?
“(perhaps most) green OA doesn’t remove permission barriers”. I agree with this.
“(perhaps most) gold OA doesn’t either”. I’m disappointed if this is the case.

My conclusion is that the terms Green and Gold seem to me to be highly confusing and operationally almost useless for a reader. The reader doesn’t care how the material got there – they need to know what they can do with it. For that there has to be a simple set of labels and CC-* provides that.
Finally a word about why it is essential that the NIH continues to mandate deposition in PubMed Central. (Stevan Harnad has argued that it would be better for authors to self-archive in their institutional repositories.) Note that many authors – e.g. from industry – don’t have IRs anyway. But the main point is that it is completely impossible to discover and systematically mine this information. Let’s assume there are ca. 60,000 articles deposited in PMC this year, and that there are ca. 10,000 institutions involved. (Even if it’s only 1,000 my argument holds.) If I want these I have to set up my own list of 10,000 repositories and trawl the lot – every day – for new content. (And I want it daily.) And every other text-miner has to do the same. How do I know when a new institution publishes? I have to go to PubMed anyway, so I might as well read the material there. And compliance will be awful. The NIH cannot check 10,000 sites on a regular basis. In contrast, if the material is in PMC (or UKPMC) then I can get a single RSS feed daily which will alert me to the material that comes in. The robots have no trouble trawling this. PMC will presumably alert me to what is minable and what – thanks to the publishers – is not. So I am afraid that self-archiving is a complete non-starter.
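The asymmetry in that argument is easy to put into numbers. This sketch uses PMR's rough figures (60,000 articles, 10,000 repositories, with 1,000 as the lower case) and assumes, as a simplification, that deposits are spread evenly across repositories and across the year; the function name is a hypothetical illustration.

```python
# Rough figures from the post; the even spread is a simplifying assumption.
ARTICLES_PER_YEAR = 60_000
REPOSITORIES = 10_000


def harvest_cost(articles_per_year: int, repositories: int, days: int = 365) -> dict:
    """Compare one miner's daily polling load: many repositories vs one central feed."""
    new_per_day = articles_per_year / days
    return {
        "new_articles_per_day": new_per_day,
        # Polling every repository once a day means one request per repository.
        "distributed_polls_per_day": repositories,
        # A single central feed (e.g. one daily RSS feed) needs one request.
        "central_polls_per_day": 1,
        # Expected new articles per repository per day under an even spread.
        "new_per_repository_per_day": new_per_day / repositories,
    }


for n in (REPOSITORIES, 1_000):  # the post says the argument holds even at 1,000
    cost = harvest_cost(ARTICLES_PER_YEAR, n)
    print(n, round(cost["new_articles_per_day"]), cost["distributed_polls_per_day"],
          cost["central_polls_per_day"], round(cost["new_per_repository_per_day"], 3))
```

On these assumptions a miner polling every repository daily makes 10,000 requests to find roughly 164 new articles, with most requests returning nothing new, and every other miner repeats the same work; a central feed needs one request per miner.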


Open Access Week – Green is not enough

Posted on April 10, 2008 by pm286

As today is part of “Open Access Week” (April 7 was when the NIH mandate took effect), I’m trying to write a post a day on the topic…
UPDATE — my use of Green and Gold is not correct — I comment in a later post
For newcomers, there are loosely two forms of Open Access – Green (which allows humans to read an article without charge – priceFree) and Gold (which allows anyone to do more or less whatever they like (datamine, mashup, republish, annotate, etc.) as long as they acknowledge the original author in any derivative works.
The heroic and immensely important BBB declarations (Berlin, Budapest, Bethesda) all unequivocally declared that the phrase “Open Access” meant Gold access. One of the heroes was Stevan Harnad and last week at Southampton I paid tribute to his tireless campaigning..
Recently, however, Stevan has said that he regrets having included the Gold-like clauses in BBB and wants to see the declarations revised to emphasize Green. Many others, including Peter Suber and myself, do not agree. I’ll explain later why Green Open Access is of very limited value to scientists.
Here Klaus Graf shows why he has the same position. No apologies for giving it in full.

There is no need to update the BBB definition!

http://openaccess.eprints.org/index.php?/archives/386-Dont-Risk-Getting-Less-By-Needlessly-Demanding-More.html
Peter Suber has answered at
http://www.earlham.edu/~peters/fos/2008/04/price-and-permission-barriers-again.html
Peter Murray-Rust and I have often argued that permission barriers
must be removed. See e.g.
http://archiv.twoday.net/stories/4409408/
http://archiv.twoday.net/stories/4356023/ (and earlier posts)
See also
MacCallum CJ (2007) When Is Open Access Not Open Access? PLoS Biol
5(10): e285 doi:10.1371/journal.pbio.0050285
On the recent discussion on textmining and PubMedCentral:
http://www.earlham.edu/~peters/fos/2008/04/text-mining-licensed-non-oa-literature.html
http://researchremix.wordpress.com/2008/04/07/non-oa-full-text-for-text-mining/
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1026
Harnad writes: “OA is free online access. With that comes,
automatically, the individual capability of linking, reading,
downloading, storing, printing off, and data-mining (locally).”
“Data-mining (locally)” is nonsense. If I have to mine 1000 articles
and am allowed to download only 10 articles/day automatically, I have to
wait 100 days.
Harnad repeats his ideas as mantras. We can do the same:
FAIR USE IS NOT ENOUGH.
There are scholars and scientists outside the U.S. under more rigid
copyright regimes without Fair Use.
Let’s have a closer look at the German copyright law:
http://www.gesetze-im-internet.de/urhg/__53.html
It is allowed to make copies for scholarly use if and only if
(i) there are good reasons
and
(ii) there is no commercial goal (“keinen gewerblichen Zwecken dient”, i.e. “serves no commercial purpose”).
In my humble opinion, medical research in a pharma business is
(i) research according to BBB
(ii) commercial.
A scientist in this company may, according to German law (since January 1, 2008), NOT
(i) make copies of scholarly articles (§ 53 Abs. 2 Nr. 2 UrhG) for scholarly use
(ii) carry out data-mining.
On the problems of the new commercial clause for universities
(“Drittmittelforschung”, i.e. third-party-funded research) see (in German)
the position of the Urheberrechtsbündnis:
http://www.dfn.de/fileadmin/3Beratung/Recht/Expertise-3-korb-urhg.pdf
§ 53 Abs. 2 Nr. 4 allows him to make copies (of some articles in a
journal issue) on paper or for non-digital use only. Because data
mining needs digital use, our German pharma scientist can mine only
the CC-BY subset of OA publications (most hybrid journals use
CC-BY-NC, AFAIK).
(i) OA is important for all researchers (including commercial research).
(ii) Commercial medical research is important for world’s health problems.
(iii) Data-mining is a new scientific way to solve medical problems.
(iv) Business companies engaged in commercial research cannot and
will not afford journal licenses for large-scale data-mining.
(SCNR: How many people must die because an OA guru says “There is a
need to update BBB” and denies the need for re-use?)
There is a simple solution (I will repeat it because it is important
like a mantra):
* MAKE ALL RESEARCH RESULTS CC-BY
* MAKE ALL RESEARCH RESULTS CC-BY
* MAKE ALL RESEARCH RESULTS CC-BY
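Klaus makes two practical points. A commercial researcher under German law can mine only the CC-BY subset, and a per-day download quota makes local mining impractical (1000 articles at 10 a day is 100 days). Both can be checked with a short Python sketch. The `Article` record, the licence strings, the example DOIs and the quota are hypothetical. Treating only an exact "CC-BY" label as minable is a simplification that follows his reading.

```python
from dataclasses import dataclass


@dataclass
class Article:
    doi: str
    licence: str  # hypothetical labels, e.g. "CC-BY", "CC-BY-NC", "closed"


def minable_subset(articles):
    """Keep only articles whose licence permits commercial re-use (here: plain CC-BY)."""
    return [a for a in articles if a.licence == "CC-BY"]


def days_to_mine(n_articles, per_day_quota):
    """Days needed to download n_articles when limited to per_day_quota per day."""
    if per_day_quota <= 0:
        raise ValueError("quota must be positive")
    return -(-n_articles // per_day_quota)  # ceiling division


corpus = [
    Article("10.1000/a", "CC-BY"),
    Article("10.1000/b", "CC-BY-NC"),  # excluded: non-commercial clause
    Article("10.1000/c", "closed"),
]
print([a.doi for a in minable_subset(corpus)])  # ['10.1000/a']
print(days_to_mine(1000, 10))                   # 100
```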

PMR: The extension to data is:
* MAKE ALL RESEARCH RESULTS CC0 or PDDL
PMR: Klaus gives excellent arguments and the German copyright law is particularly compelling. No “green” label can override this, whereas a CC-BY licence can. The idea of local datamining is – as Klaus says – nonsense (sorry Stevan). I have legitimate scientific reasons for downloading every chemistry paper ever published – I want to use OSCAR to check which published results are valid. I want to extract NMR spectra and assess their consistency. I want to plot the use of hazardous solvents against a timeline. Etc. We can easily analyse 100,000 papers a day for this sort of thing – the only barrier is Closed access. Science is impoverished.
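The solvent timeline is the kind of question that needs the full text of every paper, not a handful downloaded locally. Here is a toy Python sketch of the counting step, assuming the papers are already available as (year, text) pairs. The solvent list and sample sentences are hypothetical. Plain word matching is a crude stand-in for a chemistry-aware tool such as OSCAR and does not describe how OSCAR works.

```python
import re
from collections import Counter, defaultdict

# Hypothetical watch-list; a real study would use a curated list of hazardous solvents.
SOLVENTS = ["benzene", "chloroform", "carbon tetrachloride"]


def solvent_mentions_by_year(papers, solvents=SOLVENTS):
    """Count, per year, how many papers mention each solvent at least once.

    papers: iterable of (year, full_text) pairs.
    Returns {year: Counter({solvent: n_papers})}; years with no mentions are absent.
    """
    patterns = {
        s: re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE) for s in solvents
    }
    timeline = defaultdict(Counter)
    for year, text in papers:
        for solvent, pattern in patterns.items():
            if pattern.search(text):
                timeline[year][solvent] += 1
    return dict(timeline)


papers = [
    (1985, "The product was recrystallised from benzene."),
    (1985, "Extracted with chloroform and then Benzene."),
    (2005, "Washed with ethyl acetate."),
]
timeline = solvent_mentions_by_year(papers)
print(timeline[1985]["benzene"])     # 2
print(timeline[1985]["chloroform"])  # 1
print(2005 in timeline)              # False
```

The counts per year could then be plotted directly. The hard part is not this code but getting permission to run it over the whole literature.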
As Peter Suber (see above) and others have made clear, it is not a question of Green or Gold. They can be pursued at the same time. Many publishers do not yet realise the value of Gold publishing, and when it is explained they become positive about it (I answered a question on this yesterday – more later).
In haste

Posted in Uncategorized | 10 Comments

UKSG

Posted on April 9, 2008 by pm286

A few snippets from my 1.5 days at the UK Serials Group meeting – a mixture of publishers, LIS and others. One scientist.
I don’t think that many at the meeting really had much idea what scientists actually do on a day-to-day basis. That’s a gross generalisation but there is too much talk of “end-users” (of journals). I’m not an end-user; I’m a scientist, author, reader. So I wanted to get across real science in my talk (this morning). I tried to do this with articles from JOVE (http://www.jove.org) – the Journal of Visualized Experiments. It’s impressive – fully recorded videos of experiments (mainly bioscience) with very detailed procedures including precise quantities of chemicals. Unfortunately, although the streaming video worked half an hour earlier, it failed during the talk. But I plan for failure and it wasn’t serious.
I did manage to give a good showing to Andrew Walkingshaw’s movie of crystallography (The geographic spread of (Open) crystallography) with a seven-year global timeline. Particularly appropriate at a publishing meeting, as it shows the rapid and inexorable change in publishing.
The theme today (I was one of 4 speakers) was that scholarly publishing cannot continue as it does today, with vested commercial interests making money from restricting access. It clearly touched a nerve with several of the delegates, who told me they were now committed to pushing for Open Access as a result of what they heard. And, although progress is slow, the ground is being laid. But whether the industry and community can actually move is not clear.
Besides the Open/Closed fracture line there’s a great danger of failing to provide what the new generation of scholars want. I would mandate that all LIS decision-making bodies have an undergraduate representative. It’s no good trying to work out what young people want by giving them questionnaires. Give them the power to change the process themselves. They are already redefining what the scholarly process is – they don’t do it in the way we would like, so we have to change, not them.
So I came away with general optimism – Open Access is certain (although there wasn’t actually very much said about it – a sort of unspoken feeling) – but a serious concern about the lack of direction in the more immediate future. There’s little sense of leadership – I’d like to see provosts and heads of libraries actively trying to aim for a radically different future. And taking risks. Copyright in scholarship must break soon – just as has happened in the music industry.

Posted in semanticWeb, Uncategorized | Leave a comment
