petermr's blog

A Scientist and the Web

 

Archive for April, 2008

More on Open Data for molecules

Sunday, April 27th, 2008
Antony Williams (Chemspider) has engaged in a useful discussion about the various aspects of access to scientific data: Acting as a Community Member to Help Open Access Authors and Publishers
There are some valuable points emerging where we start to be clear about what we agree on and where we differ. (I am also aware that there are some implications for the definition of Open Access which I cannot comment on for a few days):

(AW) Recently I posted about our intention to post the full Molbank articles on ChemSpider. PMR commented on my potential over-extension of their Open Access nature: “PMR: I also support publishers who make their material available. I don’t want to appear churlish, but Molbank use what is effectively a NC (non-commercial) license and this is what concerned me (and others) when I posted about 1 year ago. I don’t think it has changed. So sorry, Antony, it’s not “as Open Access as they can be” especially if one has to ask permission to mount the material.” He may be right.

PMR: If we take the public definitions of Open Access (often called “BBB”) then I think all the OA community will agree that NC is not as Open as possible. I’ll be writing more in a few days…

What I do know is that I prefer to get into relationship with the groups/people I work with in the community. Simply grabbing their content/data without some connection doesn’t feel comfortable. AND, I realize in these days of search engines and scraping that’s quite acceptable.

PMR: This is a very valuable point. Yes, simply taking material without asking (and having it taken) “doesn’t feel comfortable”. I’ve felt that on several occasions – when I have exposed my/Open Source (sic) BO code (e.g. in Bioclipse) and seen it taken and developed for commercial use I’ve felt agitated. But I know this is part of the process, and if you can’t live with Open approaches then you take a different approach. I have also been careful on occasion not to take Open material and carry out bulk download and transformation without alerting the owner (and there normally is an owner or a proxy owner). For example we have long and continuing discussions with the International Union of Crystallography, the Royal Society of Chemistry and the American Chemical Society. IUCr’s Acta Crystallographica E is now Open Access. In principle we could download all the papers and mount them on our site. We could even sell them. But we wouldn’t do that (although others might). And there could be a trademark issue even though there wasn’t a licence issue – we’d need to make sure we weren’t purporting to be the definitive site. Note that much of this issue springs from current practice in chemistry. In bioscience it is common and often de rigueur to make your data available to the databanks. The issue of “ownership” of sequences and genes has been publicly fought over for three decades, and I doubt that authors would be valued if they added non-commercial (NC) tags to their data.

When I approached MDPI, the publishers of Molbank, they were gracious in their willingness to have ChemSpider support, integrate and utilize their content. This is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth. MDPI appear to be the opposite, in my experience.

PMR: If something is Open Data or Open Access then you have a legal right to download it. You don’t have to ask permission. I can’t understand what you are talking about unless it’s a veiled reference to the technical issues in downloading CrystalEye data (see below). If it’s anything else then please let me know and I’ll take it up with them.

I commented on Peter’s blog tonight: “Regarding your comment “especially if one has to ask permission to mount the material.” I think that’s a comment on the fact that I asked permission? I asked permission for the reason that I am focused on building a community for chemists and this includes me staying in relationship with publishers. I think you know this about me from my previous comments about CrystalEye

PMR: I am also interested in building a community for chemists (Blue Obelisk). We also mount data under various data and code licences. Anyone can download it without our permission.

“http://www.chemspider.com/blog/intention-to-scrape-crystaleye-content-and-staying-in-relationship-with-publishers.html”

AW: I judge it’s a better way to build the Structure Centric Community for Chemists on ChemSpider.

PMR: That’s great. I think ChemSpider has moved a long way and is fulfilling a useful role. Alone among the chemical aggregators you have taken on board the issues of Open Data – as a result of engagement with the OA community. You have a commercial site which uses Openness as part of its business model – that’s fine – it’s part of Web 2.0. I have a different agenda – it’s not incompatible – just different.

CS: So, while I didn’t have to ask for permission, I did. The result was an excellent exchange, newfound relationships and an opportunity to build an enhanced relationship WITH support and permission.

PMR: Again no problem. But the scale of the problem means that it is impossible for individuals to engage with all publishers. First, many of them simply fail to answer (this is one of the main problems – many publishers are simply not trying). Even in chemistry there are ca. 60 “Open Access” journals. We started going through them and found that the scale was simply too large. However we did get some progress – for example Libertas Academica had an NC licence – we pointed this out – and they immediately changed, with enthusiasm. I did the same with Molbank and they made it clear that they understood the issue and didn’t want to change it. We know where we are.

PMR: You want Molbank’s data – fine. You and they are then happy to share it. I imagine you can negotiate how it can be used in your database. What I assume you can’t do is then release the data as Open. So, for example, I cannot take data from ChemSpider or Molbank and put it in a repository where it can be used for commercial purposes. By contrast CrystalEye can be used for commercial purposes.

PMR: A major aspect of this is scale. You’ve spent 6 months in discussion with one publisher (ACS) about their supplemental data (supporting info). They are not prepared to assert that you can use it and redistribute it. In contrast I have taken it on the assumption that I am legally allowed to use it and post it as Open Data. You don’t feel this enables you to re-use my data, so you see immediately the effect of the anticommons (Open Source and the Tragedy of the Lurkers). If data are Open Data (CC-BY or CC0) then these problems disappear immediately. Individual negotiations between parties don’t scale; Open Data does.

CS: Many bloggers it appears assume that “concerned parties” read their blogs. For example, when you posted this: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1048 did you make the editors at Molbank aware of the error or did you just scrape their content and blog?

PMR: No I didn’t and I don’t intend to. There is at least one error in almost every chemical paper I read. The same is true for biology. The quality of the actual publication of data in science is almost universally poor. There are communities (crystallography, thermochemistry, astrophysics, protein sequence, atmospherics, etc.) where there are some excellent initiatives which chemistry would do well to emulate. But they are almost all initiated by the community and often by no means welcomed by the publishers.

CS: I have adopted a new approach of late – when I see issues with people’s data, websites, etc. I inform them directly to help them clean up errors. I’ve done this for Drugbank, PubChem, a number of blogsites, and so on. In case you didn’t inform them I will send them your blog link tonight… also to the original author, since I’m sure they will appreciate it too. This, I believe, is being a member of the community, and since the authors and the publishers are taking actions to contribute to the Open Access community it’s part of my personal charge to help.” I have sent an email to the original author and to the MDPI editors with the hope they might clean up the article or post an Erratum. This is what I feel is appropriate as an active member of the community. If you see errors on ChemSpider please do let us know directly. We have an “Add: Feedback” link on every record page and do pay attention to your input.

PMR: This is a potentially useful approach – I don’t know whether it scales and how you measure the quality. We also are introducing a feature on CrystalEye where we use Connotea (and, yes, we’ve discussed it with friends at Nature) as an annotation mechanism. Note that we could have done this without consulting Nature but it is possible that it would swamp Connotea and we didn’t want to do that.

PMR: BTW you often use the word “curating” for your activities and I suggest you really mean “annotation”. Digital Curation is about preservation of material and the historical record, not making assertions (however valid) about “correctness”. (Chris Rusbridge reads this blog and I’m sure he’ll be happy to engage with you).

PMR: I’ve written enough so I will tackle your concerns about CrystalEye later. Please let me know if that is your only concern or there are other sites. Note, to start with, that it isn’t trivial to make data sets easily available whatever the motivation. Open Data states:

As set out in the Open Knowledge Definition, knowledge is open if “one is free to use, reuse, and redistribute it without legal, social or technological restriction.”

There are no such restrictions on CrystalEye. It may not be in the form you would like it, and I’ll address that later. But you suggest – wrongly – that I and others deliberately make it difficult to access the data. That is not true.

Liquid European publications

Saturday, April 26th, 2008

From john wilbanks’ blog

  • Call for Postdocs – Enhanced scientific publications

    If you’ve got an interest in next-generation publishing in science, and you’ve always wanted to live in Paris…I’ve got a job opening you might be interested in after the jump. Please forward this far and wide. It’s a great project. If I were younger – and had a doctorate so that I could be a postdoc – I’d be all over this one.

I contacted you some time ago about an EC project that starts this year on the design of new models of publication for the academy (details at: www.project.liquidpub.org). The profile is for someone who has an interest in the design of new knowledge objects, some programming skills and a more general interest in open source. Any help in diffusing this message to the relevant people or lists is more than welcome.

To whom it may concern; and apologies for multiple postings!

‘LiquidPublication’ Post-doctoral researcher: Innovating the Scientific Knowledge Object Lifecycle. Institut Jean Nicod (CNRS, EHESS, ENS), Paris – under the responsibility of Gloria Origgi and Roberto Casati.
Candidates are invited to submit an application (in English) including a detailed curriculum vitae, a list of publications, a statement of interest, and two letters of recommendation. The application should be sent directly both to Gloria Origgi at origgi@ehess.fr and to Roberto Casati at casati@ehess.fr.

We are seeking to recruit a post-doctoral researcher as part of an international project entitled LiquidPublication. Funded by the European Commission, the project will bring together a highly interdisciplinary team of researchers and experts in order to explore how ICT and the lessons learned from software engineering and the social Web can be applied to provide a radical paradigm shift in the way scientific knowledge is created, disseminated, evaluated, and maintained. The goal is to exploit these novel technologies to enable a transition of the “scientific paper” from its traditional “solid” form (i.e., a crystallization in space and time of a scientific knowledge artifact) to a Liquid Publication (or LiquidPub for short) that can take multiple shapes, evolves continuously in time, and is enriched by multiple sources. We call these new, dynamic objects Scientific Knowledge Objects (SKO). More details on the project and its partners are available at: http://project.liquidpub.org/

PMR: I agree with John – this is one of the several ways in which we need to explore a new generation of publishing. We aren’t seeing nearly enough innovation coming from publishers – we need new visions. One of the benefits of European funding is that there is a commitment to the knowledge economy and a realisation that this is critical to Europe’s future well-being.

Although the description of this post is quite high-level it would be nice if it encompassed work on semantic scientific objects (datuments).

Why we need semantic chemical authoring

Friday, April 25th, 2008

Since I have recently posted about Molbank (an Open Access journal of chemical structures and syntheses) I started to have a look at some articles. I should emphasize that I am generally in favour of what Molbank is doing (making information freely available – I shall discuss exactly how in a week or so). But the first article I encountered had a feature that even a non-chemist could spot. I have posted the first part of the article after my comments. Note that I can do this as Molbank allows non-commercial use, whereas some commercial publishers would already be reaching for the lawyers… And please do not take this as a criticism of Open Access – almost all chemical papers that we review using software show one or more detectable and avoidable errors. Take it in fact as a compliment to Openness, in that Open papers are exposed and open to criticism in a way that closed access papers are not.

My main argument is that if we had semantic authoring tools – this type of error could not occur (or would at least prompt the author):

===================================================================

Molbank 2006, M478

http://www.mdpi.org/molbank/

Microwave assisted esterification using Fe2(SO4)3.4H2O/concentrated H2SO4 as efficient catalyst

Krunal G. Desai1,*, Kishor R. Desai1 and D. Padmanabhan2

1Department of Chemistry, Synthetic Organic Chemistry Research Laboratory, Veer Narmad South Gujarat University, Surat-395 007 (Gujarat), India.

Tel: (0261) 2258384, Fax: (0261) 2258384

2Board of Radiation and Isotope Technology, Vashi Complex, Navi Mumbai-400 705 (Maharastra), India.

e-mail: kgdapril@yahoo.co.in

*Author to whom correspondence should be addressed

Received: 27 June 2005 / Accepted: 6 September 2005 / Published: 31 March 2006

Keywords: esterification, Fe2(SO4)3.4H2O/con.H2SO4 as catalyst, microwave effect

===============================================
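To illustrate the kind of check a semantic authoring tool could perform, here is a minimal sketch (my own illustration, not an existing tool) that tokenises a formula string and flags any symbol that is not a known element. A real tool would use the full periodic table and would also validate hydrate notation such as the “.4H2O” above:

```python
import re

# A handful of element symbols for the sketch; a real tool would
# load the full periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "Fe", "Na", "Cl"}

# Tokens: an element symbol with optional count, a parenthesis, or a bare number.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|([()])|(\d+)")

def check_formula(formula: str) -> list:
    """Return the symbols in the formula that are not recognised elements."""
    problems = []
    for sym, _count, _paren, _num in TOKEN.findall(formula):
        if sym and sym not in KNOWN_ELEMENTS:
            problems.append(sym)
    return problems

print(check_formula("Fe2(SO4)3"))   # → []
print(check_formula("Xx2(SO4)3"))   # → ['Xx'] : the tool would prompt the author here
```

Even a check this crude, run at authoring time, would catch a mistyped element symbol before the paper ever reached a reviewer.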

“PDF should be used to preserve information for the future”

Friday, April 25th, 2008

(From: Carol Jackson [..email ..] [via Jim Downing]
Subject: Latest DPC Technology Watch Report – ‘PDF should be used to preserve information for the future’. To: DIGITAL-PRESERVATION@jiscmail.ac.uk)

PDF should be used to preserve information for the future

Good news: the already popular PDF file format, adopted by consumers and business alike, is one of the most logical formats to preserve today’s electronic information for tomorrow.

According to the latest report released today by the Digital Preservation Coalition (DPC), Portable Document Format (PDF) is one of the best file formats to preserve electronic documents and ensure their survival for the future. This announcement will allow information officers to follow a standardised approach for preserving electronic documents.

Information management and long–term preservation are major issues facing consumers and businesses in the 21st Century. This report is one of a series in which the Digital Preservation Coalition (DPC) aims to think about and address the challenges facing us.

This report reviews PDF and the newly introduced PDF/Archive (PDF/A) format as a potential solution to the problem of long–term digital preservation. It suggests that adopting PDF/A as the standard for archiving electronic documents will help preservation and retrieval in the future. It concludes that this can only be done when combined with a comprehensive records management programme and formally established records procedures.

Betsy Fanning, author of the report and director of standards at AIIM, comments, “A standardised approach to preserving electronic documents would be a welcome development for organisations. Without this we could be walking blindly into a digital black hole.”

The National Archives works closely with the DPC on issues surrounding digital preservation and will continue to do so. Adrian Brown, head of digital preservation at The National Archives, said: “This report highlights the challenges we all face in a digital age. Using PDF/A as a standard will help information officers ensure that key business data survives. But it should never be viewed as the Holy Grail. It is merely a tool in the armoury of a well thought out records management policy.”

The report is a call to action: organisations need to act now and look hard at their information policies and procedures to anticipate the demand for their content (documents and records) in the future. Everybody has different criteria, types and uses for documentation, so you need to find an approach that works for your organisation.

If you would like to read the full report please go to the Digital Preservation Coalition website. This can be accessed here: www.dpconline.org/graphics/reports/index.html#twr0802

PMR: I am not an expert in digital curation and am reluctant to criticize a body devoted to it. I am sure that they know in great detail how difficult it is to extract information from PDF, whatever the version. We’ve been looking at theses – bitmapped, born-digital, etc. – and PDF is vastly more difficult than Word for information extraction. Vastly. Our programs such as OSCAR can read documents in Word but lose much of the information when they try to read PDF.

So yes, I can see that PDF is useful for preservation. Whether it’s better than XML I doubt. I’d like to see the argument. Whether PDF is any use after it’s been preserved is much less clear. Yes, if the document is pored over by human scholars. We’d hate to lose Shakespeare or similar.

But there are 1,000,000 scientific articles per year (give or take a bit), and 15 million abstracts in PubMed. Assuming they are preserved in PDF, how can we currently make full sense of them? If they were also in XML, HTML, Word, or LaTeX we’d be able to index them. It’s not that we cannot index PDF at all; it’s just that we lose much more information.

So I’m not arguing that PDF shouldn’t be used. But please please use a semantic format as well. And think about re-use as well as preservation.

Open Scholarly Communities on the Web

Friday, April 25th, 2008
While I’m waiting for Jim to help fix my Eclipse environment, here’s a post from the Open Knowledge blog…

Dr. Paolo D’Iorio recently invited me to attend the first meeting of an EU funded Working Group “devoted to analyzing the current debate on the legal, economic and social conditions for setting-up open scholarly communities on the web”. The meeting was part of COST:

COST – European Cooperation in the field of Scientific and Technical Research – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. COST is also the first and widest European intergovernmental network for coordination of nationally funded research activities.

Action 32, of which Dr. D’Iorio is Chair, is called “Open Scholarly Communities on the Web” and has two aims:

  • to create a digital infrastructure for collaborative humanities research on the Web; and
  • to establish and foster the growth of Scholarly Communities that will provide feedback to the IT developers regarding the needs and expectations of humanities researchers and will serve as a core group of early adopters.

Talks included:

  • Paolo D’Iorio (CNRS-ITEM, Paris), How to build a Scholarly Community on the Web
  • Maria Chiara Pievatolo (University of Pisa), Copyright in Europe. History and perspectives
  • Thomas Margoni (University of Trento), How to access primary sources in Europe. The legal framework
  • Annaïg Mahé (URFIST, Paris), The market for SSH Journals in Europe
  • Jennie Grimshaw (British Library), Negotiating spaghetti junction: legal constraints on archiving government e-documents in the UK
  • Christine Madsen (OII, Oxford), The significance of “marketing” digital collections: the case of Harvard
  • Yann Moulier Boutang (Professeur de sciences Economiques – Université de Technologie de Compiègne, Directeur adjoint de Laboratoire de l’Unité de Recherche EA 22 23), Economic model(s) of Scholarly Communities: Open Source or Creative commons?
  • Francesca Di Donato (University of Pisa), The evaluation of science. From peer review to open peer review
  • Eric Meyer and Ralph Schroeder (OII, Oxford), Open Access and Online Visibility in the Age of e-Research

(JG) Notes and comments

  • For many humanities subjects, having something like the public domain calculators would help to facilitate the growth of open resources for scholarly communities built on works in which the copyright has expired.
  • Paolo’s presentation of Nietzsche Source and the Discovery project gave a compelling vision of how communities might grow around a resource for corpus based scholarship – with users having their own virtual workspace with annotations and notes that could be shared with other users. The ‘Scholarsource’ system would have stable URLs to support accurate citation, and robust ontologies to facilitate exploration of the material. Licensing that permits re-distribution is also a good preservation strategy.
  • The term ‘open’ was often not used in the sense of the Open Knowledge Definition. Several projects used licenses with non-commercial restrictions. While some participants assumed that scholars and institutions would often prefer that their work was not exploited commercially – it would be great if public domain sources such as documents, images and records, could be published under an open license. An approach which recommended open licensing for material that had not been enhanced (scans, text files …) could help to stimulate the growth of a commons that would encourage greater experimentation and collaboration than one which restricted certain kinds of re-use (cf. 7. and 8. in the OKD).
  • The importance of a close working relationship between scholarly communities and technologists. It is crucial that technical development is informed by the needs and working practices of researchers. This is something we’ve been thinking about in relation to Open Shakespeare and Open Milton. Open licensing allows developers to experiment with scholarly material to develop new tools and applications that could be of unanticipated value (e.g. semantic approaches, text analysis or visualisation).
  • Legal, technological and social obstacles to building open scholarly communities. We have various legal mechanisms and emerging technologies to facilitate such communities. Sometimes the hardest parts are social – growing a user base, increasing participation and so on. There is value in, and there are limits to, the ‘build it and they will come’ approach.
  • PMR: Nothing much to add except that we are part of COST (in a completely different area – computational chemistry) – so it’s great to see them being involved in promoting Open Scholarship. Also the post shows how critical it is to have a clear definition of “Open”. Without it people don’t know what they are not providing by default. Sometimes, of course, there are good reasons for non-commercial (NC). I’ve felt them myself. But as the digital age expands there is simply so much good stuff which would benefit from being free – and often where the costs of stopping it are absurdly high.

    So I’d like to see “Arguments against non-commercial”.

    Semantic markup and text-mining

    Friday, April 25th, 2008

    Here’s an interesting development which emphasizes text-mining but actually seems to be a form of semantic authoring. (via Peter Suber)…

    Structured Digital Abstracts – Easier Literature Searching But Not Democratic


    FEBS Letters is this month carrying out an interesting experiment that could make literature searching easier for both human and computers.

    The experiment centres on Structured Digital Abstracts (SDA). SDA are extensions of the normal journal article abstracts that describe the relationship between two biological entities, mentioning the method used to study the relationship. Each sentence is preceded by one or more identifiers pointing to the corresponding database entries that contain the full details of the interaction e.g. protein A interacts with protein B, by method X.

    The aim of SDA is to assist data entry, text mining and literature searching by extracting the salient data from the article into simple sentences using a defined structure and controlled vocabularies.
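To make the structure concrete, here is a small sketch (my own illustration – the field names and the identifier are invented, not the actual FEBS/SDA schema) of how one such structured sentence might be represented:

```python
from dataclasses import dataclass

@dataclass
class SDASentence:
    """One structured-abstract statement: entity A relates to entity B by method M.

    Field names and the example identifier below are illustrative only."""
    db_ids: list      # identifiers pointing to database entries (hypothetical format)
    entity_a: str
    relation: str
    entity_b: str
    method: str

    def as_sentence(self) -> str:
        """Render the statement in the controlled-vocabulary sentence form."""
        ids = ", ".join(self.db_ids)
        return f"[{ids}] {self.entity_a} {self.relation} {self.entity_b} by {self.method}"

s = SDASentence(["MINT-0000001"], "protein A", "physically interacts with",
                "protein B", "two hybrid")
print(s.as_sentence())
# → [MINT-0000001] protein A physically interacts with protein B by two hybrid
```

Because every field is drawn from a controlled vocabulary and every statement carries a database identifier, such sentences can be indexed and mined without any natural-language processing at all.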

    Gianni Cesareni, Editor of FEBS Letters explains:

    Many articles in biological journals describe relationships between entities (genes, proteins, etc.) yet this information cannot be efficiently used because of difficulties in retrieving from text. Databases capture this valuable information and organize it in a structured format ready for automatic analysis. The experiment of using SDAs will facilitate database entry and improve disclosure, to the benefit of authors and readers.

    This month’s edition of FEBS Letters contains a number of articles annotated with SDA, along with some articles on SDA itself.

    This is a simple but very good idea and I would certainly appreciate anything that makes literature searching easier.

    But I can’t help noting the delicious irony in the title of the first article in the issue that trumpets the arrival of SDA: “Finally: The digital, democratic age of scientific abstracts”.

    The first irony is that reading this article on digital democracy requires a subscription to FEBS Letters.

    The second irony is that while SDA make it easier to find articles of interest, reading the original article also requires FEBS Letters subscription, effectively making them marketing tools for the journal.

    So useful they may be, “digital” they may also be but “democratic” they are certainly not.

    Wouldn’t the flow of information be better served if everyone just published in open access journals?

    PMR: Obviously I agree with the comments on Openness and “democracy”. I’d comment in general that efforts by closed access publishers to “dumb down” the exposure of information because of business processes are likely to be counterproductive. For example the Nature initiative on text-mining (OTMI) chops an article into words and snippets (mainly sentences) and wraps them in XML. [The “Open” does not mean that the information is open, nor that the governance or process is open, but that the DTD is published.] I have many good things to say about Nature but OTMI is not what we need. Text-miners need the full text, because the usage at different points may be context-dependent. And it is clear that the inadequacy is driven by commercial considerations rather than technical ones.

    I haven’t got into work so haven’t yet looked at examples from FEBS, but the description seems to be of semantic markup provided by the journal. This, in general, is a highly desirable process. A paper on the “Indian hedgehog gene” suggests spiky animals roaming round the Taj Mahal – actually I believe it’s a Drosophila gene (signified by the unique code Ihh). So it’s enormously valuable to have this markup. In collaboration with our laboratory the Royal Society of Chemistry has pioneered this in its Project Prospect, where the molecules and some other concepts are marked up and hyperlinked to ontologies.

    And as FEBS says you need structure and controlled vocabularies. That’s a credit to the hard work done over many years by the gene annotation community and other bioscientists who have built Gene Ontology, ChEBI and other resources.

    We don’t have that in chemistry. Why not? Readers of this blog will know – it’s almost completely due to restrictive business practices. Authority-based identifiers (CAS, Elsevier) for chemicals exist – they just aren’t open. So how about it, authorities?
    My vision is that this markup should not be done by the journal or the reviewers or by machines but by the authors. They are the ones who know what an Indian hedgehog is (at least we hope so). Of course the journals can check. What we need are semantic authoring tools.

    This isn’t as far away as it may seem. Modern authoring tools such as Word 2007, Open Office and LaTeX now allow much customisation. The authorship – that’s you – is getting much smarter. We’re used to plugins. So semantic plugins that – say – scan the text for Indian hedgehogs and add “Ihh” wouldn’t be difficult. Given that most people use Word, this is a good place to start. (I’ll be writing more about this over the next few weeks, particularly about chemistry.) For braver adventurers we recommend Open Office and in particular the ICE plugin/framework from Peter Sefton, with whom we are working.
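A minimal sketch of what such a plugin’s core might look like – a dictionary-driven annotator that appends an identifier after each recognised gene name. The term-to-code mapping here is purely illustrative; a real plugin would consult an ontology or a model-organism database rather than a hand-written dictionary:

```python
import re

# Illustrative term-to-identifier mapping; a real plugin would load
# these pairs from an ontology rather than hard-coding them.
GENE_TERMS = {
    "Indian hedgehog": "Ihh",
    "Sonic hedgehog": "Shh",
}

def annotate(text: str) -> str:
    """Append the gene code, in brackets, after each recognised term."""
    for term, code in GENE_TERMS.items():
        text = re.sub(re.escape(term), f"{term} [{code}]", text)
    return text

print(annotate("The Indian hedgehog gene is expressed in cartilage."))
# → The Indian hedgehog [Ihh] gene is expressed in cartilage.
```

The point is not the string-matching, which is trivial, but that the annotation happens at authoring time, where the author can confirm or reject each suggestion before the paper is submitted.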

    So let’s go straight for semantic authoring. A major benefit will be that the author controls the process. And the markup should then be free and Open.

    Molbank and Open Access

    Thursday, April 24th, 2008

    Antony Williams writes:

    ChemSpider Will Soon Support Open Access Articles On the Site

    Posted by: Antony Williams in Open Access Publishing Copyright©2008 Antony Williams

    Some of you may be aware of the Molbank Open Access Journal. I recently blogged about our dedicated website for this Open Access Journal, described here. Murray-Rust has discussed MDPI journals previously and the nature of their Open Access. I am happy to validate that they are as Open Access as they can be. They have given us the right to mirror their articles on our site and in the next few weeks we will do exactly that, hosting Molbank articles connected directly to the chemical structures. Watch this space for our expanding integrations with Open Access publishers.

    PMR: I also support publishers who make their material available. I don’t want to appear churlish, but Molbank use what is effectively a NC (non-commercial) license and this is what concerned me (and others) when I posted about 1 year ago. I don’t think it has changed. So sorry, Antony, it’s not “as Open Access as they can be” especially if one has to ask permission to mount the material.

    I shall be writing more about what Open Access is and isn’t quite shortly, but cannot do so at the moment – please accept this. It’s important that we address the concept more precisely than we have been doing up to now.

    CRIG – JISC, etc.

    Thursday, April 24th, 2008

    I lauded the victors of the CRIG competition: (CRIG winners) – and they have commented:

    1. Dave Tarrant Says:
      April 17th, 2008 at 9:04 am We look forward to it :P

      The thing which was remarkable about winning was the fact that we actually did code most of it whilst at OR08, which fulfilled the specification of the fact it was a developers challenge not an award to the best project which has been running for a while. So not sure if an extension of the project as an entry for next year will be valid.

      We’ll have to think of something else :P

    PMR: I’ve talked with Jim and Peter Sefton about this and I hadn’t realised that they had actually done the whole thing at the meeting, missing the talks, etc. In any other sphere of life this would be incredible, but in hackerdom the sights are set very high so this is merely amazing. Also I gather that it’s based on ORE, which is also fantastic as it means that ORE can be made to work and do really exciting things. We are hoping to build on this – more will be public later.

    This is an effective use of funding. It’s inexpensive (< 5E-05 of the black hole left with STFC, for example) and it generated a lot of worthwhile ideas and demonstrators, even if they couldn’t all win. It highlights the importance of writing code – often more important than a report which concludes that “more work needs to be done in this area”.

    So kudos to JISC as well as the winners.

    CRIG winners

    Wednesday, April 16th, 2008

    I’m delighted to congratulate the winners of the CRIG (Common Repository Interfaces Group) competition at OR08. This was an innovative piece of funding – instead of giving a small grant to a group to do a small piece of work, JISC announced a prize for the best on-the-spot development in this subject to be presented at OR08. Teams of developers would spend 1-2 evenings at OR08 creating prototypes instead of spending the time in the bar. (Or combining these activities). It is surprising and exciting how much can be done in a day or so. Modern tools help, and of course the Open architecture means that people can borrow ideas and technology from elsewhere.

    There were about 20 teams and Jim and I entered. All teams got T-shirts. Unfortunately I was grounded at Amsterdam for a day and missed the slot – so we withdrew. Here are the winners… ECS developers win $5000 repository challenge

    The challenge winners

    Developers from ECS, Southampton, and Oxford University won a $5000 challenge competition which took place at the OR08 Open Repositories international conference.

    Dave Tarrant, Tim Brody (Southampton) and Ben O’Steen (Oxford), beat a large field of contenders, including finalists from the USA and Australia, by demonstrating that digital data can be moved easily between storage sites running different software while remaining accessible to users (watch video). This approach has important implications for data management and preservation on the Web.

    Repository sites have become a global phenomenon in higher education and research as a growing number of institutions collect digital information and make it accessible on the Web. There are now over 1000 repositories worldwide.

    However, with the growth of institutional repositories alongside subject-based repositories, and in cases where multiple authors of a paper belong to different institutions, it is important to be able to share and copy content between repositories.

    Meanwhile the repository space has become characterised by many types of repository software – DSpace, EPrints and Fedora are the most widely used open source repository software – containing many different types of content, including texts, multimedia and interactive teaching materials. So although sharing content and making it widely available (interoperability) has always been a driver for repository development, actually moving content on a large scale between repositories and providing access from all sources is not easy.

    The OR08 challenge, set by the Common Repository Interfaces Group (CRIG), had just one rule for the competition: the prototype created had to utilise two different ‘repository’ platforms.

    The winning demonstrator showed data being copied simply from an EPrints repository to a Fedora repository, and then moved back in the other direction. What was striking is that, among repository software, EPrints and Fedora are seen as quite different in the way they handle data, so the approach used is likely to be just as useful with other repository software.

    This data transfer was achieved using an emerging framework known as Object Reuse and Exchange (ORE), a topic that attracted one of the highest attendances at OR08. ORE is yet to appear in beta form, but specifications are being developed that allow distributed repositories to exchange information about their digital contents.
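    To give a flavour of how this works: in ORE, a repository publishes a “resource map” describing which files make up an item, and a second repository can read that map to fetch and re-deposit them. Below is a minimal sketch (not the winning team’s code) that parses an ORE resource map serialized as an Atom entry; the repository URLs are hypothetical examples.

    ```python
    # Minimal sketch of reading an OAI-ORE resource map (Atom serialization).
    # An aggregated resource appears as an Atom <link> whose rel is the
    # ORE "aggregates" term. All URLs here are invented for illustration.
    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

    resource_map = """<?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <id>http://repo.example.org/rem/item-42</id>
      <title>Resource map for item 42</title>
      <link rel="http://www.openarchives.org/ore/terms/aggregates"
            href="http://repo.example.org/item-42/paper.pdf"/>
      <link rel="http://www.openarchives.org/ore/terms/aggregates"
            href="http://repo.example.org/item-42/metadata.xml"/>
    </entry>"""

    entry = ET.fromstring(resource_map)
    aggregated = [
        link.get("href")
        for link in entry.findall(f"{{{ATOM}}}link")
        if link.get("rel") == ORE_AGGREGATES
    ]
    # A harvester at the target repository would fetch each of these URLs
    # and re-deposit them, recreating the item there.
    print(aggregated)
    ```

    The point is that the map is plain, standard XML: nothing in it is specific to EPrints or Fedora, which is why the same mechanism can move content between such different systems.
    
    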

    According to Dave Tarrant, ‘Interoperability is the innovation. We think it is a bad idea to reinvent the wheel so with the availability and support for ORE growing, this provides a very suitable technology to provide interoperability between repositories.’

    The winning team are past and present members of the JISC Preserv 2 project that is investigating the provision of preservation services for institutional repositories, and will take this work forward in the project.

    PMR: It looks like magic. I will have to find out details from Jim. If it really is magic then we can expect to see a quantum leap in the power of distributed semantic information.

    (I don’t think we would have won even if we’d been present. In fact we certainly wouldn’t. But I’ve kept the T-shirt and there is OR09, when we expect to be unstoppable).

    Another call for Open Data

    Wednesday, April 16th, 2008