Repositories: The ICE-man speaketh

Our group has a major commitment to repositories and we’ve just had the annual beanfeast, OpenRepositories 09 (OR09). Jim Downing went and I have been debriefing (gently) with him, but it may need a Trip To The Panton to get a wider picture. Jim, of course, is a Repository star in his own right and at OR09 had plenty to show and tell. One of our projects is ICE-TheOREM, which uses Peter Sefton’s ICE authoring tool and which we set up as a joint project with Peter.

The ICE age is coming.

Embrace it. (This blog is now written with ICE and I am much more productive – at least in writing more words).

So here are some of Peter’s impressions of OR09. There was clearly a major geek-fest going on. I’m sorry I had to miss it.

… Clifford Lynch’s 2003 definition of a repository as a set of services:

In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. …. An institutional repository is not simply a fixed set of software and hardware.

http://www.arl.org/resources/pubs/br/br226/br226ir.shtml

I [PTS] agree. It’s not a computer program, it’s a lifestyle; what Lynch is calling organizational commitment.

I had a little moment in the spotlight when keynote speaker John Wilbanks referenced my Scholarly HTML idea. This was reported in Twitter thus:

akosavic Wilbanks at #or09: rename semantic web as scholarly HTML

So there you have it, meet the saviour of the semantic web. Move over Sir Tim.

Actually I wouldn’t go that far. What I am trying to get at with this Scholarly HTML is that the research article, our unit of academic currency, should be a web page, not a bit of pretend paper, a PDF. Journals need to be reinvented. Articles should be web pages (yes, we need ways to time-stamp and version them). Peer review and editing are both important, but I can think of better ways to get those done than we typically use now. Then there’s the idea of embedding machine-readable semantics in the form of statements of fact, links to data etc., not to mention machine-readable metadata. More on this soon here on the blog. I think I’ll write a series of papers on this, with appropriate collaborators, in the open; then we’ll see if we can get them to count as scholarly literature via peer review. A couple of people told me they’re watching the Scholarly HTML posts, so I think I’m onto something with this one.

PMR: I like the phrase scholarly HTML. Simple, accurate. (Of course for us geeks it’s XHTML, with embedded excitement such as chemistry, but I expect most schoolkids have now heard of HTML).

First up, Jim Downing from Cambridge and I showed off the work we did with our teams on the ICE-TheOREM project. Not only were we able to show a thesis going onto the web in HTML as well as the dreaded PDF, it had granular chapter-level embargo, and we were fully buzzword compliant, with ORE and SWORD built in. And for the first time we made our work available as a ready-to-run virtual machine, a few copies of which I handed out. We’ll definitely do more of that, and keep updating our machine with all the software we work with; at the moment it runs ICE, ePrints and The Fascinator, but I’d love to see DSpace and OJS and Moodle on there as well, all integrated.

And there was the poster, which I supplemented with a metaphor: a collection of 40mm & 50mm PVC waste pipe and various connectors. David Flanders used it to build a data grid which included a pipe going straight to repository hell, a place where he has apparently spent a fair bit of time drinking microbrew with too much malt. The idea was to drive home the point that we want to make research data plumbing as easy as PVC pipe network engineering. Here I am spruiking the poster with a fistful of PVC.

graphics1

I attended Microsoft’s workshop on their growing set of academic tools. I’ll reserve judgment on the new Zentity repository, but I am very, very pleased to see Microsoft Research working on academic workflows. As part of that, the idea of building HTML conversion into repository deposit tools is getting a serious airing in the repository community and I’m very pleased about that. More after my visit to MS Research tomorrow. (I have already spent a little time in Seattle with Pablo Fernicola, wandering in the sculpture garden and talking in general about academic computing).

PMR: I’m pleased that PTS is pleased. I am impressed by MS’s commitment to covering the field and their conversion (in MS Research) to Open Source. The major problem is that Microsoft technology can be a barrier. There is an assumption that everyone has implemented the Microsoft technology stack (IIS, SQLServer, SharePoint, Active Directory, Team Foundation Server, .NET, etc.). Not all of these are required for Zentity, but some are, and that’s a major barrier to uptake. My contacts in MSR are aware of this…

The final thing to mention is the Developer Challenge, run by the unstoppable David Flanders (I was going to say indefatigable but I can’t spell that) with Rachael Rodenmayer assisting. This was judged by a team of five based on five-minute screencasts. There were some good ideas in there, in a field that spanned everything from entire repository developments that were already done to small prototypes. Read about the winners at the JISC site.

Conclusion

Good conference. Good venue apart from a lack of power points (no lack of PowerPoints unfortunately) in the main venue. I was pleased to be showing off HTML in ePrints at last, and have ICE in production as an eResearch workflow tool. I think the repository world is making a welcome move to (re)embracing the idea of small pieces loosely joined.

PMR: Peter thinks deeply about repositories and disseminates his thoughts on top of working software. Pay great attention.

Posted in Uncategorized | Leave a comment

Openness in the shadow of Wallace

I was one of the first staff at the University of Stirling (Scotland) in 1967 so I’m delighted to see that Stirling continues to forge ahead in Openness:

STORRE, the IR at the University of Stirling, recently passed the 1,000 item milestone. The repository’s managers attribute the growth to the university’s institutional mandate:

STORRE is a full text only repository that has been up and running at the University of Stirling since 2005. We focussed initially on eTheses, with the submission of eTheses to STORRE becoming mandatory in September 2006. …

In April 2008 the University’s ePrint Mandate was announced – this requires all Journal Articles submitted for publication since January 2007 to be deposited in STORRE immediately on acceptance for publication. …

Since the ePrint Mandate came into force in September 2008, the rate of submissions of items to STORRE has risen dramatically from less than 20 items per month (sometimes much less!) to around 120 items per month. …

Scotland has a burning national pride (you didn’t need me to tell you that) and one of the results is to act in a united manner. I remember two (three?) years ago at the Open Scholarship meeting in Glasgow the strong feeling that they were united on Openness and were ahead of England in this. I think they still are.

Open access and unacceptable behaviour

It’s particularly sad when someone you look up to falls from their pedestal.

Many of us in the Open Access world remember Ian Gibson (UK MP) and his campaign to push for OA through the offices of the Houses of Parliament. See, for example the BMC interview (http://www.biomedcentral.com/openaccess/archive/?page=features&issue=19).

Now Ian Gibson is one of the MPs most implicated in the unethical use of expenses (http://www.telegraph.co.uk/news/newstopics/mps-expenses/5364319/MPs-expenses-cover-up-of-Ian-Gibson-and-his-daughters-cut-price-flat-deal.html). You can read the sorry story for yourself.

Yes, MPs say, it was within the rules. But I expect MPs to act on ethical as well as legal principles. When Gibson criticised publishers, they were within the rules. But he wanted to change the rules. The lack of ethical principles leaves us stunned, all the more so in those whom we saw as being driven by ethical motives.

We are simply bewildered. We all know that politicians can be corrupt, lazy, scheming, etc. But we like (and need) to believe that most are honest and hardworking. We expect them to look out for perils ahead and to alert us. We expect them to indicate when others overstep the bounds of reasonable action. Now we can’t.

The only good thing that can come out of this is a radicalisation of democracy: a greater legitimisation of individual action. It’s here that the Net is so important. Let’s not throw the chance away.

COST Computational Chemistry Workflows

We are part of the COST D37 action on computational chemistry, and specifically the creation of workflows and interoperability through standard APIs and formats. Our group at Cambridge is providing the basic infrastructure which, I hope I can say objectively, is now looking believable and exciting. Effectively it covers:

  • The conversion of computational chemistry output to CML. Ideally this should be done with FoX, which depends on the authors of the program (or the developers) incorporating it into their release programmes. CASTEP, Dalton, GULP, DL_POLY and MOPAC are some of the programs which have been FoXized, but not all the current versions include it. So we are likely to require an intermediate step:

  • Explicit conversion of logfile and other output to CML. This can be done by ad-hoc scripts, but I have developed a more systematic method, JUMBOMarker. This is a set of heuristics which include regular expressions, chunking and lookahead. It’s rather out of date so it’s getting a redesign and facelift. It should then be relatively easy to convert most outputs to CML. Each program requires a set of templates which describe the output, and these can be created by annotating typical program outputs. This has to be done by someone who understands what the program does, and so I spent yesterday with Kurt Mikkelson working on understanding Dalton (Kurt is an author).

  • An ontology for each code. The system can run without ontologies, but when these are created they add spectacular power to navigating and querying outputs. Every piece of information in the logfile can, in principle, be queried. These are currently bottom-up ontologies: they describe the phenomenology of the program rather than its platonic nature and purpose. If we get ontologies for several programs, then we’ll be able to abstract commonality and create a mid-level ontology for computational chemistry. This will then be able to transform the way CompChem is used; for example, the ontology can control the job creation and submission.

  • Conversion to RDF. This is automatic and uses one of the JUMBOConverter modules. The result can be a union of the relevant parts of the ontology and the actual data. This represents a leap forward in the management of future scientific data.

  • Ingestion into the Lensfield repository (Jim Downing, Nick Day, Lezan Hawizy). This is now automatic. The repository allows for ontology-based searching so that we can ask very general questions of the data sets using reasoning engines. The repository is general so that it can accommodate all aspects of molecules (provenance, literature, etc.)
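To give a flavour of the logfile-to-CML step, here is a minimal sketch in Python. It is not JUMBOMarker and does not reflect how that tool is implemented; it only illustrates the general idea of templates built from regular expressions, with chunking via a lookahead. The logfile excerpt, the `demo:` dictionary references and the exact element layout are all invented for illustration, and the output is only CML-like, not validated CML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical log excerpt; real program output will look different.
SAMPLE_LOG = """\
*** JOB START: molecule_001 ***
Total energy:      -76.026765 Hartree
Dipole moment:       1.854000 Debye
*** JOB START: molecule_002 ***
Total energy:      -40.198710 Hartree
"""

# One "template" per quantity: an illustrative dictionary reference and a regex.
NUMBER_AND_UNITS = r"\s+(?P<value>-?\d+\.\d+)\s+(?P<units>\w+)$"
TEMPLATES = {
    "demo:totalEnergy": re.compile(r"^Total energy:" + NUMBER_AND_UNITS, re.M),
    "demo:dipoleMoment": re.compile(r"^Dipole moment:" + NUMBER_AND_UNITS, re.M),
}

# Chunking: split just before each job header. The lookahead keeps the
# header line inside the chunk it introduces (needs Python 3.7+).
CHUNK_BOUNDARY = re.compile(r"^(?=\*\*\* JOB START:)", re.M)
JOB_HEADER = re.compile(r"^\*\*\* JOB START: (?P<id>\S+) \*\*\*$", re.M)


def parse_log(text):
    """Return one dict per job with the properties the templates could find."""
    jobs = []
    for chunk in CHUNK_BOUNDARY.split(text):
        header = JOB_HEADER.search(chunk)
        if header is None:
            continue  # empty leading chunk or text outside any job
        properties = {}
        for dict_ref, pattern in TEMPLATES.items():
            match = pattern.search(chunk)
            if match:
                properties[dict_ref] = (float(match.group("value")), match.group("units"))
        jobs.append({"id": header.group("id"), "properties": properties})
    return jobs


def to_cml_like(jobs):
    """Serialise parsed jobs as a small CML-like XML document."""
    root = ET.Element("cml")
    for job in jobs:
        module = ET.SubElement(root, "module", id=job["id"])
        for dict_ref, (value, units) in job["properties"].items():
            prop = ET.SubElement(module, "property", dictRef=dict_ref)
            scalar = ET.SubElement(prop, "scalar", units=units)
            scalar.text = repr(value)
    return ET.tostring(root, encoding="unicode")


jobs = parse_log(SAMPLE_LOG)
cml_text = to_cml_like(jobs)
```

A quantity that a template fails to find is simply absent from the output rather than guessed, which is probably the safer default when the templates are still being written by annotating real outputs.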

We are going to run a combined project whereby we generate large numbers of related molecules in a parameter sweep (substituted oligo-acetylenes) and decorate them with donors and acceptors. We will then compute properties such as the hyperpolarizability and see what combinations of substituents generate new effects. This type of high-throughput calculation can only be done by building workflows.

There are still problems to be overcome (Kurt and Hans-Peter Luethi reckon that 10% of calculations behave pathologically), but the workflow will allow us to apply machine-learning to determine what the factors might be.
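As a rough illustration of what such a parameter sweep looks like to the workflow, the sketch below enumerates every donor/acceptor/chain-length combination and gives each one a job identifier. The substituent lists, the range of chain lengths and the identifier scheme are assumptions made up for this example; they are not the sets used in the project.

```python
from itertools import product

# Illustrative choices only; the real sweep may use quite different sets.
DONORS = ["NH2", "OH", "OCH3"]
ACCEPTORS = ["NO2", "CN", "CHO"]
CHAIN_LENGTHS = range(1, 4)  # number of acetylene units in the chain


def sweep():
    """Yield one job description per donor/acceptor/chain-length combination."""
    for donor, acceptor, n_units in product(DONORS, ACCEPTORS, CHAIN_LENGTHS):
        yield {
            "job_id": f"{donor}_{n_units}yne_{acceptor}",
            "donor": donor,
            "acceptor": acceptor,
            "n_units": n_units,
            # Schematic label only, not a machine-readable structure.
            "label": f"{donor}-(C#C){n_units}-{acceptor}",
        }


sweep_jobs = list(sweep())
```

Even this toy sweep produces 3 × 3 × 3 = 27 jobs, and every extra substituent or chain length multiplies the count, which is why the calculations need to be driven by a workflow rather than by hand.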

In the evening we went out punting with an ad hoc picnic. Of that I shall only say one word:

SPLOSH!

No, that should actually be

SPLOSH SPLOSH.

If you are interested in more details, let us know.

COST-HLA in Porto: Can you help with Maps and Data Journals

I blogged about my visit to Porto on Monday and I’m relaying appeals for help in two areas:

(a) One group wants to create an interactive map where they mash up immunogenetics data (genetic markers in human populations). The emphasis is on Europe. They are thinking of using Google maps, but I’m wondering whether Open Street Map is at a stage where it could create a totally open solution. I am guessing that they wish to overlay maps with dots or coloured regions (so obviously natural administrative boundaries in Europe would be useful).

Any help, including examples of where this technology has been used for whatever purpose, would be useful.
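One possible starting point, independent of which map provider is chosen, is to put the per-population data into GeoJSON, an open format that web mapping libraries built on Open Street Map tiles can generally display as dots. The sketch below is only an assumption about how the data might be shaped: the population names, coordinates and frequency values are placeholders, not real immunogenetics data.

```python
import json

# Placeholder records; every value here is invented for illustration.
RECORDS = [
    {"population": "Porto (example)", "lat": 41.15, "lon": -8.61, "frequency": 0.12},
    {"population": "Cambridge (example)", "lat": 52.21, "lon": 0.12, "frequency": 0.08},
]


def to_geojson(records):
    """Build a GeoJSON FeatureCollection with one point per population."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {
                "population": rec["population"],
                "frequency": rec["frequency"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


geojson_text = json.dumps(to_geojson(RECORDS), indent=2)
```

Coloured regions would need polygon geometries for the administrative boundaries instead of points, but the marker frequencies could be attached as properties in the same way.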

(b) Another group is interested in publishing datasets in a data journal. They wish to make sure that the data are published at the same time as the study is published conventionally. They believe that the community would be interested in a journal which was peer-reviewed, but primarily on data and metadata quality, and that this would be one data set per paper.

This is close to the model of Acta Crystallographica E, where each paper is a single crystal structure and where the data quality is the primary concern; chemical interest is not a determining factor.

I believe that the data journal is coming and that this type of activity should be strongly encouraged. It should be very cost-effective and could lead to new metrics for scientific endeavour.

The European Internet will be free

I am delighted to have a reply from my MEP, Andrew Duff. This was after I used the WriteToThem site from mySociety.

This is a victory for Web Democracy. Well done, everyone. I shall certainly be voting on June 4th; you will recognize me from my twitter icon (@petermurrayrust).

He points out (unknown to me) that the UK is pursuing retrograde policies. Experts in politics will know what member states can do despite what the European Parliament does, and vice versa. I will write to my UK MP, copying Andrew Duff’s answer.

From Andrew Duff:

Dear Mr Murray-Rust

My apologies if we failed earlier to respond to you on this issue. We
received a lot of correspondence, and I thought all had been answered, but
it looks like yours got missed.

I share your concern to resist the imposition of restrictions on internet
use.  The  current package contains four reports on the reform of the
regulatory framework for electronic communications- including mobile and
fixed telephones, broadcasting and the internet. This includes a report
concerning internet use which has proved controversial and to which you have
raised concern.

I am pleased to inform you that the committee voted to ensure that no
restrictions were imposed on the fundamental rights and freedoms of internet
users, without a prior ruling by the judicial authorities.

More broadly the package will also enhance competition and ensure that all
market players have sufficient incentives to invest in infrastructure and
services for consumers.

This report will now go forward to the Council of Ministers for their
decision. While the European Parliament has voted to protect the rights of
internet users, there is a risk that the amendments will get voted down by
the Council.

France has been championing the three-strikes legislations and is likely to
block these developments.  I am aware that the UK along with France is
pushing strongly to change the wording agreed by the Parliament.  This
action will delay the progress of the report and put at risk the many
important aspects in this package which will increase competition and boost
jobs.  I therefore urge you to raise your concerns with the UK Government
and ask them to support the article which ensures that an internet cut-off
can only be imposed after approval from a competent legal authority.

The internet plays an important role within society, both socially and
economically.  It provides a space for communication, to access information,
for technological development and for economic activity.

I hope that this provides you with clarification.  Thank you once again for
writing to me.

Andrew Duff

Panton Principles in Porto

I spent the day yesterday in Porto with the COST initiative on sharing HLA data (which exchanges data on human antigen typing, valuable for, say, pharmacogenomics or immunogenetics, including European migration). They’d invited three speakers: two on ethics and me on Open Data. It was a great meeting and, as always, it’s great to see (17) European countries collaborating rather than bombing each other.

I took the opportunity to rehearse ideas of Open Data (as something orthogonal to ethics, e.g. human privacy) that apply to any science. And I stressed we should JUST concentrate on science, not a wider vision of knowledge or creative works.

When I give a presentation like this I ask those present about their understanding of the background. There were about 25, and I asked how many present:

  • had published an Open Access paper (3)

  • had contributed to an Open Source program (1)

  • had heard of Open Street Map (1)

  • had used a Creative Commons licence (2)

I am not disappointed by these figures; the group clearly wanted to do the best thing. They wanted to develop interchange standards. They wanted to build open databases. They wanted to create data-only journals. They wanted to mash up population genetics on interactive maps.

I therefore gave a presentation which stressed simplicity. That’s what scientists want. So I presented the Panton Principles (and I am DELIGHTED to see John Wilbanks giving the idea full support):

The idea now is to rework the simple statements so that the principles are trivially understandable (just as anyone can understand the Budapest Open Access declaration immediately). The following words should not occur anywhere:

  • licence

  • contract

  • share-alike

  • public domain (because no one knows what it means)

Simply, the Principles should state that scientists wish to donate their data to the world community for any purpose and with no requirement other than attribution. That further use in a domain is regulated by the Community Norms in the domain (which will vary widely). That funders should mandate this. That anyone who offends against the spirit of this (as it is the spirit, not the letter) will have to answer the court of Community Opinion in their domain. That there will be a simple act of stating this intention in an electronic document, hopefully provided by software.

That’s all. We have to craft some words but they should be simple enough to fit in a single paragraph. It won’t be easy, but John, Rufus and others have been discussing this for many months. If you have an insight, join the discussion.

Are these images copyrightable?

I contend that almost all images in scientific publications should NOT be copyrightable by any publisher and should be stamped as Open Data by the author. To give an idea I have extracted some images from BMC journals (which I can do without permission as BMC is a CC-BY publisher). I’d like to know if anyone thinks any of the following should be copyrightable.

Remember that an image copyrighted by a publisher requires explicit permission (e.g. emails and often the payment of money). Do any of these deserve that?

I’ll give the URL and then one or more images. Because it’s tricky to paste into the correct place, please be forgiving with the formatting:

==================================================

http://www.biomedcentral.com/content/pdf/1471-2105-10-146.pdf

graphics1, graphics2 and graphics3

http://www.biomedcentral.com/content/pdf/1471-2105-10-145.pdf

http://www.biomedcentral.com/content/pdf/1471-2342-9-8.pdf (I think this is a brain scan)

graphics4

http://www.biomedcentral.com/content/pdf/1471-2164-10-224.pdf (Note the diagram is drawn by machine)

graphics5

graphics6

http://www.biomedcentral.com/content/pdf/1472-6807-9-29.pdf

This is creative in that the authors have certainly drawn this, but it’s an essential part of explaining their methodology. It should not be copyright.

http://breast-cancer-research.com/content/pdf/bcr2258.pdf. These are very beautiful images, but they are raw data and absolutely essential to communicate the science and should not be copyrighted.

graphics7

http://www.translational-medicine.com/content/pdf/1479-5876-7-34.pdf

graphics8

Gels are raw data and the only meaningful method of communication is an image. They should never be copyrighted.

graphics9

This is a much better way to communicate methodology than dense prose. It should not be copyright. From: http://www.jissn.com/content/pdf/1550-2783-6-11.pdf

http://www.biomedcentral.com/content/pdf/1471-227x-9-9.pdf is a comparison of laryngoscopes. So it’s essential to publish pictures of them that anyone can analyse without permission.

graphics10

graphics11

http://www.journal.chemistrycentral.com/content/3/1/5 Chemistry must never be copyrighted:

More on Wolfram and chemistry

This is a lighthearted romp while I am watching the cricket…

Sulfur chloride:

graphics1

WA knows that it’s ambiguous (that could be a lookup), but the IUPAC name ?????. I want to see Daniel’s expression when he sees it; he has a splendid way of indicating when he feels a name or structure violates the rules. I couldn’t find the IUPAC name in Pubchem or elsewhere, so maybe Wolfram is generating it. If so, it’s going to have to encode an awful lot of rules.

Pentachloromethane (non-chemists: this doesn’t exist, as carbon has only 4 valencies). However, WA makes a brave (and completely wrong) guess:

graphics2

So it looks like it has some sort of natural language engine. I’m not sure that’s a good thing to mix with algorithmic reasoning.

It gets phosphorus trichloride and phosphorus pentachloride right, so I tried a non-existent compound.

graphics3

graphics4

We should remember that this is an Alpha, so it will have mistakes. Since it’s Closed we have to use these sorts of tricks to try to find out how it thinks. I’ll finish with our old friend:

This is internally inconsistent. The formula doesn’t match the Structure Diagram. It has carefully put brackets into the name and then interpreted it wrongly. Maybe it’s a clever algorithm which just needs tuning, or maybe it’s a fuzzy approach which is struggling.

However they have clearly paid people to put data in. I wonder where from?

Can Wolfram Alpha Beat OPSIN?

The chemical blogosphere has started to have fun with Wolfram Alpha (http://wwmm.ch.cam.ac.uk/blogs/adams/?p=269 and links from it), so here’s my contribution. Is the following picture a (bad) lookup? Is this an attempt to parse the name? After all a single character m shouldn’t matter to the greatest brain ever.

graphics1

(For non-chemists it’s got the structure of dibromoMethane instead of dibromoEthane).

I think we could bet on OPSIN beating WA in a contest.
