OOXML and ODT/F; should we work with commercial tools?

Glyn Moody has taken me to task for espousing Word as a tool that should be considered for archival of scholarly output. Not Word alone, but as a supplement to PDF. I explained my reasons and motivation. Now Glyn has continued the discussion and I reply…

A Word in Your Ear

A little while back I gave Peter Murray-Rust a hard time for daring to suggest that OOXML might be acceptable for archiving purposes. Here’s his response to that lambasting:
My point is that – at present – we have few alternatives. Authors use Word or LaTeX. We can try to change them – and Peter Sefton (and we) are trying to do this with the ICE system. But realistically we aren’t going to change them any time soon. My point was that if the authors deposit Word we can do something with it, which we cannot do with PDF. It may be horrible, but it’s less horrible than PDF. And it exists.

There are two issues here. The second concerns translators between OOXML and ODF. Although in theory that’s a good solution, in practice, it’s not, because the translators don’t work very well. They are essentially a Microsoft fig-leaf so that it can claim using OOXML isn’t a barrier to exporting it elsewhere. They probably won’t ever work very well because of the proprietary nature of the OOXML format: there’s just too much gunk in there ever to convert it cleanly to anything.

PMR: I didn’t regard it as a lambasting but a controlled robust discussion. I understand and appreciate where Glyn is coming from. I haven’t said anything I regret, but I’m also aware that there are unclear boundaries. Before diving in I should get a potential conflict of interest out of the way. We are about to receive funding from Microsoft for the OREChem project (see post on Chemistry Repositories). This does not buy an artificial silence on commenting on Microsoft’s practice, any more than if I accept a grant from JISC or EPSRC I will refrain from speaking my mind. Nor do I have to love their products. I currently hate Vista. However I need an MS OS on my machine because it makes it easier to use tools such as LiveMeeting (a system for sharing desktops). I’ve used LiveMeeting once and I liked it. OK, Joe did the driving because he knows his way round better than me, but I can learn it. Not everything MS does is bad and not everything it does is good.
The reason I currently like OOXML is that we can make it work and that we have material in Word that we can use. I’ll be demoing it publicly in a week’s time (more later). If we had material in ODT we’d use that, but we don’t. There may be a few synthetic chemists somewhere who use ODT, and we’d really like to hear from them, but currently we have to work with what chemists do.
I’m sorry to hear the translators aren’t good and I’m not surprised. I can’t imagine they are as bad as trying to get structured documents out of PDF. Remember also that we don’t want to do everything at this stage.
Our primary goal is to evangelize the semantic chemical web. To do this we have to create a lot of infrastructure: demonstrators, ontologies, microformats, etc. This is all independent of the tools used to create the starting material. Everything we do will be modular and none of the chemistry will have hardcoded OOXML stuff in. I believe that were we to have a chemical thesis in ODT then we’d be able to adapt our material very easily.
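As a rough illustration of that modularity (a sketch under stated assumptions, not our actual code): both .docx and .odt files are zip archives containing XML, so a thin per-format reader can hand plain paragraphs to a chemistry step that never sees the file format. The function names, the READERS table and the toy find_yields pattern below are all hypothetical.

```python
import os
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML (OOXML) and ODF text documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def docx_paragraphs(source):
    """Yield plain-text paragraphs from a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across runs; join the w:t pieces.
        text = "".join(t.text or "" for t in para.iter("{%s}t" % W_NS))
        if text.strip():
            yield text


def odt_paragraphs(source):
    """Yield plain-text paragraphs from an .odt path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    for para in root.iter("{%s}p" % TEXT_NS):
        text = "".join(para.itertext())
        if text.strip():
            yield text


# Hypothetical dispatch table: the only place that knows about file formats.
READERS = {".docx": docx_paragraphs, ".odt": odt_paragraphs}


def paragraphs(path):
    """Pick a reader by file extension and return its paragraphs."""
    suffix = os.path.splitext(path)[1].lower()
    return READERS[suffix](path)


# Toy stand-in for a chemistry step: it sees only strings, never OOXML or ODF.
YIELD_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*yield")


def find_yields(paras):
    """Return every percentage yield mentioned in the given paragraphs."""
    return [float(m.group(1)) for p in paras for m in YIELD_PATTERN.finditer(p)]
```

With this shape, find_yields(paragraphs("thesis.docx")) and find_yields(paragraphs("thesis.odt")) would run the same chemistry code; supporting another format means adding one reader to the table. Real theses need far more than paragraph text (tables, styles, embedded objects), which this sketch ignores.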
Our motivation, therefore, is to work with scientists as they are, not as we would like them to be. There is no point in trying to make them use Docbook, for example. (Last time I tried I couldn’t even get it to work for me – the stylesheet stackdumped (something I have never seen before)). My worry about Open Office (which emits ODT) is that I don’t yet believe that it has reached a state where I could evangelize it without it falling over or being too difficult to install.

The larger question is what needs to be done to convince scientists and others to adopt ODF – or at least a format that can be converted to ODF. I don’t have any easy answers. The best thing, obviously, would be for people to start using OpenOffice.org or similar: is that really too much to ask? After all, the thing’s free, it’s easy to use – what’s not to like? Perhaps we need some concerted campaign within universities to give out free copies of OOo/run short hands-on courses so that people can see this for themselves. Maybe the central problem is that the university world (outside computing, at least) is too addicted to its daily fixes of Windows and Office.

PMR: I face this sort of problem daily in chemistry. Chemists would rather pay for something commercial than become early adopters. I particularly blame the pharma industry. I meet people on an irregular basis who say things like: “Oh yes, we use OSCAR to develop our textmining experience and then we go out and buy X, Y, Z”. Since X, Y and Z are commercial I can’t evaluate them and say they are worse than OSCAR, but I firmly believe that for many aspects we are ahead of them. Companies want to buy things where they can sue the suppliers.

We see this with the Blue Obelisk. The pharma companies all use its tools – OpenBabel, Jmol, CDK, possibly JUMBO/CML (but how would I know? I only wrote it and made it Open/Free). Occasionally they write and say “we’d really like to develop Open Source”. I write back enthusiastically and then they dump me.

So the strategy is to create something that is self-evidently good and miles ahead of the current software offerings. Then people will have to take notice.

What do we do with Universities? I wish I knew. Universities have an information budget running into zillions (publications, subscriptions, librarians, repositarians, etc.). They are completely incapable of managing this market. Worth is decided by citations, which are decided by a commercial organisation. Repositories aren’t full, and there’s no control or ownership of their output, which they simply gift to the publishers. When was the last time a provost spoke out on this? (Yes, I except Harvard, and probably Soton, and QUT and Stirling, but almost all major Universities have failed to tackle this).

So I could spend my time writing letters to Vice-Chancellors.

Or I could develop the next phase of the Open Chemical Semantic Web.

I’ve chosen the latter. But it would certainly help if some readers did the former.


9 Responses to OOXML and ODT/F; should we work with commercial tools?

Liam McDermott says:

May 11, 2008 at 5:18 pm

Using the Sun Microsystems ODF plugin for MS Office would be a good start. The person you’re responding to is correct in that converters made by Microsoft are a trap, designed to make ODF look bad.
The primary problem you’ll face in using OOXML is its propensity to change at the will of Microsoft. You think you’re buying into using an open standard, but it’s still a moving target, with changes made in secret ( http://slashdot.org/article.pl?sid=07/12/06/2129203 ). Also, due to software patent concerns, omissions in the documentation, and the ever-changing standard, OOXML is effectively unimplementable by anyone outside Microsoft ( http://www.groklaw.net/articlebasic.php?story=20080417104016186 and http://www.groklaw.net/articlebasic.php?story=20080221184924826 ).
Since Microsoft retain complete control of the file format, and the standard is not a true standard, how can it be trusted as an archival format? What happens if Microsoft goes bust, or the software patents are sold on to a patent troll?
Proprietary file formats, particularly those encumbered by software patents, have a risk associated with using them. So just use standard ODF format. You don’t need to switch outright to OpenOffice or anything, just give that Sun converter a whirl.
OpenOffice is a less than stellar product in my opinion. It’s not unstable, or difficult to install, but does lack features and has a baroque architecture that makes the software slow. The ODF file format is supported by many applications however, and that’s the point: anyone can implement it!

Liam McDermott says:

May 11, 2008 at 5:27 pm

Gosh. Reading my comment above, there are a lot of errors, sorry about that. Also wanted to add a link to this potted history of the OOXML process. This is important because it shows what Microsoft have done to ISO (that you may not be aware of).
It also shows ECMA/ISO’s standing in the IT community: little more than a transport for Microsoft’s formats and protocols. Used to get a tick in the ‘International standard’ box on government procurement forms. Harsh, but true (they’re pushing through a ‘PDF killer’ next).

Egon Willighagen says:

May 11, 2008 at 6:11 pm

Peter, I tend towards supporting the criticism and point out that Open Standards should be the goal. The OOXML is tagged open, but there is more than enough evidence that it isn’t. Word is an excellent product, as are the other MS-Office products. Archival is important, and I think the goal of archival is keeping the annotation of data; a synthetic paper is merely interesting as it annotates the experiments done. A free format like ODF or OOXML is a free-style annotation, and a trade-off between short-term and long-term goals. Both are aimed at making facts reusable, for which databases are more suitable in the end. The short-term goal, however, is to get currently produced data into the system too, and not just that produced in 5-10 years.
Should we make archival the goal (basically, deferring data extraction), or go for the ultimate goal and have scientists report (and get rewarded for reporting) facts instead of having them tell stories?
Anyway, the choice seems to state that unlike crystallographers who annotate the data with CIFs, organic chemists are not qualified enough to use a more formalized way of reporting the facts they found. Do you feel that we, us chemoinformaticians, have failed in the past 10-ish years to convince them that formal reports, like CIF, should be part of the scientific reporting process? That is, should we stop trying to get that reality, and take our losses and go for the ok-ok-we-will-take-anything-you-produce-friendly-organic-chemist?

pm286 says:

May 11, 2008 at 6:35 pm

(1)(2) Thanks Liam. I wrote a reply and WordPress trashed it. Then Egon (3) wrote and I’ll anyway amend what I wrote. I’ll wait a bit to see if there are other comments.

Pingback: SimBioSys Blog » Blog Archive » What is wrong with OOXML
Pingback: How Microsoft Uses Open Against Open | open access | Fair or Unfair

Bruce D'Arcus says:

May 12, 2008 at 2:53 pm

Just one quick point on this:

Our primary goal is to evangelize the semantic chemical web.

You might be interested to know that ODF 1.2 will have a pretty killer metadata system based on RDF and RDFa, and that at least the beginnings of support for this in the form of an API will be coming to OpenOffice 3.0.
It might also be worth noting that there’s no technical reason that the approach we took to adding this metadata support to ODF couldn’t also work in OOXML.

pm286 says:

May 12, 2008 at 4:21 pm

(7) Many thanks Bruce – this sounds very useful. I expect Peter Sefton can pick this up.

Pingback: SimBioSys Blog » Blog Archive » Quality in chemical software - the debate continues