Glyn Moody has taken me to task for espousing Word as a tool that should be considered for archival of scholarly output. Not Word alone, but as a supplement to PDF. I explained my reasons and motivation. Now Glyn has continued the discussion and I reply…
A little while back I gave Peter Murray-Rust a hard time for daring to suggest that OOXML might be acceptable for archiving purposes. Here’s his response to that lambasting: There are two issues here. The second concerns translators between OOXML and ODF. Although in theory that’s a good solution, in practice, it’s not, because the translators don’t work very well. They are essentially a Microsoft fig-leaf so that it can claim using OOXML isn’t a barrier to exporting it elsewhere. They probably won’t ever work very well because of the proprietary nature of the OOXML format: there’s just too much gunk in there ever to convert it cleanly to anything.
My point is that – at present – we have few alternatives. Authors use Word or LaTeX. We can try to change them – and Peter Sefton (and we) are trying to do this with the ICE system. But realistically we aren’t going to change them any time soon. My point was that if the authors deposit Word we can do something with it, which we cannot do with PDF. It may be horrible, but it’s less horrible than PDF. And it exists.
PMR: I didn’t regard it as a lambasting but a controlled robust discussion. I understand and appreciate where Glyn is coming from. I haven’t said anything I regret, but I’m also aware that there are unclear boundaries. Before diving in I should get a potential conflict of interest out of the way. We are about to receive funding from Microsoft for the OREChem project (see post on Chemistry Repositories). This does not buy an artificial silence on commenting on Microsoft’s practice, any more than if I accept a grant from JISC or EPSRC I will refrain from speaking my mind. Nor do I have to love their products. I currently hate Vista. However I need an MS OS on my machine because it makes it easier to use tools such as LiveMeeting (a system for sharing desktops). I’ve used LiveMeeting once and I liked it. OK, Joe did the driving because he knows his way round better than me, but I can learn it. Not everything MS does is bad and not everything it does is good.
The reason I currently like OOXML is that we can make it work and that we have material in Word that we can use. I’ll be demoing it publicly in a week’s time (more later). If we had material in ODT we’d use that, but we don’t. There may be a few synthetic chemists somewhere who use ODT, and we’d really like to hear from them, but currently we have to work with what chemists do.
I’m sorry to hear the translators aren’t good and I’m not surprised. I can’t imagine they are as bad as trying to get structured documents out of PDF. Remember also that we don’t want to do everything at this stage.
Our primary goal is to evangelize the semantic chemical web. To do this we have to create a lot of infrastructure: demonstrators, ontologies, microformats, etc. This is all independent of the tools used to create the starting material. Everything we do will be modular and none of the chemistry will have hardcoded OOXML stuff in. I believe that were we to have a chemical thesis in ODT then we’d be able to adapt our material very easily.
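The modularity described above can be illustrated with a minimal Python sketch. It is not our actual code: the function names, the toy "yield" extractor and the tiny in-memory sample files are all assumptions made for illustration. The only facts relied on are that both OOXML and ODT files are zip archives holding XML (word/document.xml and content.xml respectively). Each reader turns its format into plain paragraphs, and the chemistry stage never sees the container format.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two formats for paragraph/text elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_ooxml_paragraphs(data: bytes) -> list[str]:
    """Format-specific reader: paragraphs of a .docx given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # A paragraph (w:p) is split into runs; join the text (w:t) of each run.
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def read_odt_paragraphs(data: bytes) -> list[str]:
    """Format-specific reader: paragraphs of an .odt given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def find_yields(paragraphs: list[str]) -> list[float]:
    """Toy format-independent 'chemistry' stage (hypothetical):
    pull percentage yields out of plain paragraph text."""
    pattern = re.compile(r"yield[:\s]+(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)
    return [float(m.group(1)) for para in paragraphs for m in pattern.finditer(para)]


def _zip_with(name: str, xml: str) -> bytes:
    """Build a one-member zip in memory (a stand-in, not a complete document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, xml)
    return buf.getvalue()


# Minimal made-up samples carrying the same sentence in both formats.
SAMPLE_DOCX = _zip_with(
    "word/document.xml",
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    "<w:r><w:t>White solid, yield: </w:t></w:r><w:r><w:t>82 %</w:t></w:r>"
    "</w:p></w:body></w:document>",
)
SAMPLE_ODT = _zip_with(
    "content.xml",
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text><text:p>White solid, yield: 82 %</text:p>"
    "</office:text></office:body></office:document-content>",
)

if __name__ == "__main__":
    for reader, sample in ((read_ooxml_paragraphs, SAMPLE_DOCX),
                           (read_odt_paragraphs, SAMPLE_ODT)):
        print(reader.__name__, find_yields(reader(sample)))
```

Swapping Word for ODT here means swapping one reader function; find_yields is untouched. A real thesis would of course need far more than this (headings, tables, embedded objects), which the sketch ignores.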
Our motivation, therefore, is to work with scientists as they are, not as we would like them to be. There is no point in trying to make them use Docbook, for example. (Last time I tried I couldn’t even get it to work for me – the stylesheet stackdumped, something I have never seen before.) My worry about Open Office (which emits ODT) is that I don’t yet believe that it has reached a state where I could evangelize it without it falling over or being too difficult to install.
The larger question is what needs to be done to convince scientists and others to adopt ODF – or at least a format that can be converted to ODF. I don’t have any easy answers. The best thing, obviously, would be for people to start using OpenOffice.org or similar: is that really too much to ask? After all, the thing’s free, it’s easy to use – what’s not to like? Perhaps we need some concerted campaign within universities to give out free copies of OOo/run short hands-on courses so that people can see this for themselves. Maybe the central problem is that the university world (outside computing, at least) is too addicted to its daily fixes of Windows and Office.
PMR: I face this sort of problem daily in chemistry. Chemists would rather pay for something commercial than become early adopters. I particularly blame the pharma industry. I meet people on an irregular basis who say things like: “Oh yes, we use OSCAR to develop our textmining experience and then we go out and buy X, Y, Z”. Since X, Y and Z are commercial I can’t evaluate them and say they are worse than OSCAR, but I firmly believe that for many aspects we are ahead of them. Companies want to buy things where they can sue the suppliers.
We see this with the Blue Obelisk. The pharma companies all use its tools – OpenBabel, Jmol, CDK, possibly JUMBO/CML (but how would I know? I only wrote it and made it Open/Free). Occasionally they write and say “we’d really like to develop Open Source”. I write back enthusiastically and then they dump me.
So the strategy is to create something that is self-evidently good and miles ahead of the current software offerings. Then people will have to take notice.
What do we do with Universities? I wish I knew. Universities have an information budget running into zillions (publications, subscriptions, librarians, repositarians, etc.). They are completely incapable of managing this market. Worth is decided by citations, which are decided by a commercial organisation. Repositories aren’t full, and universities have no control or ownership of their output, which they simply gift to the publishers. When was the last time a provost spoke out on this? (Yes, I except Harvard, and probably Soton, and QUT and Stirling, but almost all major Universities have failed to tackle this.)
So I could spend my time writing letters to Vice-Chancellors.
Or I could develop the next phase of the Open Chemical Semantic Web.
I’ve chosen the latter. But it would certainly help if some readers did the former.