From HUBLOG (who I think is Alf Eaton from Nature) we have:
EPUB and Adobe Digital Editions
Bill McCoy from Adobe came in to give us a talk yesterday, the main part of which was a demo of Adobe Digital Editions and an overview of the document packaging standard EPUB (aka IDPF OPS/OCF).
Digital Editions is alpha-quality software that I really wish was written in XUL and used the Gecko rendering engine and was extensible using XPI plugins, but isn’t (as far as I know). It’s going to be cross-platform, with a Linux version planned for later this year, which is good. It’s also full of quite cryptic error messages and likes to hang, particularly when XML documents aren’t completely valid. Its main use, though, is as a demonstration of EPUB, which is wonderful.
An EPUB file is something I’ve been looking for for ages: a zip file containing…
- A mimetype file that describes what kind of package this is (so it’s not dependent on the file extension), and a META-INF/container.xml file, that describes what kind of file is used to describe the main content, and where to find that root description file.
- An OEBPS folder that contains the main content, which is comprised of
- A .opf file, which provides metadata for the document, a manifest of all the associated files, and an ordered list of the documents to display.
- A .ncx file, which describes the table of contents.
- .html and .css files, which are XHTML and CSS files the same as you’d use for the web and comprise the actual documents and associated stylesheets for rendering.
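To make the layout above concrete, here is a minimal sketch in Python that builds a toy package in memory and then walks it the way a reader might: check the mimetype entry, follow META-INF/container.xml to the root .opf file, and read the ordered list of documents from it. The file names (content.opf, chapter1.html), the sample XML and the function names are assumptions made up for illustration; the result is deliberately incomplete (no .ncx, minimal metadata) and is not claimed to be a valid EPUB or to reflect how Digital Editions works internally.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the container file and the package (.opf) file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}

# Hypothetical sample content, kept as small as possible.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

CHAPTER = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hello</p></body></html>"


def build_toy_package() -> bytes:
    """Zip up the toy files; the mimetype entry goes first, uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF)
        zf.writestr("OEBPS/chapter1.html", CHAPTER)
    return buf.getvalue()


def find_root_file(package: bytes) -> str:
    """Return the path of the root description (.opf) file."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        if zf.read("mimetype").decode("ascii").strip() != "application/epub+zip":
            raise ValueError("unexpected mimetype entry")
        container = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return rootfile.attrib["full-path"]


def reading_order(package: bytes) -> list[str]:
    """Return document paths inside the zip, in the order the spine lists them."""
    opf_path = find_root_file(package)
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        opf = ET.fromstring(zf.read(opf_path))
    # Map manifest ids to hrefs, then resolve each spine entry against the map.
    hrefs = {item.attrib["id"]: item.attrib["href"]
             for item in opf.findall("opf:manifest/opf:item", NS)}
    base = posixpath.dirname(opf_path)
    return [posixpath.join(base, hrefs[ref.attrib["idref"]])
            for ref in opf.findall("opf:spine/opf:itemref", NS)]
```

Calling `reading_order(build_toy_package())` walks the same chain a reader would: mimetype, then container.xml, then the .opf manifest and its ordered list. Nothing beyond a zip library and an XML parser is needed, which is the practical benefit of building the package from open formats.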
As is Adobe’s way with these things, there appears at first glance to be slightly too many, possibly redundant metadata files. If that’s necessary to support a wide variety of packaging and document formats though, then that’s fine – I’m glad that they’ve chosen to use open standards so that anyone can easily create and read these packages.
What Digital Editions does very nicely with these files, which web browsers don’t – but I suspect could, with a bit of JavaScript – is to reflow content into multiple, paged columns depending on the screen width and font size. This makes it much easier to read documents on the screen as there’s less scrolling, and unlike PDFs you don’t have to scroll up and down to read each page. (Also unlike PDFs, it’s easy to extract the content from these XHTML files, so no more hamburgers. [PMR: my re-use of the metaphor that PDF->XML is O(hamburger->cow)] Adobe will support DRM in Digital Editions where necessary, so I imagine there may be some way to encrypt the content of documents, but Bill at least seems to have a good stance on not pushing DRM where it’s inappropriate.)
However Digital Editions ends up, I really hope it lets you apply user stylesheets and user scripts to documents, otherwise we’re still better off viewing them in a web browser (which may well get native, or at least plugin support for EPUB before too long).
PMR: The Nature folks are among the most advanced techie people in science publishing so their views are worth listening to. And if HUBLOG isn’t them, then I apologize. But I’m not excited by this news.
I see this from a perspective of 15 years of the broken browser, a history of bloated systems, manufacturer non-compliance, abandoned systems, and the failure of the W3C to carry any influence. Examples:
SVG. An open standard for graphics. We’ve spent a lot of time developing SVG for CML. We used to be able to do some really fun things that advanced chemistry. Adobe produced a lovely plugin (admittedly only for IE, but given the browser midden what else could be expected). OK – I had to put on my pages “only works with IE”. That’s one reason why I use Windows on my laptop – it’s the easiest way of getting SVG going. So where are Adobe now in SVG? They’ve abandoned the plugin.
Compound documents. The W3C has had an activity for years on how we package HTML files over the web. Where has it got to? So here we have a manufacturer offering a proprietary solution to compound documents.
So my problem is that Web technology is now dominated by large organisations who think in terms of large teams of programmers – at XTech I was interested in XForms in the browser – and got a reply that it was now quite practicable: “we cut our development team down from 10 programmers over 5 years to 5 programmers over 2 years”. I may have got the numbers wrong but the scale is roughly right. And for developing a client-side molecular browser…?
“… is to reflow content into multiple, paged columns depending on the screen width and font size”. Is this what a molecular biologist wants when reading a sequence? I don’t know of one scientist who asked a publisher for multiple column PDF files. Yet this is what is forced on us. And here we have a product with built-in DRM and encryption. Is the scientific community crying out for DRM (== digital rights management, technology to stop you reading things)?
So, if anyone out there is listening, I want a browser that supports SVG out of the box. I want open standards, not proprietary tools which embrace, extend and drive-me-up-the-wall-when-they-disappear. I want an OPEN compound document format. Like SWORD. Like ORE.
Oh – and have I mentioned that it would be nice to send chemistry over the web in XML? Not gifs, not PDFs, not whatever-the-next-glitzy-proprietary is. Just CML.
Many more people want that. Developments are slow.
Yes, that’s me (though HubLog’s nothing to do with Nature). To answer some of your comments:
SVG: SVG is supported natively by Firefox 2, Safari 3 and Opera 9 (which is I think one of the main reasons Adobe dropped the plugin, which they would’ve had to update to support Vista and other devices). There’s no reason why an EPUB client shouldn’t render SVG.
Compound documents: Digital Editions requires that the XHTML documents are valid XML, so there’s no reason why an EPUB client shouldn’t render compound documents. Unlike EPUB files, I don’t see compound documents as a solution to the ‘I want to save this article on my laptop, along with all the supplementary data, in one file’ problem. Note that there’s also nothing proprietary about EPUB; it’s an open standard produced by the IDPF.
You should try the column reflowing in Digital Editions and Microsoft Reader – it’s an excellent way to present long passages of text that makes it much easier to read without scrolling, on a range of screen sizes. It’s explicitly the opposite of PDF, which is only good for print.
The scientific community probably doesn’t need DRM, and publishers don’t have to use it. The fact that the client software happens to support it is irrelevant to scientists, but may be useful for some book publishers.
As for your last two paragraphs, I think that EPUB (with XHTML) is the answer to all those things.
(I meant Microsoft’s New York Times Reader, not Microsoft Reader, sorry).
(2) Many thanks Alf,
I’m delighted to hear that EPUB is open. So this sounds a lot better. If the message is “there is a good open compound document standard and it just happens that a major manufacturer is supporting it” then great.
“Unlike EPUB files, I don’t see compound documents as a solution to the ‘I want to save this article on my laptop, along with all the supplementary data, in one file’ problem.”
Agreed. There are too many people trying to bung everything into a single PDF file.
What the SWORD (JISC) people are doing is IMO an important way forward for academia.