ePUB is a revolution in scholarly publishing. We explore BioMedCentral’s offering.

Scholarly publishing is one of the least enlightened industries of this century. Retail, travel, entertainment, medicine, democracy, and much more have all either been completely changed or show the potential. Yet scholarly publishing idolises the print image (PDF) coupled to a mark of esteem (Journal Impact Factor) as worth paying 3000 USD for 10 pages. It’s absurd to argue that humans were born to read double column PDF on landscape screens and it’s the pinnacle of graphic presentati0on. It’s not. And the idea that readers love the variety of different presentational brands from journals – this is only done for the marketers, not the readers.

Now some publishers are starting to use a new format, ePUB. BioMedCentral have pioneered this (let me know of others) http://blogs.biomedcentral.com/bmcblog/2012/12/11/biomed-central-now-publishes-in-epub-format/ . This deserves much more praise and use than I think it has had, and I’ll explain why.

I, and I suspect many other readers/consumers, hate the way that science is presented by scholarly publishers. My remarks apply to almost all publishers, independently of whether they are “open access” or not. Simply, no publishers care much about their readers. There are three main approaches – some publishers have only one – some have all three.

  • PDF – a sighted-human-only format. Its proponents argue it’s beautiful. But it’s inaccessible to machines and unsighted humans and scholarly publishing produces the worst technical PDF I have come across. It’s taken me a year to build a system to read it semantically and we’re not finished. It cannot adapt to different needs – size of screens, lack of eyes, etc. It has no place in modern science. Its continued use is a severe discouragement to any form of innovation in communication.
  • HTML. This was designed by TimBL to be a simple, powerful, way of communicating science. And it is. I can do almost anything I want with HTML. It interoperates with markup languages, SVG, etc.

    Except when it’s created by most scholarly publishers. Much of their HTML cannot be formally read. Here’s a typical example of total garbage (slightly truncated):

    <meta name=”dc.description” content=”The increase in … with a technique we call ” range=”” modulation”.=”” with=”” modulation,=”” sobp.”=””>

What’s happened is that the original HTML had a quoted section. Something like:

The increase in … with a technique we call “range modulation”. With modulation …

The publisher – or more accurately the contractor they have outsourced to – uses a garbage toolkit. There’s a standard way to treat quotes (&quot;)and it’s been used for 20 years. It’s actually easier to produce conformant HTML than garbage. And there are zillions of free tools. But the crap above crashes all the HTML parsers I have used.


People have paid for this – maybe thousands of dollars. Libraries should cancel/reduce subscriptions. Authors should refuse to pay APCs. If a garage fills your brake fluid reservoir with screenwash they can be prosecuted. But no one cares about the quality of publishing.

  • XML. This is the engineering solution. It’s useful for people who want to re-use the content. Perhaps for content-mining. Perhaps for a different presentational approach. Perhaps for computation (e.g. Maths). Perhaps for data (chemistry). It’s routinely provided by PubMed (EuropePMC). Because NIH have a commitment to accuracy and quality the XML is of good quality. Publishers often re-offer the XML, but much of that is probably limited to Biomedical. (Most publishers use XML internally, but they don’t publish it because they don’t want readers having free access to something useful that they can be charged extra for).


So now we have ePUB. https://en.wikipedia.org/wiki/EPUB . Potentially ePUB takes us a giant step forward. HTML suffers because while it is good at representing semantic and document structure, and can be repurposed, it’s terrible as a container format (you cannot easily distribute images with HTML). Conversely PDF has no semantics and cannot be repurposed, but is a tolerable container format for images.

ePUB supports both of these. It’s a standard format (ZIP) that contains a series of components (HTML, images, XML, etc.). It’s SVG friendly (which in itself can revolutionise science). So I have downloaded a sample from BMC and started to analyse it. If a publisher WANTS the reader to read the content, then it’s a good start. For publishers who do not want people to read their output and (even worse) reuse it, it’s a major threat. (except – salvation – by DRM we can stop anyone reading it). (When we see the first DRM’ed ePUBs we’ll know it’s 1984).

Here’s the example (http://www.biomedcentral.com/content/epub/1471-2105-14-172.epub). [It won’t do anything in your browser unless you have an epub reader, but it’s easy to do this in Mozilla – and I suspect others (see https://addons.mozilla.org/en-US/firefox/addon/epubreader/ and chrome://epubreader/locale/welcome.html ).

  • Go to the webpage where you want to download the ePub file and click on the download link. If you already downloaded the ePub file to your PC, just open it via the “File/File open” menu or drag the file on the Firefox window.
  • Now EPUBReader starts working: it downloads/opens the ePub file, uncompresses it and does some other processing. At the end it will present you the ready to read ePub file immediately!
  • EPUBReader created a page where all the ePub files, you downloaded/opened, are listed. You can open this page at four locations:
    • EPUBReader added a bookmark to this page called “ePub-Catalog” which you can find at the end of your bookmark list.
    • You find a new menuitem in your Firefox “Tools” menu, called “ePub-Catalog”.
    • You can add a button to your Firefox toolbar. If you use Firefox 4.0 or later, the button has been added automatically.
    • When you are reading an ePub-file, you find a button in the bottom toolbar.

If you want to have more information about EPUBReader, please read the manual or the FAQ.

Enjoy it!

Here’s the display. It fits the screen precisely. No vertical scrolling required. Horizontal scrolling, because we’re on a horizontal device. No need for typesetting!!

Isn’t that already a zillion times better than PDF? Yes, so why don’t you demand it?

But that is only 10% of the value of ePUB and I’ll discuss that in a later post.






This entry was posted in Uncategorized. Bookmark the permalink.

8 Responses to ePUB is a revolution in scholarly publishing. We explore BioMedCentral’s offering.

  1. Stian Haklev says:

    This is indeed great. I am really looking forwards to new apps supporting a better reading experience for epubs, especially with annotations and note taking, zooming in on images, or even dynamic components etc. So far, epubs have almost only been used for (often DRMed) novels, and the focus has been on just reading. Many epub readers won’t even let you select and copy text. I wish my favorite tools like Skim or PDF Expert on the iPad would support epubs (it would be great to be able to integrate them in my workflow, instead of starting to learn new apps and scripts).
    I’m also curious about how good the various eInk readers are at rendering more complex layouts – I really prefer reading on eInk, but of course PDFs are far too large, and even when converting them, you loose most of the layout.

    • pm286 says:

      I’ll blog my experience using Mozilla. Technically I am optimistic. If tghe publishers don’t take up something here it will be time to create new publishers

  2. Thomas Munro says:

    Hindawi articles have had ePub versions since 2009, but it doesn’t seem to have had any dramatic effect.

  3. Chris Rusbridge says:

    I agree with you that ePub looks like one of the best compromises around: many of the advantages of HTML with good containerisation. Unfortunately there appear to be several different versions of ePub, possibly with incompatibilities. Annoyingly, ePub doesn’t appear to be a native format for most tablet-like book readers. And, since we know publishers COULD make PDFs that are moderately semantically rich, but choose not to, we can perhaps expect pubishers to take on different versions of ePub (and/or the other epublishing formats), and obfuscate them to make sure you (and others) can’t read them easily with your mining programs!
    (You’re not paranoid, Peter, they ARE out to frustrate you… 😉

  4. Louis says:

    I remember Henry Rzepa wrote about his experience with ePubs recently, http://www.ch.imperial.ac.uk/rzepa/blog/?p=11812
    I agree, SVG would be excellent for science. I’m happy to see its use online seems to be on the rise, but right now there are bugs with browser implementation (webkit more than Moz/Opera from what I’ve seen).
    SVG bugs in Chromium are at https://code.google.com/p/chromium/issues/list?can=2&q=svg

  5. Henry Rzepa says:

    I too endorse ePub (version 2, version 3 has yet to make an impact). It is shrink-wrapped HTML (XML), and in that resembles Microsoft’s .docx formats. I have for about 4 years now made epub versions of all my lecture notes available (the conversion was trivial, since I wrote them all in HTML in the first place), and as it happens, they land onto mobile devices such as tablets or phones very well indeed.
    My latest lecture notes are rendered to be interactive using Javascript (I sweated to remove all the Java that they previously had), but the current ePub2 does not honour that. ePub3 is supposed to be able to do this eventually, but progress seems slow, and perhaps has even stalled? If anyone knows better, please tell!
    I might also mention Markdown, which is supposed to be a human-readable mapping of HTML (and most importantly can also be trivially converted to proper HTML/XML). In fact, the Wiki markup carries that ethos already. It is very simple to convert Wiki markup to epub (there is an extension that does this very well). Still, I know of no-one who writes a scholarly article for publication on a Wiki! Perhaps they should (I once did so, using the SemediaWiki, or semantic Wiki, but that was hard work!).

Leave a Reply

Your email address will not be published. Required fields are marked *