PDF is one of the greatest disasters in scientific publishing – why?
I normally give my slides in XHTML rather than PowerPoint and prefix them with a quote I made up:
“Power corrupts; Powerpoint corrupts absolutely”
I then searched the web and found that Edward Tufte had already thought of it in
The Cognitive Style of PowerPoint.
Tufte contends that PP had an important role in the Space Shuttle disaster(s). His premise is that PP requires authors to omit critical data and to dumb down thought. I had never thought of PP as actually perverting the way we think, but he is absolutely right.
My attack on PP is complementary – technical rather than political. PP corrupts any semantics in the document completely. Just try to read the HTML saved from a PP presentation (in, say, Google’s cache) and you will be lucky to get anything. PP is probably the most effective destroyer of semantic information yet devised. Tufte urges that authors use Word instead. I will interpret this to mean “any tool that displays conventional compound documents at the required level and without loss”. I therefore choose XHTML (because Word is a pretty good semantic destroyer as well).

So why not just use PDF? It’s universal, it’s beautiful to look at, it’s used for scientific publishing…
NO! PDF is the biggest destroyer of scientific information currently in use.
PDF concentrates on only one thing: reproducing the process of adding printers’ ink to paper. The PDF that scientists use for publications was not promoted by them, but by the scientific publishers. How many scientists wrote to the publishers saying “we would like double-column text in PDF”?
The “e publishing revolution” has had the major and very sad effects of:
* transferring the printing bill from the publisher to the reader (almost all scientists seem to print out the papers and annotate them with markers)
* transferring political power to the publishers. It allows the publishers to claim (as the ACS does) that
What is important to realize is that a subscription to an STM journal is no longer […] a subscription; in fact, it is an access fee to a database maintained by the publisher.
[…] one important consequence of electronic publishing is to shift primary responsibility for maintaining the archive of STM literature from libraries to publishers. I know that publishers like the American Chemical Society are committed to maintaining the archive of material they publish. Maintaining an archive, however, costs money.
From “Socialized Science” (ACS[*] commentary on NIH)
RUDY M. BAUM, Editor-in-Chief, C&E News,
September 20, 2004, Volume 82, Number 38, p. 7
How many scientists asked the publishers to convert journals into databases? How many asked the publishers to become the guardians of the archive? And how many asked to have access switched off at a moment’s notice (as it was to Cambridge last week)?
There are some minor benefits from ePublishing (CrossRef, more rapid access), but it’s a Faustian bargain and we are suffering. PDF has been the devil’s agent in this. It has insidiously transferred control to the publishers, with the unintended but equally horrific downside of semantic destruction.
Apart from the politics, why is PDF so bad? A question on XML-DEV about how to convert PDF to XML drew this lovely comment from Mike Kay (author of the open-source Saxon XSLT tool):
> Could you please tell me, How we can convert the PDF data
> into Xml file using java? I found a library PDFBox.
Converting PDF to XML is a bit like converting hamburgers into cows. You may
be best off printing it and then scanning the result through a decent OCR
So I use XHTML and preserve my semantics. It’s a labour – but it has to be the way forward. I’ll write more on this later and why the browser manufacturers have destroyed semantics as well.
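To make that concrete, here is a minimal sketch (a hypothetical two-element document, Python standard library only) of what “preserving semantics” buys you: in XHTML the markup says what each piece of text *is*, so a program can recover the structure directly – something no amount of parsing will recover from PDF’s positioned glyphs.

```python
import xml.etree.ElementTree as ET

# A toy XHTML fragment (invented for illustration): the tags themselves
# carry the meaning – this is a heading, that phrase is emphasised.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>The Cognitive Style of PowerPoint</h1>
    <p>Tufte argues that PP forces authors to <em>omit critical data</em>.</p>
  </body>
</html>"""

ns = {"x": "http://www.w3.org/1999/xhtml"}
root = ET.fromstring(xhtml)

# Any program can now ask semantic questions of the document:
title = root.find(".//x:h1", ns).text    # what is the document's heading?
emphasis = root.find(".//x:em", ns).text  # what did the author stress?

print(title)     # The Cognitive Style of PowerPoint
print(emphasis)  # omit critical data
```

A PDF of the same page would give you only characters at coordinates; the fact that one line was a heading and one phrase was emphasised is gone for good.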
(Judith M-R tells me there were too many typos in last post, so I shall edit offline, spellcheck and paste. I am still losing edits in WordPress and then finding later they have been saved after I have rewritten them.)