Hamburgers and Cows; The Cognitive Style of PDF

PDF is one of the greatest disasters in scientific publishing - why?
I normally give my slides in XHTML rather than Powerpoint and prefix them with the quote which I made up:
"Power corrupts; Powerpoint corrupts absolutely"
I then searched the web and found that Edward Tufte had already thought of it in
The Cognitive Style of PowerPoint.
Tufte contends that PP had an important role in the Space Shuttle disaster(s). Tufte's premise is that PP requires authors to omit critical data and dumb down thought. I had never thought of PP as actually perverting the way we think, but he is absolutely right.
My attack on PP is complementary - technical rather than political. PP corrupts any semantics in the document completely. Just try to read the saved HTML from a PP (in, say, Google) and you will be lucky to get anything. PP is probably the most effective destroyer of semantic information yet devised. Tufte urges that authors use Word instead. I will interpret this to mean "any tool that displays conventional compound documents at the required level and without loss". I therefore choose XHTML (because Word is a pretty good semantic destroyer as well).

So why not just use PDF? It's universal, it's beautiful to look at, it's used for scientific publishing...

NO! PDF is the biggest destroyer of scientific information currently in use.

PDF concentrates on only one thing: reproducing the process of adding printers' ink to paper. The PDF that scientists use for publications was not promoted by them, but by the scientific publishers. How many scientists wrote to the publishers saying "we would like double column text in PDF"?

The "e publishing revolution" has had the major and very sad effects of:
* transferring the printing bill from the publisher to the reader (almost all scientists seem to print out the papers and annotate them with markers)
* transferring political power to the publishers. It allows the publishers to claim (as the ACS does) that:

What is important to realize is that a subscription to an STM journal is no longer [...] a subscription; in fact, it is an access fee to a database maintained by the publisher.

[...] one important consequence of electronic publishing is to shift primary responsibility for maintaining the archive of STM literature from libraries to publishers. I know that publishers like the American Chemical Society are committed to maintaining the archive of material they publish. Maintaining an archive, however, costs money.

From "Socialized Science" (ACS[*] commentary on NIH)
RUDY M. BAUM, Editor-in-Chief, C&E News,
September 20 2004 Volume 82, Number 38 p. 7

How many scientists asked the publishers to convert journals into databases? How many asked the publishers to become the guardians of the archive? And to have them switch off access at a moment's notice (as they did to Cambridge last week)?

There are some minor benefits from ePublishing (Crossref, more rapid access), but it's a Faustian bargain and we are suffering. PDF has been the devil's agent in this. It has insidiously transferred control to publishers with the unintended but equally horrific downside of semantic destruction.

Apart from the politics, why is PDF so bad? A question on XML-DEV about how to convert PDF to XML brought the lovely comment from Mike Kay (author of the (OpenSource) Saxon XSLT tool):

>
> Could you please tell me, How we can convert the PDF data
> into Xml file using java? I found a library PDFBox.
>

Converting PDF to XML is a bit like converting hamburgers into cows. You may
be best off printing it and then scanning the result through a decent OCR
package.

Michael Kay

http://www.saxonica.com/

http://lists.xml.org/archives/xml-dev/200607/msg00509.html

So I use XHTML and preserve my semantics. It's a labour - but it has to be the way forward. I'll write more on this later and why the browser manufacturers have destroyed semantics as well.
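
To make "preserve my semantics" concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree and a made-up XHTML slide (the slide text and the names extract_structure and flatten are invented here for illustration, not taken from my real slides). It shows that a program can ask an XHTML document for its heading and its bullet points, whereas a flattened text dump of the same slide only offers a run of words.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical slide, invented for illustration only.
slide = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Why not PDF?</title></head>
  <body>
    <h1>Why not PDF?</h1>
    <ul>
      <li>It reproduces ink on paper</li>
      <li>It discards document structure</li>
    </ul>
  </body>
</html>"""


def extract_structure(xhtml_text):
    """Return the first heading and the list items, using the markup itself."""
    root = ET.fromstring(xhtml_text)
    ns = {"x": XHTML_NS}
    heading = root.find(".//x:h1", ns).text
    items = [li.text for li in root.findall(".//x:li", ns)]
    return heading, items


def flatten(xhtml_text):
    """Throw the markup away and keep only the words, as a print-oriented
    format effectively does. Heading and bullets become indistinguishable."""
    root = ET.fromstring(xhtml_text)
    return " ".join("".join(root.itertext()).split())


heading, items = extract_structure(slide)
print(heading)
print(items)
print(flatten(slide))
```

Going from the structured slide to the flat string is trivial; going back from the flat string to heading-plus-bullets needs guesswork, which is the hamburger-to-cow direction.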

(Judith M-R tells me there were too many typos in last post, so I shall edit offline, spellcheck and paste. I am still losing edits in WordPress and then finding later they have been saved after I have rewritten them.)

This entry was posted in open issues, XML.

15 Responses to Hamburgers and Cows; The Cognitive Style of PDF

  1. This is a very interesting post from a writing perspective. In business and technical writing, we promote the use of both of these evil forms (PPT and PDF). I will now be able to make a case against their use. I think this information is so crucial that I am going to force all of my students to read this post. Thanks for the information!!

  2. pm286 says:

    Thanks Beth,
    I have come to this via data corruption through PDF and will post more on this. I hadn't thought very much about PDF as having a political side until I wrote this post. Now I shall think about it. As fuel for thought, I was talking to the ACS publishing officers last night. They use words like "database" and "users"; I am old-fashioned and use terms such as "journal" and "reader". I shall post on this now...

    P.

  3. Pingback: The Chemically-Aware Web: Are We There Yet?

  4. pm286 says:

    Thanks Rich - I have responded positively on your blog

  5. Pingback: Hamburgers and cows; the cognitive style of PDF « Librarian’s place

  6. pm286 says:

    The "hamburger and cow" analogy seems to be older than I thought. I have found the following link in 2003
    http://www.techwr-l.com/techwhirl/archives/0307/techwhirl-0307-00942.html

  7. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Hamburger House of Horrors (1)

  8. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » How do I keep up with the Literature?

  9. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Presentation to Open Scholarship 2006

  10. Odonata says:

    very good article

  11. Odonata says:

    maybe .swf can work instead of both .ppt and .pdf?

  12. Pingback: The Open Access Ecosystem

  13. Pingback: Noel O'Blog

  14. Pingback: Data-round-tripping: moving chemical data around. « Henry Rzepa

  15. Pingback: Unilever Centre for Molecular Informatics, Cambridge - #opencontentmining Starting a community project and introducing #AMI2 « petermr's blog
