I have recently blogged about the standard of “typesetting” in STM publishing and commented that much of it was very poor, in that it destroyed the identity of the characters in the document (i.e. many fonts do not use Unicode, so machines cannot read the PDFs automatically, or at least not correctly). This has drawn criticism of me from “The Scholarly Kitchen” blog:
Ignorance As Argument — A Chemist Alleges Publishers Exploit Typography for Money
In a recent blog post, Cambridge chemist and crystallographer Peter Murray-Rust argues that publishers are using typesetting and typography to slow down science, extract fees, and control business:
It’s interesting that Murray-Rust explicitly admits ignorance about how typesetting is done at most publishers while also implicitly admitting ignorance about typography in general.
I wrote a comment on the SK challenging their statement and starting to set out my position. (Having studied the output of STM publishers on a daily basis for 8 months, I am extremely aware of the differences between Helvetica and Arial.) However, the editor of SK, Kent Anderson, deleted my comments and replied by private email. (I had offered to write a guest blog for the SK.) I am therefore answering their criticisms on my own blog.
For those not familiar with SK, it’s a group of experts in the STM publishing industry. Some of the contributions are useful and informative. But they are mixed with a group-think politics that questions Open Access and the role of the NIH and PubMedCentral, and is generally self-congratulatory and conservative. [I’ve put that gently.] Recently Anderson has taken an anti-eLife, anti-PMC, anti-OA approach with the religious zeal of Senator McCarthy and the sleuthing ability of Woodward and Bernstein. He’s been extracting emails through FOI/A from anyone connected with NIH/PMC etc. [Note: I am on the Project Advisory Board of EuropePMC, so I’m not de facto persona grata at SK.]
The charge of “ignorance” is meant pejoratively (as, I suspect, is “chemist” – i.e. someone outside the sanctum of STM publishing). But I take it as a compliment. I AM ignorant of everything that goes on in STM backrooms (typesetting, format conversion, graphics, etc.) because each publisher keeps it highly secret. I therefore have to guess, and there is no shame in that. My current hypothesis is that STM backroom technology is inefficient, costly and unsuited to the modern age, and could and should be swept away in the same way that Murdoch swept away hot metal in the UK newspaper industry.
But my main concern is that STM publishing corrupts and destroys scientific communication, especially for unsighted humans and for machines. The SK has countered by saying that the care they devote to creating typeset PDFs is exactly what readers want. They say they have asked them. [I am not sure what options they gave.]
So my question to the readers of this blog is:
“IN AN ELECTRONIC AGE ARE YOU HAPPY WITH THE PDFs THAT STM PUBLISHERS CREATE?”
For myself, I never print PDFs (except boarding passes), so I’ll give you two examples of what I find seriously problematic:
- Cut-and-paste from PDFs garbles the content if it’s not in Unicode
- Double-column PDF is extremely difficult to read naturally on a modern laptop
I’d like other indications of (dis)satisfaction with PDFs. Please avoid commenting on XML or HTML – the former is hardly ever available from closed STM publishers and the latter has other problems I’ll address later. I’d like to stick to READING – I will address authoring later. Please post comments here – they will all be published (unlike at SK), though there is a delay when I’m asleep. Or you can tweet (@petermurrayrust) if you can express yourself in 140 characters.
Yes, on a screen, I always prefer the HTML version to the PDF version. I only carry local PDF copies for quick look-up offline, or when a local copy is quicker than going online.
Thanks Björn,
Useful data point
Hi Peter, thanks for bringing this up yesterday. I got a couple of interesting responses to it on Twitter.
First up, NPG appear to hold regular meetings, workshops, surveys, discussion sessions, etc. on what scientists prefer when it comes to the readability of articles. So I don’t think the gap between what publishers provide and what scientists want is the same across all publishers.
Other scientists commented that the standard two-column, pretty-looking format almost acts as a signal to their brain that “It’s time to work now!”, and in that sense actually motivates their research.
From a personal perspective, I don’t care what an article looks like. I want to be able to extract information and have reusable data (which is arguably not part of the article itself), and for that, the font style, typesetting, prettiness or any other factor doesn’t matter, apart from whether or not I can read it. This is obviously a lot more complicated for researchers who need to text-mine vast numbers of articles, but not for the majority of scientists, who don’t do this.
For this reason, it makes no difference to my actual science whether I’m reading something like this ‘published’ article by Zen Faulkes (http://neurodojo.blogspot.co.uk/2012/09/Ibacus.html) or the ‘properly published’ edition that may have taken months to be made to look pretty. The same information is there, and I can read it. That’s what matters. It’s not that I’m dissatisfied with the quality of PDFs; it’s just that I don’t think it’s worth the time, effort and money that go into transforming something that is already good enough into something that, in my opinion, adds little value to the progression of science.
Many thanks, Jon
I have often commented favourably on NPG – they are one of the very few publishers to have put significant effort into the opportunities of the electronic era (Connotea, Urchin, Precedings, etc.). It would be interesting to know how wide-ranging these discussions are. Are readers given a free range of options (“What would you like? We’ll see what’s technically and economically feasible”) or only asked “Would you like your references indented?”
FYI, I recently visited our typesetters in India to understand better exactly what happens in the course of creating PDFs. I’m working my way through a blog post where I hope I can shed some light on the workflow, the technology and the innovations that are coming along. I’ll let you know when the post is ready.
Thanks Ian,
As you know, I have already commented on the technical aspects of eLife and suggested some areas for improvement. I am happy to discuss more. Question: why have PDFs at all? See @Jon’s comment.
Assuming I’m going to read on an electronic device, I’d much prefer an A5 PDF page size to the A4 (or Letter?) that most publishers produce. Or, say, an ePub (though I accept that these are less controllable in terms of layout). Even the majority of full-sized desktop monitors do not have an A4-tall display area, let alone portable devices.
And, yes, there are the standard (sub|super)script and non-Latin character problems that occur in both the HTML versions and the PDF versions.
Many thanks – useful data points
I would like to suggest multiple PDFs: one for print, one for large screens (say, an iPad with Retina display) and one for small screens (say, an iPad mini). Having multiple PDFs should not be a major extra cost.
Just a note on two points.
First, if you believe that ferreting out favoritism at PMC regarding eLife is anti-OA, then you haven’t really been following this. The first objections to PMC’s practices came to me via OA publishers, who were being shut out while eLife was being given a leg up. It was patently unfair, and that’s a problem. I’ve taken great care, as you note, to document how this all unfolded, which was, to many, worse than they’d imagined. In responding to a complaint from PeerJ, they even colluded to keep PeerJ in the dark about how closely they’d been coordinating things. The eLife/PMC story is not anti-OA; it is anti-corruption.
The other is that I used “ignorance” in its proper sense, reflecting your own wording: lack of awareness or knowledge. It wasn’t pejorative, but ignorance is a weak basis to argue from. And I described you as a chemist because that’s what you are; again, you describe yourself as such in your “About” section.
Thanks. We now understand each other.
I very much like reading PDFs, not because of the format but purely because of the effort that went into designing the content. That distinction is critical, despite you not wanting me to comment on HTML. There is a shortage of tools aimed at decent layout in HTML, though I think that is just a matter of wanting it. But converting LaTeX into PS or PDF (and I think both suffer from similar issues) still gives better™ readability. PDF is just the container, and one can make equally machine-unreadable HTML aimed at optimizing the display. I’m sure one can achieve the same aesthetics with a good mix of CSS and single-character positioning, as in PDF. *That* is the real problem. Don’t shoot the messenger, I would say. The story does get more complicated, I know, and I always have the feeling that the bright scientific community is only bright in its own specialism.
So, to refine my message: I like the effort systems put into making text look nice, and find that this only happens with the PDF format.
I love reading on paper. I think it always beats reading on screen. Maybe it is because I don’t have 100% perfect eyesight. Because the standard paper size is A4 (at least in my part of the world), not having multiple columns results either in text that is hard to read because the lines are too long, or in a font size so big that it would be silly. The PDF is made for reading on paper, and I find it very unfair to complain that it is difficult to read on a computer screen.
All that being said, I suppose that on planet Utopia there would be a more semantically correct version (HTML?) to go with the PDF, which could also be read in one column on a computer screen by those with perfect eyesight.
Thanks Jonathan,
I appreciate these comments.
I don’t think it’s unfair to challenge page-based paper as the (often only) mechanism of communication. The rest of the world is changing – publishers have simply transferred the printing bill from themselves to readers. Print leads to awful mechanisms of scientific communication where (say) chemical spectra are condensed into virtually unreadable abbreviations, albeit beautifully typeset. A single spectrum would be far preferable to everyone, but no – we have to fit on the holy A4.
Of course this is discipline-dependent. I’ve just submitted a compsci/math paper (in LaTeX/PDF on a given template) where the authors do all the typesetting. It’s got huge margins on paper but is very readable on a screen. The conference chose the format.