#animalgarden review PeerJ articles – 1

#animalgarden (AMI2, Sleepless and Chuff) are excited: they are meeting in Melbourne. Chuff the @okfn_okapi has told them that people are interested in biodiversity. Chuff says that’s about animals and plants. PMR tells them that there’s a new journal, PeerJ, which is Open – free as in speech. PMR thinks it’s a Good Thing. He’s asked them to review it and say what they think. This is the first part (Open) – typesetting is the second.

Chuff is the OKF Okapi and is interested in Openness. AMI2 is a kangaroo who can interpret papers for a machine, for content-mining. AMI2 doesn’t understand humans and has no emotions. Chuff will have to explain things.

C: This article has both HTML and PDF versions. The PDF says

C: so I can tell it is Open-Access

A: Can I tell it’s Open Access?

C: Can you read the words?

A: There are no words in a PDF. Only characters.

C: Can you guess the words?

A: I can read:

A: The y-coordinates mean they are on a single line. This gives two words “OPEN” “ACCESS”. Is that OK?
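What AMI2 describes here can be sketched roughly as follows. This is only an illustration, not AMI2's actual code: it assumes each character arrives as a hypothetical `(char, x, y, width)` tuple, that characters sharing (nearly) the same y-coordinate are on one line, and that a horizontal gap wider than a chosen threshold marks a word break. The names `chars_to_words`, `y_tol` and `gap` are invented for the example.

```python
from collections import defaultdict

def chars_to_words(chars, y_tol=0.5, gap=2.0):
    """Group (char, x, y, width) tuples into lines, then into words.

    y_tol and gap are guessed thresholds; real PDFs need tuning per font size.
    """
    lines = defaultdict(list)
    for ch, x, y, width in chars:
        # Characters whose y-coordinates fall in the same bucket share a line.
        lines[round(y / y_tol)].append((x, width, ch))

    result = []
    # PDF y-coordinates normally increase upwards, so the largest y comes first.
    for key in sorted(lines, reverse=True):
        words, current, prev_end = [], "", None
        for x, width, ch in sorted(lines[key]):
            # A large horizontal gap is taken to be a (missing) space.
            if prev_end is not None and x - prev_end > gap:
                words.append(current)
                current = ""
            current += ch
            prev_end = x + width
        if current:
            words.append(current)
        result.append(words)
    return result

def place(text, x0, y, width=6.0):
    """Helper for the demo: lay out text as fixed-width characters."""
    return [(ch, x0 + i * width, y, width) for i, ch in enumerate(text)]

sample = place("OPEN", 0.0, 700.0) + place("ACCESS", 30.0, 700.0)
print(chars_to_words(sample))
```

Bucketing by rounded y is crude (two characters either side of a bucket boundary would be split), so a real implementation would probably cluster more carefully.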

C: Possibly. Can you find anything with “CC”?

A: I have found:

A: I have guessed the spaces and this gives the words “Creative” “Commons” “CC-BY” and “3.0”. Is that OK?
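Once the words have been guessed, spotting the licence is a simple pattern match. The sketch below is an assumption about how that might be done, not what AMI2 really runs; `find_cc_licence` and `is_okd_compliant` are names made up for this example. It relies on the fact that the Creative Commons licences with NC or ND clauses do not conform to the Open Definition, whereas CC-BY and CC-BY-SA do.

```python
import re

# Matches e.g. "CC-BY", "CC BY-SA", "CC-BY-NC-ND", optionally followed by a version.
LICENCE_RE = re.compile(
    r"\bCC[- ]?(BY(?:-(?:NC|ND|SA))*)\b(?:\s+(\d+\.\d+))?",
    re.IGNORECASE,
)

def find_cc_licence(words):
    """Return (licence_code, version) from a list of words, or None."""
    match = LICENCE_RE.search(" ".join(words))
    if match is None:
        return None
    return ("CC-" + match.group(1).upper(), match.group(2))

def is_okd_compliant(licence_code):
    """NC and ND clauses rule a CC licence out of the Open Definition."""
    parts = licence_code.upper().split("-")
    return "NC" not in parts and "ND" not in parts

found = find_cc_licence(["Creative", "Commons", "CC-BY", "3.0"])
print(found, is_okd_compliant(found[0]))
```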

C: YES!! That means OKD-compliant!

S: who’s that?

C: That’s Siouxsie! She’s from creativecommons.org.nz. I met her at #kiwifoo

S: Wow! PMR’s with Creative Commons as well.

C: so where’s the monkey?

http://www.zazzle.com.au/kids_peerj_t_shirt-235730159800116640

S: What a friendly monkey! What’s its name?

PMR: I don’t know.

S: We already know lots of Open Animals. Chuff, Gulliver, Tux, GNU, Python. Do they make stuffed blue monkeys?

PMR: don’t know.

S: well they should. We want more friends

C: Who wrote the article?

A: (after working out the characters). Michael P Taylor // dino@miketaylor.org.uk

C: That’s because he loves dinosaurs. He fights for Openness every day.

S: Perhaps we should get a toy dinosaur.

PMR: NOT a Barney, please!

C: So is there anything about Okapis in the article? Okapis are giraffids! Spelt O-K-A-P-I

A: (searches) Yes. (Quotes from article)

Toon A, Toon SB. 2003. Okapis and giraffes. In: Hutchins M, Kleiman D, Geist V, McDade M, eds. Grzimek’s animal life encyclopedia, Vol 15: Mammals IV. second edition, Michigan: Gale Group, Farmington Hills, 299–409.

C: Wow! Maybe Dino Taylor has a copy. It’s over 100 pages!

PMR: let’s find out what AMI2 has discovered about the typesetting!


6 Responses to #animalgarden review PeerJ articles – 1

  1. Chris Rusbridge says:

    I asked a Q on twitter about the rights statement (which in the HTML version is buried sideways in Author Information), and shortly got an email back from @InvisibleComma pointing to this page (search on page for License):
    http://www.google.com/webmasters/tools/richsnippets?url=https%3A%2F%2Fpeerj.com%2Farticles%2F36%2F
    This info was apparently there in the PDF as you found but wasn’t machine readable before I asked the question. Pretty darn quick fix!
    Hope this (with link) gets through your spam filter!

  2. Kaveh says:

    Hi Peter
    But if the XML is available, shouldn’t AMI2 first examine that, which is designed to be machine readable, and has all the licensing info explicitly recorded, before trying to reverse engineer the PDF? 😉
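For comparison, reading the licence from XML is far less guesswork than reverse-engineering the PDF. The sketch below assumes a JATS-style `<license>` element carrying an `xlink:href` attribute; the sample document is invented for illustration and is not the actual PeerJ XML, and `license_urls` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Invented sample in the general shape of JATS permissions markup.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <permissions>
        <license xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <license-p>Distributed under a Creative Commons licence.</license-p>
        </license>
      </permissions>
    </article-meta>
  </front>
</article>"""

def license_urls(xml_text):
    """Return the xlink:href of every <license> element that has one."""
    root = ET.fromstring(xml_text)
    attr = "{%s}href" % XLINK_NS
    return [el.get(attr) for el in root.iter("license") if el.get(attr)]

print(license_urls(SAMPLE))
```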

    • pm286 says:

      Thanks
      Several reasons why I don’t use the XML
      * Most XML is hidden behind a paywall
      * I am not yet sure that character conversion is done correctly for (say) high codepoint symbols
      * the graphics strokes in the figures are not stored in the XML
      * No author will send you XML
      * No thesis is published in XML
      I have tried to convince people to use XML but conversion from PDF is a necessary interim measure. It’s a sad world.

  3. Kaveh says:

    OK, agreed on those. Of course we want to move to a point where everything is in XML and all publishers publish the XML. Only a handful now, including PeerJ.

    • pm286 says:

      Thanks.
      I understand that your company processed the Dino article for PeerJ and the only thing that #animalgarden had to address was the use of the Dingbats font. Since Dingbats is one of the standard 14 fonts they were happy to add a translation table for Dingbats2Unicode which actually was quite easy as one had been created by Adobe/Unicode. Unfortunately most font producers don’t do this and we have to guess.
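A translation table of this kind is just a lookup from the font's character codes to Unicode code points. The sketch below is illustrative only: it shows four entries as I understand the Zapf Dingbats encoding, and anyone using it should check them against the published Adobe/Unicode mapping rather than trust this fragment. The names `DINGBATS_TO_UNICODE` and `translate_dingbats` are invented for the example.

```python
# A tiny, illustrative fragment of a Dingbats-to-Unicode table.
# Keys are character codes in the font; values are Unicode characters.
DINGBATS_TO_UNICODE = {
    0x21: "\u2701",  # UPPER BLADE SCISSORS
    0x22: "\u2702",  # BLACK SCISSORS
    0x33: "\u2713",  # CHECK MARK
    0x34: "\u2714",  # HEAVY CHECK MARK
}

def translate_dingbats(codes, table=DINGBATS_TO_UNICODE, unknown="\ufffd"):
    """Map character codes to Unicode, using U+FFFD where there is no entry."""
    return "".join(table.get(code, unknown) for code in codes)

# Iterating over bytes yields integer codes: b"34" is 0x33, 0x34.
print(translate_dingbats(b"34"))
```

For fonts with no published table the mapping has to be guessed, which is the hard case mentioned above.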
      #animalgarden found the following fonts for Dino’s article:
      NUHXYD+NimbusSanL-Regu
      QWTDBK+NimbusRomNo9L-Regu
      XNOAWG+Minion-Regular
      YUAKON+Minion-Bold
      ASAUGI+NimbusSanL-Bold
      JYNATA+MinionExp-Bold
      NMVNLH+MinionExp-Regular
      QQGVIA+Minion-Italic
      YFVXKP+MTSYN
      XFSIXK+RMTMI
      MGJGWE+MinionExp-Italic
      OMQUNK+NimbusSanL-BoldItal
      GUJFPJ+Minion-BoldItalic
      RNMPIC+Dingbats
      They are working on getting the Dingbats out
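The six capital letters and `+` in front of each font name are the tag PDF uses to mark an embedded font subset, so the real font name is what follows the `+`. A minimal sketch of stripping the tag, and of guessing a family by cutting at the first hyphen (that second step is only a heuristic assumed for this example), might look like this; `base_font_name` and `font_family` are hypothetical names.

```python
import re

# A subset tag is exactly six uppercase letters followed by "+".
SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")

def base_font_name(name):
    """Strip a leading subset tag such as 'RNMPIC+' if present."""
    return SUBSET_TAG_RE.sub("", name)

def font_family(name):
    """Heuristic: the family is whatever precedes the first hyphen."""
    return base_font_name(name).split("-", 1)[0]

fonts = ["XNOAWG+Minion-Regular", "YUAKON+Minion-Bold", "RNMPIC+Dingbats"]
for font in fonts:
    print(base_font_name(font), "|", font_family(font))
```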

  4. Kaveh says:

    Thanks Kaveh,
    Very useful
    >>Ideally it would be good to have all characters in the PDF as Unicode. The process of going from the author file to the final PDF is a very complex one, and is essentially:
    >>Author file > Structured Latex > XML > Intermediate Latex > PDF (and other deliverables)
    >>The process has taken some 10 years to perfect, and is fully automated.
    I understand and appreciate that this is a non-trivial process
    >> For technical reasons we presently don’t use the unicode characters within the PDF, although we are working on it.
    Understood. When it happens it will be useful to declare in the PDF that all characters are Unicode. At present the main indicator is the Font-family.
    >> Our primary goal with the PDF is to give the reader the best experience visually, while the XML is the definitive content for text mining and archiving and is (in our opinion) the Version of Record.
    I have relatively little problem with this ideal. The problem is that it isn’t reality and is many years from being so. The only Version Of Record in most cases is the PDF – most XML stays in the back-office of publishers. So your opinion is not the opinion of most publishers.
    You are performing a service and from what I can see it’s better than some of the others. But I question whether the service is necessary. You provide “the best experience visually”. I don’t know whether you provide double-column PDF but on my laptop this is a lousy visual experience. HTML is IMO preferable to DCPDF – I’d gladly exchange some kerning for natural scrolling. I doubt whether any publishers do market research on this topic so you are providing what the publishers, not the readers, are asking for.
