#animalgarden are excited. They are going to Beyond-the-PDF2 #btpdf2 and have been accepted for a demo. That’s a lot of hard work so they are working hard. Last time you saw them (/pmr/2013/03/05/animalgarden-ami2-svg2xml-ignorantchemist-transforming-pdfs-into-xml/) they had extracted the characters from a PDF and turned them into SVG. Now they are creating HTML – the language of the web.
#ami2 is explaining sub- and superscripts, italics, colours and a lot more. (AMI got some bling from @kitware on their 15th birthday. She’s not really a bling-animal and it gets in the way of the keyboard.) Here’s what she started with:
And here is what she has produced:
#animalgarden and #ignorantchemist think that’s great. It has captured everything that’s necessary for further interpretation (sub/superscripts, italics) and coloured red the characters they had to translate from non-Unicode fonts (mainly MTSYN). Although #scholarlykitchen might condemn the lack of beautiful typesetting, it would be possible to improve it automatically using CSS stylesheets (e.g. to create equispaced lines). That’s not the point. It will be possible to extract the mathematics from it. And the data. We can get the rate constant. And turn it into SI units (by dividing by 3600*1000). (You can’t do that from PDF.)
That’s real beauty.
(unlike the images from my phone…)
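For the curious, here is a minimal sketch of the SI conversion mentioned above. The post does not say which units the rate constant was published in, so this assumes a per-hour, per-litre quantity (L mol^-1 h^-1) going to m^3 mol^-1 s^-1, which is one reading that gives the factor 3600*1000. The function name and the sample value are made up for illustration and are not AMI's code.

```python
# Sketch only. ASSUMPTION: the rate constant is in L mol^-1 h^-1.
# The post gives the factor (3600*1000) but not the original units.

SECONDS_PER_HOUR = 3600          # h -> s
LITRES_PER_CUBIC_METRE = 1000    # L -> m^3


def rate_constant_to_si(k_litre_per_mol_per_hour: float) -> float:
    """Convert L mol^-1 h^-1 to m^3 mol^-1 s^-1 by dividing by 3600*1000."""
    return k_litre_per_mol_per_hour / (SECONDS_PER_HOUR * LITRES_PER_CUBIC_METRE)


if __name__ == "__main__":
    example_k = 7.2  # hypothetical value, not taken from the paper
    print(rate_constant_to_si(example_k), "m^3 mol^-1 s^-1")
```

The arithmetic is trivial; the point is that it only becomes possible once the number and its units are machine-readable.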