We are doing well at reconstructing semantic material from PDFs (#AMI2), but the challenges thrown at us are considerable. Here's today's amusement:
#AMI2 can reconstruct most of this perfectly, but she doesn't know what to do with a hyphenated subscript. Nor do I, but I'm just an ignorant chemist. The publishing industry tells us that they need our money to produce beautiful, easily readable typeset documents. So here's an example of human readability from the same paper:
#AMI2 can read this, but can you? Wouldn't it be easier to typeset it as equations? But that would take up an awful lot of space, and as we know, journals have to save space (I have never understood why).
I have a plane journey so AMI and I can do some real hacking. We hope to release an alpha version RSN.