Content Mining: AMI learns her numbers

AMI is learning to extract numbers and paths from images. It’s hard work and requires a lot of heuristics. Here’s her first shot (from BMC Evolutionary Biology DOI:1471-2148-14-20), expanded:
(I’ll select a phylo tree soon.) At present we’re just concentrating on the horizontal numbers. (The vertical ones just need turning through π/2.) So we are just after the “6.00”, the “MAD”, etc.
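As a rough illustration of the "turning through π/2" step, here is a minimal Python sketch that rotates a binarized pixel grid by a quarter turn clockwise, so that text running bottom-to-top would read left-to-right. It is not AMI's code: the function name `rotate_quarter_turn_cw` and the list-of-rows representation (1 = black, 0 = white) are assumptions made for this example.

```python
def rotate_quarter_turn_cw(grid):
    """Rotate a rectangular grid of pixels by pi/2 clockwise.

    `grid` is a list of equal-length rows (hypothetical representation:
    1 = black pixel, 0 = white pixel). Returns a new list of rows.
    """
    # Reversing the row order and then transposing is a clockwise quarter turn.
    return [list(column) for column in zip(*grid[::-1])]


# A tiny vertical stroke, 3 pixels tall and 1 pixel wide ...
vertical_stroke = [
    [1],
    [1],
    [1],
]
# ... becomes a horizontal stroke, 1 pixel tall and 3 pixels wide.
horizontal_stroke = rotate_quarter_turn_cw(vertical_stroke)
```

Once rotated, the vertical labels could in principle go through the same recognition path as the horizontal ones.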
Here’s what AMI gets:
The black is the original pixels (binarized) and the small characters are AMI’s interpretation. She’s essentially got them right (there are a few she doesn’t get yet). She thinks the top ones are most likely to be “O”, but “0” and “O” are very similar and we’ll have to use heuristics to decide.
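One possible heuristic for the “0” versus “O” decision is to look at the neighbouring characters in the same token. The sketch below is only a guess at what such a rule might look like, not what AMI actually does: the function name `disambiguate_zero_and_o` and the voting rule (unambiguous digits and decimal points vote for “0”, other letters vote for “O”) are assumptions for illustration, and it only handles the upper-case “O”.

```python
def disambiguate_zero_and_o(token):
    """Resolve '0' vs 'O' in a recognized token using its other characters.

    Hypothetical rule: digits 1-9 and '.' count as numeric evidence,
    letters other than 'O' count as alphabetic evidence. Whichever side
    has more evidence wins; on a tie the token is returned unchanged.
    """
    numeric_votes = sum(1 for ch in token if ch in "123456789.")
    alpha_votes = sum(1 for ch in token if ch.isalpha() and ch != "O")

    if numeric_votes > alpha_votes:
        # Looks like a number, so any 'O' is read as a zero.
        return token.replace("O", "0")
    if alpha_votes > numeric_votes:
        # Looks like a word, so any zero is read as the letter 'O'.
        return token.replace("0", "O")
    # No evidence either way (e.g. "OO"): leave it for another heuristic.
    return token


print(disambiguate_zero_and_o("6.OO"))  # numeric context
print(disambiguate_zero_and_o("MA0"))   # alphabetic context
```

A token made up only of “O” and “0” characters gets no help from this rule and would need something else, such as the position of the label on the axis.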
There’s a lot more, of course. It doesn’t worry AMI as she has the emotional apparatus of a FORTRAN compiler. But PMR is feeling we are starting to crack this problem. And next time we’ll do lines. I have to have some of this in place before I visit Ross and Matt on Friday.

