Content Mining: AMI learns her numbers

AMI is learning to extract numbers and paths from images. It's hard work and requires a lot of heuristics. Here's her first shot (from BMC Evolutionary Biology DOI:1471-2148-14-20-

1471-2148-14-20-test expanded:

[Image: junk]

(I'll select a phylo tree soon.) At present we're concentrating only on the horizontal numbers; the vertical ones just need rotating through PI/2. So we are just after the "6.00", the "MAD", and so on.
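To make the PI/2 remark concrete, here is a minimal sketch of a quarter-turn on a binarized pixel grid. It assumes the image is held as a plain list of rows; the name `rotate_quarter_turn` is made up for illustration and is not AMI's actual code.

```python
def rotate_quarter_turn(grid, clockwise=True):
    """Rotate a rectangular grid (a list of rows) through PI/2.

    Hypothetical helper for illustration only, not AMI's implementation.
    """
    if clockwise:
        # Reverse the row order, then transpose.
        return [list(row) for row in zip(*grid[::-1])]
    # Transpose, then reverse the row order.
    return [list(row) for row in zip(*grid)][::-1]


# A tiny 2x3 "image": after a clockwise turn it becomes 3x2.
pixels = [[1, 2, 3],
          [4, 5, 6]]
turned = rotate_quarter_turn(pixels)
```

Once the vertical labels are turned like this, they could in principle go through the same recognition step as the horizontal ones.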

Here's what AMI gets:

[Image: junk1]

The black is the original pixels (binarized) and the small characters are AMI's interpretation. She's essentially got them right (there are a few she doesn't recognise yet). She thinks the top ones are most likely to be "O", but "0" and "O" look very similar, so we'll have to use heuristics to decide.
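One possible heuristic for the "0"/"O" question is to look at the neighbours in the same token. The sketch below is only an assumption about how such a rule might look, not what AMI does: the function name `resolve_o_zero` and the simple majority rule are invented for illustration.

```python
def resolve_o_zero(token):
    """Guess whether ambiguous O/0 glyphs in a token are letters or digits.

    Hypothetical rule: count the unambiguous characters in the token and
    let the majority (digits vs letters) decide the ambiguous ones.
    """
    others = [ch for ch in token if ch not in "O0o"]
    digits = sum(ch.isdigit() for ch in others)
    letters = sum(ch.isalpha() for ch in others)
    if digits > letters:
        # Numeric context: read every "O"/"o" as zero.
        return "".join("0" if ch in "Oo" else ch for ch in token)
    if letters > digits:
        # Alphabetic context: read every "0" as the letter O.
        return "".join("O" if ch == "0" else ch for ch in token)
    # No evidence either way: leave the token as recognised.
    return token


print(resolve_o_zero("6.OO"))  # numeric neighbours
print(resolve_o_zero("M0D"))   # alphabetic neighbours
```

A token made up only of ambiguous glyphs comes back unchanged, so a real system would need further evidence (glyph shape, position on an axis, and so on) for those cases.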

There's a lot more to do, of course. It doesn't worry AMI, as she has the emotional apparatus of a FORTRAN compiler. But PMR feels we are starting to crack this problem. Next time we'll do lines. I have to have some of this in place before I visit Ross and Matt on Friday.
