In my last technical post I mentioned that we were trying to recognize the character “A”. Not too difficult for a sighted European human. Hard for Eeyore. Hard for AMI, our document-reading program.
It’s taken a bit longer than I thought. Here’s our “A”:
Simple, isn’t it? It’s one colour – black. What could be simpler?
Well, actually it isn’t one colour. It’s 256 shades of gray (I have given up and use US spelling for compsci things), or more correctly a gradation from 0 (black, no light at all) to 255 (as white as you can get). Look closely and you will see “jaggies” which aren’t completely black or completely white. They are there as “antialiasing”, a method to make it look nice for humans (and it works). Remember we aren’t allowed to draw straight lines; we have to use pixels. Many years ago (1970) we used pen plotters (Calcomp) or moving spots of light (Tektronix) to draw straight lines. The great Evans and Sutherland computers, for which modern structural biologists should be grateful, produced many of the classic protein structures with straight lines (vectors) during the 1980s.
But then pixels came back, with Silicon Graphics and desktops, and are now almost universal. So we have to draw lines and circles with pixels. There are clever algorithms (I still marvel at Bresenham’s circle) but the output is for humans, not machines. A machine simply sees an array (about 30×36 = 1080) of white, grey and black pixels.
No lines.
No A.
We’ve got to reconstruct the A. (AMI asks “Couldn’t the publishers publish proper A’s?” “No AMI, they can’t, because they want to do the same thing they were doing 20 years ago: simulate paper.”) Here’s a bit of it:
Remember “0” is black and “255” is white. You can see the SW corner of the A. I have to teach AMI how to recognise it.
AMI: Are all A’s the same size?
P: No. They can be as small as 7 pixels high (anything smaller is unreadable for humans).
AMI: Are they all the same aspect ratio?
P: No. Some are thin and some are fat.
AMI: Are all the lines the same width?
P: No. There’s Helvetica Light (thin), Helvetica (medium) and Helvetica Bold (thick).
AMI: Is the A always symmetrical?
P: No, it can be slanted (oblique, italic).
AMI: Are there other fonts besides Helvetica?
P: Zillions. At least ten thousand.
AMI: So you will have to teach me a hundred thousand fonts.
P: I can’t. Some of them are copyrighted.
AMI: That means you go to jail if you use them and redistribute them?
P: More or less.
AMI: I will not be able to recognise all As.
P: Many of them are VERY similar. I will teach you how to recognise similar characters. First we have to convert them to gray.
AMI: I have some convertToGray() modules.
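For readers following along, here is a minimal sketch of what a convertToGray() module might do, using the standard Rec. 601 luma weights. The class name is mine and AMI’s real modules may well differ:

```java
import java.awt.image.BufferedImage;

public class GrayConvert {
    // Convert an RGB image to 8-bit grayscale with the standard
    // Rec. 601 luma weights: 0.299 R + 0.587 G + 0.114 B.
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int luma = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                // Write the gray value directly into the single band; going
                // through setRGB() would re-convert via the gray colour space.
                out.getRaster().setSample(x, y, 0, luma);
            }
        }
        return out;
    }
}
```

With these weights pure white maps to 255 and pure red to gray level 76.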
P: Good. Then we have to clip them.
AMI: That’s what you are writing now? Trimming off the white pixel edges.
P: That’s right.
AMI: But some are “nearly white” (240). Is that white?
P: We shall set a heuristic cutoff, maybe 240, maybe higher.
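Sketched in Java, clipping with such a cutoff might look like this. The class name is mine, 240 is just the example value from the dialogue, and AMI’s actual code may differ:

```java
import java.awt.image.BufferedImage;

public class WhiteClipper {
    static final int WHITE_CUTOFF = 240; // heuristic: >= this counts as "white"

    // Trim border rows and columns of a grayscale image that are all "white".
    static BufferedImage clip(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        int top = 0, bottom = h - 1, left = 0, right = w - 1;
        while (top < bottom && rowIsWhite(gray, top)) top++;
        while (bottom > top && rowIsWhite(gray, bottom)) bottom--;
        while (left < right && colIsWhite(gray, left, top, bottom)) left++;
        while (right > left && colIsWhite(gray, right, top, bottom)) right--;
        return gray.getSubimage(left, top, right - left + 1, bottom - top + 1);
    }

    static boolean rowIsWhite(BufferedImage img, int y) {
        for (int x = 0; x < img.getWidth(); x++)
            if (img.getRaster().getSample(x, y, 0) < WHITE_CUTOFF) return false;
        return true;
    }

    static boolean colIsWhite(BufferedImage img, int x, int y0, int y1) {
        for (int y = y0; y <= y1; y++)
            if (img.getRaster().getSample(x, y, 0) < WHITE_CUTOFF) return false;
        return true;
    }
}
```

Note getSubimage() shares the parent’s raster rather than copying it, which is fine for read-only use.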
AMI: How do you tell?
P: Trial and error. It’s very boring. What is worse is that I am doing some of these things for the first time.
AMI: But you shouldn’t make mistakes. That’s why we use JUnit and Maven and Jenkins.
P: But I don’t know which methods work best. Maybe Otsu? Maybe Hough? There is no single solution.
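For the curious, Otsu’s method is simple enough to sketch in a few lines: it picks the gray-level cutoff that maximises the between-class variance of the histogram. This is a generic textbook version, not AMI’s code:

```java
public class OtsuThreshold {
    // Otsu's method over a 256-bin gray-level histogram: return the
    // threshold t that maximises between-class variance wB*wF*(mB-mF)^2.
    static int threshold(int[] histogram) {
        long total = 0, sumAll = 0;
        for (int i = 0; i < 256; i++) {
            total += histogram[i];
            sumAll += (long) i * histogram[i];
        }
        double best = -1;
        int bestT = 0;
        long wB = 0, sumB = 0;          // weight and sum of the "background" class
        for (int t = 0; t < 256; t++) {
            wB += histogram[t];
            if (wB == 0) continue;
            long wF = total - wB;        // "foreground" class weight
            if (wF == 0) break;
            sumB += (long) t * histogram[t];
            double mB = (double) sumB / wB;
            double mF = (double) (sumAll - sumB) / wF;
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > best) { best = between; bestT = t; }
        }
        return bestT;
    }
}
```

For a bimodal histogram (mostly-black glyph on mostly-white background) the chosen threshold lands between the two peaks.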
AMI: Perhaps you can get some hackers to help you. They might know more about Java images.
P: Good idea. Java’s ImageIO is not very cuddly; for example a missing file does not throw FileNotFound but NullPointer.
AMI: ARE THERE ANY JAVA IMAGE HACKERS WHO CAN HELP PM-R? You don’t need to know any chemistry!
P: Thanks AMI. Hackers, please leave a comment on the blog. And there’s a very exciting way of meeting them that I am not yet allowed to blog about for a few days.
AMI: Communal knowledge makes projects go faster. I like having several people writing my code. We have good tools for keeping in sync.
P: It took me days; an experienced image guru would have done this in a morning.
AMI: So we have clipped the images. Now we have to make them the same size?
P: Yes. And I found a very useful library, Imgscalr.
AMI: Yes, you installed and tested it on a few characters. Did it work?
P: Seems to. Now we have to compute correlation… and then see how unique the results are.
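The correlation step could be as simple as a Pearson coefficient over the two same-sized pixel arrays: 1.0 for identical images, near 0 for unrelated ones, −1.0 for an inverted copy. A sketch under that assumption, not AMI’s actual code:

```java
public class PixelCorrelation {
    // Pearson correlation between two pixel arrays of equal length.
    static double correlate(int[] a, int[] b) {
        if (a.length != b.length)
            throw new IllegalArgumentException("images must be the same size");
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length;
        meanB /= b.length;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov  += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB); // undefined (NaN) for flat images
    }
}
```

How unique the results are across ten thousand fonts is exactly the open question.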
This is a really nice introduction to the basic problems of image enhancement and OCR. From my own experience, I might have a few suggestions:
Regarding the heuristic brightness cutoff (where you use 240 as an example), I’ve found it quite handy to simply use the _average_ brightness of the image. This always works if the image is evenly bright, i.e., the values of “black” and “white” are approximately the same throughout the image.
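[The mean-brightness cutoff suggested here is only a few lines of Java; an editorial sketch, with class and method names of my own choosing:]

```java
import java.awt.image.BufferedImage;

public class AverageCutoff {
    // Mean gray level of an 8-bit grayscale image, usable as a
    // binarisation cutoff when the illumination is roughly even.
    static int meanBrightness(BufferedImage gray) {
        long sum = 0;
        for (int y = 0; y < gray.getHeight(); y++)
            for (int x = 0; x < gray.getWidth(); x++)
                sum += gray.getRaster().getSample(x, y, 0);
        return (int) (sum / ((long) gray.getWidth() * gray.getHeight()));
    }
}
```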
The NullPointerException you mention regarding Java ImageIO does not occur if the image file to load does not exist, but rather if the file does exist, but ImageIO is unable to decode the image, as documented here:
http://docs.oracle.com/javase/6/docs/api/javax/imageio/ImageIO.html#read(java.io.File)
For image scaling, Java has quite a few onboard facilities, which can also be used for white edge truncation, as well as gray scale conversion. For starters, it’s worthwhile to look at these two classes, especially the details listed below each:
http://docs.oracle.com/javase/6/docs/api/java/awt/image/BufferedImage.html
– TYPE_BYTE_GRAY
– createGraphics()
http://docs.oracle.com/javase/6/docs/api/java/awt/Graphics2D.html
– drawImage()
– scale()
– drawString() (for creating reference images for comparison)
Really useful! Thanks
Yes, I have used createGraphics() before (e.g. to create thumbnails). As always, it’s a matter of finding the right methods and the order in which to call them.
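[A sketch of scaling with only core Java, along the lines the comment above suggests (TYPE_BYTE_GRAY plus createGraphics() and drawImage()); the class name and the choice of bilinear interpolation are mine:]

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OnboardScaler {
    // Scale an image to a fixed target size using only core Java.
    static BufferedImage scaleTo(BufferedImage src, int w, int h) {
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null); // stretch src to fill the target
        g.dispose();
        return out;
    }
}
```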
Despite the grey borders it should be fairly easy to threshold the image efficiently. I would recommend you try FIJI (it’s part public domain, part FreeBSD), which is “Fiji Is Just ImageJ” and already comes with several threshold algorithms. The interface allows for easy experimentation.
After opening the image, the first thing you will need is to drop the colour information (the png image you have in this blog post is an RGB image). Do “Edit > Type > 8-bit”.
For the automatic threshold, go to “Image > Adjust > Threshold …”. You will get a long list of available algorithms (if you need them under another license, I have some of them implemented in GPL as well). The Otsu method worked fine.
If your characters don’t have details of 1-pixel width (they probably don’t), then I’d recommend doing a greyscale dilation with the smallest diamond structuring element before the threshold. To do this, before the automatic threshold, do “Process > Morphology > Gray morphology” and select a radius of 1.0 pixels, structure element type “diamond”, and the dilate operator.
Because seeing is believing, here’s a screenshot of the results I got with this very simple approach: http://picpaste.com/Screenshot_from_2014-03-04_12_36_19-u7R6utd5.png The red part is the final object.
About getting AMI to recognize each letter: you will have to train her. I once saw a public set of handwritten characters for the purpose of training systems. Despite the many different fonts, there should be no novelty in training such a system. Your only problem may be in distinguishing between some Latin and Greek characters. I think machine learning will be the way to go. If post offices have systems recognizing handwritten postal codes, I’m sure you can have one recognizing machine-printed characters.
I’ve found that Adobe Acrobat’s OCR is very slow and inaccurate on grayscale text. I run them through Scan Tailor ( scantailor.sourceforge.net/ ) to get crisp monochrome TIFs, which work much better. Perhaps there’s a java port of the relevant component?
Thanks. I try to avoid Adobe products but it may be useful as a reference