Here is the image of a phonebook page column: http://digitalfire.com/culiacan/pictures/309.jpg (scanned at 600 dpi and resized in Photoshop to 300 dpi, without resampling). We are passing a parameter to read Spanish.
The recognizer is not getting the phone numbers correct on the last 30 or 40 lines: they are chopped off on the right, and more digits are missing on numbers nearer the bottom. Also, across the 60 or so OCR runs we have tested so far, we are seeing frequent errors where it reads '-0' as '4)' (7494)211 instead of 749-0211), 'LI' as 'U', and '96' as '%'. The same image reads with far fewer errors using another recognition service. It is also failing to interpret the period-tab as a tab.
Any ideas? Thanks.
Hi there! We've thoroughly examined your case. Your scanned image looks a lot like it was taken with a camera, so our image preprocessing engine tries to enhance it with photo preprocessing algorithms, which occasionally corrupt some fine details.
The first thing you need to do is to add
We're also currently working on another option that should improve results even further for your type of image. I'll let you know as soon as it's released to production (that should take several days, I think). Please let me know if you have any additional questions.
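In the meantime, the '-0' read as '4)' error described above could be patched on the client side after recognition. The following is only a minimal sketch, not part of our service: the function name `fix_phone_misreads` is hypothetical, and it assumes every phone number in the column has the form DDD-DDDD, as in the 749-0211 example.

```python
import re

# Assumption: phone numbers look like DDD-DDDD (e.g. 749-0211), so a token
# of the form DDD4)DDD is taken to be a misread of DDD-0DDD.
_MISREAD = re.compile(r"\b(\d{3})4\)(\d{3})\b")


def fix_phone_misreads(line: str) -> str:
    """Rewrite 'DDD4)DDD' as 'DDD-0DDD' and leave all other text untouched."""
    return _MISREAD.sub(r"\1-0\2", line)


if __name__ == "__main__":
    print(fix_phone_misreads("7494)211"))
```

The 'LI' as 'U' and '96' as '%' errors are harder to reverse this way, since 'U' is a valid letter on its own and the surrounding context is less regular, so the sketch leaves them alone.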