Experiments with processImage

  • Last Post 22 July 2013
reds posted this 04 July 2013

Hi there,

I'm quite new to the OCR SDK and have been experimenting with it for the past few days using a Node.js application. I managed to successfully scan 50 documents and have been reviewing the results.

I decided to throw a more complicated document at it -- so I grabbed 50 JPEG images of flyers from a website and ran them through the OCR SDK. The results weren't what I was hoping for, so I wanted to reach out and get feedback on what I can do to make them a lot more accurate.

  1. Almost all the images I scanned missed the large block numbers that most coupons have. I'm wondering if it's an issue with typefaces that have strokes on them or are set in different colours.
  2. Some of the lower resolution images (minimum 1024x768) returned no text at all. They are quite readable, so I can't imagine why the text wouldn't be picked up.
  3. Almost all scripted typefaces were ignored.

Here's a sample flyer I scanned which turned up very few results: http://toronto.flyerland.ca/new/flyers/view/96593/ON/0/0/0/0/rexall-pharma-plus/1081/8//1372947590

Are there settings in the API that I can adjust, or are there any best-practice recommendations I can follow to get the quality up? I realize that OCR is nowhere near perfect, but I'm curious as to what else I can do.



SDK_support posted this 22 July 2013

Hi Dave!

We have processed the sample flyers from the website you mentioned. As we could see, they contain text in different font sizes, and the text color is inverted in several text blocks. To get correct recognition results for such images, it is necessary to tune the resolution.

So that we can help you and investigate this issue in more detail, please send the following information to CloudOcrSdk@abbyy.com:

1) the settings with which you processed the sample flyers;

2) a description of how the results should look.
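
For item 1, the settings are in practice the query parameters sent with the processImage request, so it can help to build them in one place and log the resulting URL. Below is a minimal Node.js sketch of that idea. The endpoint URL, the helper name buildProcessImageUrl, and the parameter names and values (language, profile, exportFormat) are assumptions for illustration and should be checked against the Cloud OCR SDK documentation before use.

```javascript
// Sketch only: the endpoint and the parameter names/values are assumptions
// to be verified against the Cloud OCR SDK documentation for processImage.
const BASE_URL = 'http://cloud.ocrsdk.com/processImage'; // assumed endpoint

// Hypothetical helper: turns a plain settings object into a request URL,
// skipping any setting that is left undefined or null.
function buildProcessImageUrl(settings) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(settings)) {
    if (value !== undefined && value !== null) {
      params.append(name, String(value));
    }
  }
  return `${BASE_URL}?${params.toString()}`;
}

// Example settings for a flyer image (values are illustrative assumptions).
const flyerSettings = {
  language: 'English',
  profile: 'textExtraction',
  exportFormat: 'txt',
};

const url = buildProcessImageUrl(flyerSettings);
console.log(url); // log the exact settings so they can be shared when reporting an issue
```

Keeping the settings in a single object makes it easy to copy them into an email verbatim and to change one value at a time when comparing recognition results.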

Thank you.

Best regards,