I'm quite new to the OCR SDK and have been experimenting with it for the past few days using a Node.js application. I managed to successfully scan 50 documents and have been reviewing the results.
I decided to throw something more complicated at it, so I grabbed 50 JPEG images of flyers from a website and ran them through the OCR SDK. The results weren't what I was hoping for, so I wanted to reach out and get feedback on what I can do to make them more accurate.
- Almost all the images I scanned missed the large block numbers that most coupons have. I'm wondering if it's an issue with typefaces that are outlined (stroked) or set in different colours.
- Some of the lower-resolution images (minimum 1024x768) returned no text at all. They are quite readable to the eye, so I can't see why the text wouldn't be recognized.
- Almost all scripted typefaces were ignored.
Here's a sample flyer I scanned which turned up very few results: http://toronto.flyerland.ca/new/flyers/view/96593/ON/0/0/0/0/rexall-pharma-plus/1081/8//1372947590
Are there settings in the API that I can enable, or are there any best-practice recommendations I can follow to get the quality up? I realize that OCR is nowhere near perfect, but I'm curious as to what else I can do.