[Cloud OCR SDK] different result for the same image

  • Last Post 22 June 2014
Andrew M posted this 09 June 2014

We're now finding that Cloud OCR SDK is returning very different results for images we've always used as test images.

For example, this image (http://i.imgur.com/6Pa3CTv.jpg), which we use as part of our integration tests, used to return a good transcription; now it returns no content after being uploaded.

Other images are working fine, but this one in particular now appears to constantly return broken results.

Anastasia Galimova posted this 21 June 2014

Most probably the issue is caused by the OCR engine update.

General accuracy on a large number of documents will be higher with the new version; we know this because we've tested the improvements on tens of thousands of real documents. However, the OCR result for a particular image can differ from the result produced by the previous version, because the OCR algorithm is not a simple one. It is a very sophisticated artificial intelligence algorithm, and even a small change to it influences the outcome.

For receipt capture, we recommend using either the special API, which is currently in beta testing, or the textExtraction profile (if the text order is wrong, you can sort the words on your side using their coordinates from the XML). Unfortunately, the default (documentConversion) profile is not suitable for this task, even if it worked before: it is designed for document recognition.
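The coordinate-based sorting suggested above might look roughly like the Python sketch below. It is a minimal illustration, not the SDK's own method: it assumes you have already extracted each word's text and bounding box (left, top, right, bottom, in pixels) from the XML export, and the `Word` class, `sort_words` function and `line_tolerance` parameter are hypothetical names. Words whose vertical centers lie close to the first word of a line are treated as belonging to that line, and each line is then ordered left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word and its bounding box in pixels (hypothetical structure,
    assumed to be filled in from the XML export by your own parsing code)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def sort_words(words: List[Word], line_tolerance: float = 0.5) -> List[List[Word]]:
    """Group words into lines from top to bottom, then order each line left to right.

    A word joins the current line when its vertical center is within
    line_tolerance * (height of the line's first word) of that word's center.
    """
    lines: List[List[Word]] = []
    # Visit words top to bottom (ties broken left to right).
    for word in sorted(words, key=lambda w: (w.center_y, w.left)):
        if lines:
            anchor = lines[-1][0]
            height = anchor.bottom - anchor.top
            if abs(word.center_y - anchor.center_y) <= line_tolerance * height:
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, restore left-to-right reading order.
    return [sorted(line, key=lambda w: w.left) for line in lines]
```

For example, a receipt where the price box was recognized before its label comes back as `MILK 1.99` followed by `TOTAL 1.99` once the words are passed through `sort_words` and joined per line. A fixed tolerance like this assumes the image is roughly deskewed; tilted receipts may need a larger tolerance or a different grouping rule.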

Andrey Isaev posted this 22 June 2014

It seems very strange, but I cannot reproduce what Andrew is referring to. I just plugged http://i.imgur.com/6Pa3CTv.jpg into http://cloud.ocrsdk.com/Demo with the default settings. It took a while to recognize, but the result is good. All the text is there (except for the logo), and at first glance I can't spot any recognition mistakes.