I'm getting a strange OCR result with the Cloud SDK. Consider the following XML output:
As you can see, there are two rec variants, '5' and '6'. The '6' has a significantly higher confidence. However, the rec variant that ends up being chosen is '5'. Why is this, and how can it be avoided?
(The correct rec variant is indeed '6' in this case.)
asked 23 Jan '13, 01:16
Dear G Moore, the choice between the recognition variants depends not only on charConfidence, but also on the context.
For example, if the word with the “e” hypothesis is not a dictionary word while the word with the “c” hypothesis is, the latter will be selected as the recognition result even if its confidence is slightly lower. In your case, a possible reason is that in some writing styles "5" is placed lower than the other digits, and the recognized character in your text may be placed low on the line. If you are interested in the exact cause of this behavior, please provide your image.
If you need to select characters from the recognition variants according to charConfidence only, we recommend exporting the recognition result to XML and implementing the selection on your side.
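As a rough illustration of that client-side selection, here is a minimal Python sketch. It assumes the exported XML wraps each character in a charParams element containing charRecVariant children that carry a charConfidence attribute; those element names and the sample data are assumptions for illustration, so please check them against your actual export before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element names (charParams, charRecVariants,
# charRecVariant) and the confidence values are assumptions, not taken from
# the original export.
SAMPLE = """<charParams>
  <charRecVariants>
    <charRecVariant charConfidence="38">5</charRecVariant>
    <charRecVariant charConfidence="91">6</charRecVariant>
  </charRecVariants>5</charParams>"""


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def best_variant(char_params):
    """Return the text of the variant with the highest charConfidence,
    or None if the character has no recognition variants."""
    variants = [el for el in char_params.iter()
                if local(el.tag) == "charRecVariant"]
    if not variants:
        return None
    best = max(variants, key=lambda v: int(v.get("charConfidence", "-1")))
    return best.text or ""


def pick_by_confidence(xml_text):
    """Rebuild the text, choosing each character by confidence only."""
    root = ET.fromstring(xml_text)
    chars = []
    for el in root.iter():
        if local(el.tag) != "charParams":
            continue
        choice = best_variant(el)
        if choice is None:
            # No variants exported: keep the character the engine chose.
            choice = el.text or ""
        chars.append(choice)
    return "".join(chars)


if __name__ == "__main__":
    print(pick_by_confidence(SAMPLE))
```

Note that this deliberately ignores the context (dictionary, baseline position) that the engine takes into account, so it may do worse on ordinary words even where it helps with isolated digits.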
This answer is marked "community wiki".
answered 23 Jan '13, 21:21
Anastasia Ga... ♦♦