03 April 2017
- Last edited 03 April 2017
Thank you, but the Czech alphabet normally contains characters like "Á", so I cannot restrict it.
In this case, the scanned image DOES NOT contain it, but the result is auto-corrected using the dictionary to the version with "Á".
If I use the English language and set the allowed characters to include "Á", the word is still recognized correctly as "TRKA", and other words which really have "Á" work correctly too.
So the BUG is obviously in overly eager dictionary checks, which convert the word "TRKA" to "TRKÁ" for no reason at all.
The field input: http://i.imgur.com/FsyeaRv.png
OCR with Czech language, result: "TRKÁ" (fail). "Á" has confidence 100, even though it is not there!
OCR with English language, with "Á" in alphabet, result: "TRKA" (correct)
The field input: http://i.imgur.com/4cSSlvs.png
OCR with Czech language, result: "MALEGOVÁ" (correct)
OCR with English language, with "Á" in alphabet, result: "MALEGOVÁ" (correct)
So it is obvious that some dictionary-based processing transforms a correctly recognized word into something else. "TRKA" is a surname (not in the dictionary), while "TRKÁ" is a verb that is likely to be in a dictionary.
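
Until this is fixed, a possible workaround on my side is to run the field through both configurations and compare the results. The sketch below is only an illustration of that idea and does not call any OCR engine: the two input strings are assumed to come from the Czech run and from the English run with "Á" allowed, and the function names (strip_diacritics, differs_only_by_diacritics, pick_result) are made up for this example. If the two results differ only in diacritics, it keeps the one that was not touched by the Czech dictionary.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (e.g. 'Á' becomes 'A')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differs_only_by_diacritics(a: str, b: str) -> bool:
    """True if a and b are different strings but equal once diacritics are stripped."""
    a_nfc = unicodedata.normalize("NFC", a)
    b_nfc = unicodedata.normalize("NFC", b)
    return a_nfc != b_nfc and strip_diacritics(a_nfc) == strip_diacritics(b_nfc)


def pick_result(dictionary_result: str, raw_result: str) -> str:
    """Hypothetical policy: distrust the dictionary run when it only changed diacritics.

    dictionary_result: assumed output of the run with the Czech dictionary.
    raw_result: assumed output of the run without the Czech dictionary.
    """
    if differs_only_by_diacritics(dictionary_result, raw_result):
        return raw_result
    return dictionary_result


if __name__ == "__main__":
    print(pick_result("TRKÁ", "TRKA"))
    print(pick_result("MALEGOVÁ", "MALEGOVÁ"))
```

This costs a second OCR pass per field and only hides the symptom; it would also wrongly prefer the English run if that run ever dropped a real diacritic, so it is not a substitute for fixing the dictionary correction itself.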