Difference between serifProbability and charConfidence

G Moore posted this 01 April 2013

I'm wondering what the precise difference is between the serifProbability and charConfidence parameters in the XML output, and what exactly each one signifies.


Anastasia Galimova posted this 02 April 2013

Hello G Moore,

Thank you for your question! Please see the detailed description below.

  • CharConfidence (integer)

Stores the character confidence value. It ranges from 0 to 100, and a value of -1 means that the confidence is undefined. It is an estimate, in percent, of how confident the engine is in the recognition of the character: the higher the value, the higher the confidence. Characters extracted from the source PDF file without recognition have a character confidence of 100.

  • SerifProbability (integer)

The value of this property specifies the probability that a character is written in a serif font. It ranges from 0 to 100, and a value of 255 means that this probability is undefined.
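As a minimal sketch of how these ranges and sentinel values might be handled when reading the XML output, the Python snippet below maps -1 (undefined confidence) and 255 (undefined serif probability) to `None`. It assumes both values appear as `charConfidence` and `serifProbability` attributes on a `charParams` element; the helper names are hypothetical, and the sample element is trimmed to just these two attributes.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Sentinel values taken from the descriptions above.
CONFIDENCE_UNDEFINED = -1
SERIF_UNDEFINED = 255


def parse_int_attr(elem: ET.Element, name: str, sentinel: int) -> Optional[int]:
    """Read an integer attribute; return None if it is missing or equals the sentinel."""
    raw = elem.get(name)
    if raw is None:
        return None
    value = int(raw)
    return None if value == sentinel else value


def char_scores(elem: ET.Element) -> Tuple[Optional[int], Optional[int]]:
    """Return (charConfidence, serifProbability) with undefined values as None."""
    return (
        parse_int_attr(elem, "charConfidence", CONFIDENCE_UNDEFINED),
        parse_int_attr(elem, "serifProbability", SERIF_UNDEFINED),
    )


# Hypothetical, trimmed element: confidence is defined, serif probability is not.
sample = ET.fromstring('<charParams charConfidence="87" serifProbability="255">e</charParams>')
print(char_scores(sample))  # (87, None)
```

Treating the sentinels as `None` keeps an undefined serif probability (255) from being mistaken for a very high one in later comparisons.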

G Moore posted this 09 April 2013

Okay, that's what I thought. We're seeing an interesting pattern emerge: when a character is incorrectly identified but the correct character is among the charRecVariants, the serifProbability of the character that ended up being used always seems to be lower than the serifProbability of the correct charRecVariant that should have been used. The charConfidence values show the opposite (the incorrect character has the higher confidence, which I guess is why it was chosen). Is there any reason for this? It seems to happen too regularly to be a coincidence.
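One way to check whether this is more than coincidence is to test each misrecognized character for the pattern and count how often it occurs. The sketch below assumes the chosen character and its charRecVariants have already been read from the XML into plain records, alongside a known ground-truth character; the `Variant` class and `matches_reported_pattern` function are hypothetical helpers, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Variant:
    char: str
    char_confidence: int              # 0..100, -1 = undefined
    serif_probability: Optional[int]  # 0..100, None if the raw value was 255


def matches_reported_pattern(chosen: Variant, variants: Sequence[Variant], truth: str) -> bool:
    """True if a misrecognized character shows the reported pattern: the correct
    character is among the variants, the chosen one has the higher charConfidence,
    and the correct one has the higher serifProbability."""
    if chosen.char == truth:
        return False  # not a misrecognition
    correct = next((v for v in variants if v.char == truth), None)
    if correct is None:
        return False  # correct character was not offered as a variant
    if chosen.serif_probability is None or correct.serif_probability is None:
        return False  # undefined serif probability: nothing to compare
    if min(chosen.char_confidence, correct.char_confidence) < 0:
        return False  # undefined confidence: nothing to compare
    return (chosen.char_confidence > correct.char_confidence
            and chosen.serif_probability < correct.serif_probability)


# Hypothetical example: "c" was chosen, but "e" was correct.
chosen = Variant("c", 80, 20)
print(matches_reported_pattern(chosen, [chosen, Variant("e", 65, 70)], "e"))  # True
```

Counting how many misrecognitions return True, out of all misrecognitions where the correct character appears among the variants, gives a rough measure of how regular the pattern is.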

Anastasia Galimova posted this 11 April 2013

Could you please share the images you are recognizing and the settings you use? You can send them to CloudOCRSDK@abbyy.com.