prw56 posted this 06 December 2017

I'm trying to specify the font used in my document to improve the accuracy of the extracted text. I assumed I could just set the FontNamesCustomFilter property, but I'm not seeing any change in the extracted text.

procParams.SynthesisParamsForDocument.FontSet.SystemFontSet.FontNamesCustomFilter = "Arial, Microsoft Sans Serif, Segoe UI";

Am I doing this right?

In case it matters, the specific recognition problem I'm currently having is "1" being mistaken for "l" (lowercase L). I'd like to know how to specify the font either way, but if there is another, more relevant setting for fixing this particular issue, please let me know. Thanks for any help!

It is not possible to choose a font for recognition. By the time ABBYY FineReader Engine starts using the SynthesisParamsForDocument.FontSet.SystemFontSet object, all of the text has already been recognized, so the mistake has already been made.

To improve recognition quality, you should tune the PagePreprocessingParams, ObjectsExtractionParams, and RecognizerParams objects instead. For more information about them, see Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis.

Another option is to train FineReader Engine to recognize your font. To learn how, see Help → Guided Tour → Advanced Techniques → Using GUI elements → Training User Patterns.

Also, if your document is a searchable PDF, you can reuse the text layer it already contains. Simply set the ObjectsExtractionParams::SourceContentReuseMode property to true.
