[FineReader Engine 10] How to get rid of extra symbols and characters from output?

  • Last Post 19 June 2014
Nitya posted this 11 June 2014

Instead of using a predefined text language, I am trying to get only digits in the final output with the following code, but the output still contains letters and symbols.

    HRESULT res;
    CSafePtr<IBaseLanguage> baseLanguage;
    res = engine->CreateBaseLanguage(&baseLanguage);

    res = baseLanguage->put_LetterSet(BLLS_Alphabet, CBstr(L"0123456789"));

    CSafePtr<ITextLanguage> textLanguage;
    res = engine->CreateTextLanguage(&textLanguage);

    CSafePtr<IBaseLanguages> baseLanguages;
    res = textLanguage->get_BaseLanguages(&baseLanguages);
    res = baseLanguages->Add(baseLanguage);
    res = baseLanguages->Item(0, &baseLanguage);

    CSafePtr<IPageProcessingParams> pageProcessingParams;
    res = engine->CreatePageProcessingParams(&pageProcessingParams);
    CSafePtr<IRecognizerParams> recognizerParams;
    res = pageProcessingParams->get_RecognizerParams(&recognizerParams);
    res = recognizerParams->get_TextLanguage(&textLanguage);

SDK_support posted this 19 June 2014

English is the default recognition language. If you want to use a different predefined recognition language, call the SetPredefinedTextLanguage method of the RecognizerParams object.

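In your code snippet, you only add your base language to the text language's collection of base languages. The resulting text language is never assigned to the recognizer parameters: get_TextLanguage reads the current language, it does not set yours. For how to create and set a custom language, see Help → Guided Tour → Advanced Techniques → Working with Languages, and the CustomLanguage code sample.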

Note, however, that if you set a language that contains only digits and the image contains letters, FRE will try to recognize those letters as digits.

Alternatively, you can extract the digits from the recognized text in post-processing.