Field Level Recognition on digit only

  • Last Post 03 May 2017
  • Topic Is Solved
thaichat04 posted this 24 April 2017

Hi all,

I'm trying to write a code snippet that performs field-level recognition on digit data only ([0-9], plus "," and "."). Here is my Java code for FineReader Engine, using a regular-expression dictionary and a custom text language.

To create the engine configuration:

public IDocumentProcessingParams buildProcessingParams(IEngine engine) {
    IDocumentProcessingParams documentProcessingParams = engine.CreateDocumentProcessingParams();
    IPageProcessingParams pageProcessingParams = engine.CreatePageProcessingParams();
    IRecognizerParams recognizerParams = engine.CreateRecognizerParams();
    recognizerParams.setLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_No);

    ITextLanguage textLanguage = engine.CreateLanguageDatabase().CreateTextLanguage();
    textLanguage.setInternalName("Digit");
    textLanguage.setLetterSet(TextLanguageLetterSetEnum.TLLS_ProhibitedLetters, "0123456789,.");
    IDictionaryDescription textDictionnary = textLanguage.getProhibitingDictionaries().AddNew(DictionaryTypeEnum.DT_RegularExpression);
    textDictionnary.GetAsRegExpDictionaryDescription().SetText("[0..9],\\.");

    IBaseLanguage baseLanguage = textLanguage.getBaseLanguages().AddNew();
    baseLanguage.setInternalName("Base-Digit");
    baseLanguage.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Alphabet, "0123456789,.");
    baseLanguage.setAllowWordsFromDictionaryOnly(true);
    IDictionaryDescription baseDictionary = baseLanguage.getDictionaryDescriptions().AddNew(DictionaryTypeEnum.DT_RegularExpression);
    baseDictionary.GetAsRegExpDictionaryDescription().SetText("[0..9],\\.");

    recognizerParams.setTextLanguage(textLanguage);
    pageProcessingParams.setRecognizerParams(recognizerParams);
    documentProcessingParams.setPageProcessingParams(pageProcessingParams);
    return documentProcessingParams;
}

 

How it is used during processing:

IDocumentProcessingParams documentProcessingParams = setup.buildProcessingParams(engine);
document.Process(documentProcessingParams);

 

Unfortunately, this code produces nothing in the export.

Do you have any idea why?

 

IvanPopov posted this 03 May 2017

Apparently, in your custom language you prohibit the same characters and words that you would like to recognize. Basically, the following lines

textLanguage.setLetterSet(TextLanguageLetterSetEnum.TLLS_ProhibitedLetters, "0123456789,.");
IDictionaryDescription textDictionnary = textLanguage.getProhibitingDictionaries().AddNew
   (DictionaryTypeEnum.DT_RegularExpression);
textDictionnary.GetAsRegExpDictionaryDescription().SetText("[0..9],\\.");

tell FREngine that there should not be symbols "0123456789,." in the recognized results, i.e. they are prohibited. Please refer to the Developer's Help articles API Reference → Language-Related Objects → TextLanguage and API Reference → Enumerations → TextLanguageLetterSetEnum for additional details.

Once you remove these lines from your code, you should be able to get the results that you expect.

thaichat04 posted this 03 May 2017

Thanks a lot for your answer. In the meantime, I found another solution in your documentation: ABBYY already provides a predefined language, 'Digits', that fits this case. Here's the code:

@Override
public IDocumentProcessingParams buildProcessingParams(IEngine engine) {
    IDocumentProcessingParams documentProcessingParams = engine.CreateDocumentProcessingParams();
    IPageProcessingParams pageProcessingParams = engine.CreatePageProcessingParams();
    IRecognizerParams recognizerParams = engine.CreateRecognizerParams();
    recognizerParams.SetPredefinedTextLanguage("Digits");
    recognizerParams.setLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_Yes);
    pageProcessingParams.setRecognizerParams(recognizerParams);
    documentProcessingParams.setPageProcessingParams(pageProcessingParams);
    return documentProcessingParams;
}
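
Whichever configuration is used, it may help to sanity-check the exported field text against the intended character set (digits, "," and "."). The following is only a minimal sketch in plain Java using java.util.regex; the class DigitFieldCheck and the method isDigitField are hypothetical names made up for illustration and are not part of FineReader Engine. It also assumes the exported field has already been read into a String.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of FineReader Engine.
class DigitFieldCheck {

    // One or more characters, each a digit, a comma, or a period.
    // This is standard Java regex syntax, not the Engine's dictionary syntax.
    private static final Pattern DIGIT_FIELD = Pattern.compile("[0-9.,]+");

    // Returns true when the trimmed text consists only of digits, ',' and '.'.
    static boolean isDigitField(String text) {
        return text != null && DIGIT_FIELD.matcher(text.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigitField("1,234.56")); // prints "true"
        System.out.println(isDigitField("12a4"));     // prints "false"
        System.out.println(isDigitField(""));         // prints "false"
    }
}
```

An empty result is rejected as well, which makes the "nothing in export" symptom described above easy to detect in a test.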
