How to specify DocumentProcessingParams when performing partial OCR

  • Last Post 29 March 2019
  • Topic Is Solved
lx11020219 posted this 19 March 2019

Hello!

Is there a way to specify "DocumentProcessingParams" when running OCR on only part of a page?

With the code below, calling "document.Process();" OCRs the entire document.

The goal is to reduce processing time by running OCR on only part of the page, because OCR of the whole document takes a long time.

Code:


// Engine Load

document = engine.CreateFRDocument();
document.AddImageFile(strImagePath, null, null);
document.Pages[0].Layout.Clean();
document.Pages[0].Layout.Blocks.DeleteAll();

// Rect
IRegion region = engine.CreateRegion();
region.AddRect(0, 0, 300, 300);
IBlock newBlock = document.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region);
ITextBlock textBlock = newBlock.GetAsTextBlock();
textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Normal;
textBlock.RecognizerParams.SetPredefinedTextLanguage("JapaneseModern");
//DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
//dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
//document.Pages[0].RecognizeBlocks(null, null, dpp.PageProcessingParams.ObjectsExtractionParams);
//document.Pages[0].Synthesize();

// DocumentProcessingParams
DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.PageAnalysisParams.DetectBarcodes = true;
dpp.PageProcessingParams.PageAnalysisParams.AggressiveTableDetection = true;
dpp.PageProcessingParams.PagePreprocessingParams.CorrectOrientation = true;
dpp.PageProcessingParams.PagePreprocessingParams.OrientationDetectionParams.OrientationDetectionMode = OrientationDetectionModeEnum.ODM_Normal;

// Recognition
document.Process(dpp);

// Engine Unload

Sasha Zendrikova posted this 20 March 2019

Hi,

Document processing in ABBYY FineReader Engine consists of several steps: page preprocessing, analysis, recognition, page synthesis, document synthesis, and export. The Process method performs all of these steps except export, for the whole document.

To answer your question, there are several ways to split document processing.
 
Firstly, you can perform only the steps your document needs, using the Preprocess, Analyze, Recognize, and Synthesize methods of the FRDocument object.

Secondly, you can use the PreprocessPages, AnalyzePages, RecognizePages, and SynthesizePages methods of the FRDocument object to process specific pages.

And finally, you can work with pages directly via the FRPage object and define parameters for an individual page. Pay attention to the PreprocessAnalyzeRecognize method of the FRPage object, which performs all steps of processing except export for that page.


You can find more information about tuning the processing parameters in Developer’s Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis.

lx11020219 posted this 22 March 2019

Hi.

Thank you for answering.
When I called the "AnalyzePages" method in order to use "PageAnalysisParams", the entire page was recognized.
How can I change the code so that only a designated area is recognized?

Code:


// PagePreprocessingParams
PagePreprocessingParams ppp = engine.CreatePagePreprocessingParams();
ppp.CorrectOrientation = m_bUseOrientationDetectionMode;
ppp.OrientationDetectionParams.OrientationDetectionMode = OrientationDetectionModeEnum.ODM_Normal;

// ObjectsExtractionParams
ObjectsExtractionParams oep = engine.CreateObjectsExtractionParams();
oep.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;

// RecognizerParams
RecognizerParams rp = engine.CreateRecognizerParams();
rp.SetPredefinedTextLanguage(m_strLanguage);
rp.SaveCharacterRecognitionVariants = true;
rp.SaveWordRecognitionVariants = true;

// PageAnalysisParams
PageAnalysisParams pap = engine.CreatePageAnalysisParams();
pap.DetectBarcodes = true;
pap.AggressiveTableDetection = true;

// SynthesisParamsForDocument
SynthesisParamsForDocument spfd = engine.CreateSynthesisParamsForDocument();
spfd.FontSet.SystemFontSet.FontNamesFilter = (int)FontNamesFiltersEnum.FNF_Japanese;

// PreprocessPages
document.PreprocessPages(null, ppp, oep, rp, null);

// AnalyzePages
document.AnalyzePages(null, pap, oep, rp);

// Area
IRegion region = engine.CreateRegion();
region.AddRect(0, 0, 300, 300);
IBlock newBlock = document.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region);
ITextBlock textBlock = newBlock.GetAsTextBlock();
textBlock.RecognizerParams = rp;

// RecognizeBlocks
document.Pages[0].RecognizeBlocks(null, null, oep);

// PageSynthesize
document.Pages[0].Synthesize(spfd);

Sasha Zendrikova posted this 28 March 2019

Hi,

Actually, the AnalyzePages method doesn't perform recognition; it analyzes the page and creates a layout for subsequent recognition. If you want to recognize only a designated area and create the layout yourself, you don't need to call AnalyzePages at all.

You can find all the necessary steps in the "Adding blocks manually" section of Developer’s Help → Guided Tour → Advanced Techniques → Working with Layout and Blocks.

lx11020219 posted this 29 March 2019

Hi.

Thank you for answering.
I understand now that the analysis parameters cannot be applied to only a specified area.
It was very helpful.
