I am trying to extract all words from an image using the Java wrapper with the settings below. I am seeing cases where words from two different blocks occupy the same region (the regions of two words that come from different blocks overlap). This causes duplicated data.

I am iterating over each block, then each paragraph, and getting the words from each paragraph. Is it possible for two BT_Text blocks to overlap? If yes, how do I avoid this?

IRecognizerParams iRecognizerParams = engine.CreateRecognizerParams();
iRecognizerParams.setSaveCharacterRecognitionVariants(true);
iRecognizerParams.setSaveWordRecognitionVariants(true);
iRecognizerParams.setSaveCharacterRegions(true);
iRecognizerParams.setProhibitHyphenation(true);
iRecognizerParams.setExactConfidenceCalculation(true);

IPageAnalysisParams pageAnalysisParams = engine.CreatePageAnalysisParams();
pageAnalysisParams.setEnableTextExtractionMode(true);
pageAnalysisParams.setDetectPictures(false);
pageAnalysisParams.setDetectTables(false);
pageAnalysisParams.setDetectVectorGraphics(false);
pageAnalysisParams.setPaperSizeDetectionMode(PaperSizeDetectionModeEnum.PSDM_CloseToImageSize);
pageAnalysisParams.setProhibitDoublePageMode(true);
pageAnalysisParams.setProhibitCJKColumns(true);
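
For reference, here is a minimal sketch of the kind of post-filtering that could drop such duplicates once the words have been collected from every block. It does not use any engine types: WordBox, WordDeduplicator, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the sketch assumes each word's text and bounding rectangle have already been copied out of the recognition results into plain Java objects.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the OCR engine's API.
class WordDeduplicator {

    // Hypothetical holder for one recognized word and its bounding box.
    static final class WordBox {
        final String text;
        final Rectangle region;

        WordBox(String text, Rectangle region) {
            this.text = text;
            this.region = region;
        }
    }

    // Intersection area divided by the area of the smaller rectangle.
    // Returns 0.0 when the rectangles do not intersect.
    static double overlapRatio(Rectangle a, Rectangle b) {
        Rectangle inter = a.intersection(b);
        if (inter.isEmpty()) {
            return 0.0;
        }
        double interArea = (double) inter.width * inter.height;
        double smaller = Math.min((double) a.width * a.height,
                                  (double) b.width * b.height);
        return smaller == 0.0 ? 0.0 : interArea / smaller;
    }

    // Keeps the first occurrence of each word; a later word is dropped when
    // it has the same text as a kept word and overlaps it by at least
    // the given threshold.
    static List<WordBox> dedupe(List<WordBox> words, double threshold) {
        List<WordBox> kept = new ArrayList<>();
        for (WordBox candidate : words) {
            boolean duplicate = false;
            for (WordBox existing : kept) {
                if (existing.text.equals(candidate.text)
                        && overlapRatio(existing.region, candidate.region) >= threshold) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Requiring both matching text and overlapping regions is a deliberately conservative choice, so that two different words whose boxes merely touch are both kept. This only hides the symptom after recognition; it does not answer whether the blocks themselves can be prevented from overlapping.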