FineReaderEngine ignoring text after unknown text/objects

  • Last Post 14 July 2016
Florian posted this 06 July 2016

I am using FineReaderEngine to detect text in tables. In some cells there is text that has been crossed out, and there is new text in red. See this example I created.

As expected, the engine has trouble reading the crossed-out text. Unfortunately, in some rows it does not recognize the red text in the first column either (the third column is recognized in every case).
To fix this, I tried to remove the old text, which is easy to locate thanks to its yellow color. I use the ImageModification object and its AddPaintRegion() method to paint the rectangle white (I tried black as well). Then I call the processing methods again.
The result is the same: the black text is recognized, but the red text is recognized only in some cases. In those cases the white area (where the crossed-out text used to be) is recognized as a tab (a big space), as I can see in the ValidatorForm.
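
For readers who want to picture the preprocessing step described above, here is a minimal sketch in plain Python. It does not use the FineReader Engine API: the image is just a grid of RGB tuples, and the names is_yellow, yellow_bounds and paint_white, as well as the color thresholds, are assumptions made for illustration. It only shows the idea of locating the yellow-marked area and overwriting it with white.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Rect = Tuple[int, int, int, int]      # left, top, right, bottom (inclusive)


def is_yellow(pixel: Pixel) -> bool:
    """Rough yellow test; the thresholds are illustrative assumptions."""
    r, g, b = pixel
    return r > 200 and g > 200 and b < 100


def yellow_bounds(image: List[List[Pixel]]) -> Optional[Rect]:
    """Return the bounding rectangle of all yellow pixels, or None."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_yellow(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def paint_white(image: List[List[Pixel]], rect: Rect) -> None:
    """Overwrite every pixel inside rect with white, in place."""
    left, top, right, bottom = rect
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            image[y][x] = (255, 255, 255)
```

In the real workflow, the rectangle found this way would be what is passed to AddPaintRegion(); the sketch only covers the bookkeeping around it.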

There seems to be no reason why the red text is recognized in some cases and in others not.
Any idea what might cause this problem or how I can avoid it?

I tried the AddClipRegion() method, hoping the result would be different, but I got this error using FRE11: ImageDocument.Modify throws error with SDK 10.0.11 but not SDK 10.0.8

Oksana Serdyuk posted this 07 July 2016

Dear Florian,

Kindly send your image sample to your regional Support Team at TechSupport_eu@abbyy.com. They will try to find the correct settings for processing your documents.

Florian posted this 12 July 2016

Here are the first ideas I received from support. Maybe they will help someone.

  • Scan at a higher resolution (preferably more than 300 DPI).
  • Improve the image using ImageDocument.EnhanceLocalContrast().
  • Check whether cells are split incorrectly. Maybe use PageAnalysisParams.SplitOnlyBySeparators = true.
  • Use text blocks instead of table blocks.

These ideas were either not possible for me or did not produce the required result.

Florian posted this 14 July 2016

Another idea was to add text blocks at the same locations as the table cells and, after recognition, replace the cell text with the text from the added text blocks. (You can add a description to a block, such as 'table x cell y', to record which table cell it belongs to.)
This is quite slow, but it works recognition-wise.
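
The bookkeeping part of this workaround can be sketched roughly as follows. This is plain Python and not FineReader Engine code: the recognized tables and text blocks are stand-in data structures, and make_description, parse_description and merge_cell_text are hypothetical helper names. It assumes each added text block carries a description of the form 'table x cell y', as suggested above, and that the description and recognized text of each block can be read back after processing.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches descriptions such as "table 0 cell 3" (format taken from the post).
_DESCRIPTION = re.compile(r"^table (\d+) cell (\d+)$")


def make_description(table: int, cell: int) -> str:
    """Build the description stored on a helper text block."""
    return f"table {table} cell {cell}"


def parse_description(description: str) -> Optional[Tuple[int, int]]:
    """Return (table, cell) or None if the block is not a helper block."""
    match = _DESCRIPTION.match(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def merge_cell_text(
    tables: Dict[int, List[str]],
    text_blocks: List[Tuple[str, str]],
) -> Dict[int, List[str]]:
    """Replace cell text with the text of the matching helper block.

    tables maps a table index to its list of cell texts;
    text_blocks is a list of (description, recognized_text) pairs.
    The input is not modified; a merged copy is returned.
    """
    merged = {index: list(cells) for index, cells in tables.items()}
    for description, text in text_blocks:
        key = parse_description(description)
        if key is None:
            continue  # an ordinary block, not one of the helpers
        table, cell = key
        if table in merged and 0 <= cell < len(merged[table]):
            merged[table][cell] = text
    return merged
```

Keeping the mapping in the block description means no separate lookup table has to survive the recognition pass, at the cost of parsing a string per block.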