I am using FineReaderEngine to detect text in tables. Some cells contain text that has been crossed out, together with new text in red. See this example, which I created.

[image: example table]

As expected, the engine has trouble reading the crossed-out text. Unfortunately, in some rows it does not recognize the red text in the first column either (the third column is recognized in every case).
To fix this, I tried to remove the old text, which works fine thanks to the yellow color. I use the ImageModification object and its AddPaintRegion() method to paint the rectangle white (I tried black as well). Then I call the processing methods again.
The result is the same: the black text is recognized, but the red text is recognized only in some cases. In these cases, the white area (where the crossed-out text was) is recognized as a tab (a big space), as I can see in the ValidatorForm.
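
The masking step described above can be sketched in plain Python. This is only an illustration of the idea, not FineReader Engine code: it assumes the crossed-out text sits on a yellow highlight, represents the image as rows of (r, g, b) tuples, and uses hypothetical helpers (`is_yellow`, `yellow_bounding_box`, `paint_rect`) with an arbitrarily chosen color tolerance. In the Engine itself, the resulting rectangle would be what gets passed on for painting.

```python
# Plain-Python illustration only. The image is a list of rows of (r, g, b)
# tuples; all helper names are hypothetical and not part of any SDK.

def is_yellow(pixel, tolerance=60):
    """Treat a pixel as yellow when red and green are high and blue is low."""
    r, g, b = pixel
    return r >= 255 - tolerance and g >= 255 - tolerance and b <= tolerance


def yellow_bounding_box(image):
    """Return (left, top, right, bottom) enclosing all yellow pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_yellow(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def paint_rect(image, box, color=(255, 255, 255)):
    """Return a copy of the image with the inclusive rectangle filled with color."""
    left, top, right, bottom = box
    painted = [list(row) for row in image]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            painted[y][x] = color
    return painted
```

Computing one bounding box per cell, rather than per page, would keep the painted area from covering the red text next to it.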

There seems to be no reason why the red text is recognized in some cases and not in others.
Any idea what might cause this problem or how I can avoid it?

I tried to use the AddClipRegion() method, hoping the result would be different, but with FRE11 I got the error described in "ImageDocument.Modify throws error with SDK 10.0.11 but not SDK 10.0.8".

asked 06 Jul '16, 15:38

Florian

Dear Florian,

Please send your sample image to your regional Support Team at TechSupport_eu@abbyy.com. They will try to find the correct settings for processing your documents.

(07 Jul '16, 15:40) Oksana Serdyuk ♦♦

Here are the first ideas I received. Maybe they will help someone.

  • scan at a higher DPI (preferably > 300)
  • improve the image using ImageDocument.EnhanceLocalContrast()
  • check whether cells are split incorrectly; maybe use PageAnalysisParams.SplitOnlyBySeparators = true
  • use text blocks instead of table blocks

These ideas were either not possible for me or did not produce the required result.

answered 12 Jul '16, 14:27

Florian

Another idea was to add text blocks at the same locations as the table cells and then replace the cell text with the text from the added text blocks. (You can add a description to each block, such as 'table x cell y', to keep track of which table cell it belongs to.)
This is quite slow, but it works recognition-wise.

(14 Jul '16, 16:02) Florian
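
The bookkeeping behind the 'table x cell y' descriptions might look like the following sketch. It is plain Python and does not call the Engine: tables are modeled as lists of cell strings, recognized text blocks as (description, text) pairs, and the helper names (`make_tag`, `parse_tag`, `merge_block_text`) are hypothetical. Zero-based indices are an assumption made for the example.

```python
import re

# Hypothetical description format, following the 'table x cell y' idea above.
_TAG = re.compile(r"^table (\d+) cell (\d+)$")


def make_tag(table_index, cell_index):
    """Build the description string stored on a helper text block."""
    return f"table {table_index} cell {cell_index}"


def parse_tag(description):
    """Return (table_index, cell_index), or None if the description is not a tag."""
    match = _TAG.match(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def merge_block_text(tables, text_blocks):
    """Return a copy of tables with cell text replaced by tagged block text.

    tables: list of tables, each a list of cell strings.
    text_blocks: list of (description, recognized_text) pairs.
    """
    merged = [list(cells) for cells in tables]
    for description, text in text_blocks:
        key = parse_tag(description)
        if key is None:
            continue  # block was not created for a table cell
        table_index, cell_index = key
        merged[table_index][cell_index] = text
    return merged
```

For example, `merge_block_text([["old", ""]], [(make_tag(0, 1), "red text")])` fills the second cell of the first table with the text recognized in the helper block.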
© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal