I have documents that often contain two tables with no whitespace between them. You could also view this as one table whose rows do not all have the same columns. The FineReader SDK gets confused because it treats the whole thing as a single table, and when I try to extract the data I can't tell where one row ends and the next begins. For example:
The first two rows are divided into 7 columns. The second set of rows is divided into 8 columns. It looks as if the SDK is treating it as one large table and adding invisible separators in the areas where the columns don't extend. Like this:
It is obvious why the engine would get confused by this. If I split the tables visually using Photoshop, they get parsed perfectly. Any tips on how to handle this situation? I could hardcode the number of columns per document type, but that seems messy and I'd like to keep it more generic.
asked 22 Nov '16, 22:04
You are absolutely right that FineReader Engine creates 14 separators, as you have drawn in the second image. To handle this situation, note that the Type property of the separators that cross the merged cells is TST_Absent. The separator type is not an attribute of the whole separator but of each separator segment between adjacent intersections with perpendicular separators.
If you need the number of separators in order to work out how many columns there are in a row, you can use the code sample below:
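As a rough illustration of that counting logic (not the Engine API itself), here is a minimal Python sketch. It assumes the per-segment separator types have already been read out of the Engine into a plain grid, where `segment_types[s][r]` is the type of vertical separator `s` inside row `r`, and that the grid includes the left and right table borders. Only the name `TST_Absent` comes from the Engine; `PRESENT`, `columns_per_row` and `row_groups` are hypothetical names used for illustration.

```python
from typing import List, Tuple

TST_ABSENT = "TST_Absent"  # separator type named in the answer above
PRESENT = "present"        # hypothetical stand-in for any other separator type


def columns_per_row(segment_types: List[List[str]]) -> List[int]:
    """Count columns in each row from the vertical separator segments.

    Assumption: segment_types[s][r] is the type of vertical separator s
    within row r, and the outer table borders are included, so a row with
    n non-absent segments has n - 1 columns.
    """
    if not segment_types:
        return []
    row_count = len(segment_types[0])
    counts = []
    for r in range(row_count):
        present = sum(1 for sep in segment_types if sep[r] != TST_ABSENT)
        counts.append(max(present - 1, 0))
    return counts


def row_groups(counts: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive rows with the same column count.

    Returns half-open (start, end) row ranges; each range can then be
    treated as its own logical table.
    """
    groups = []
    start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[i] != counts[start]:
            groups.append((start, i))
            start = i
    return groups


# Toy grid: 5 vertical separators (0 and 4 are the borders), 4 rows.
# Rows 0-1 use separator 1 only; rows 2-3 use separators 2 and 3.
grid = [
    [PRESENT, PRESENT, PRESENT, PRESENT],
    [PRESENT, PRESENT, TST_ABSENT, TST_ABSENT],
    [TST_ABSENT, TST_ABSENT, PRESENT, PRESENT],
    [TST_ABSENT, TST_ABSENT, PRESENT, PRESENT],
    [PRESENT, PRESENT, PRESENT, PRESENT],
]

counts = columns_per_row(grid)
groups = row_groups(counts)
```

Grouping rows by column count in this way would avoid hardcoding the number of columns per document type: a change in the count marks the boundary between the two stacked tables.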
answered 02 Dec '16, 17:29
Anna Fedyush... ♦♦