Column data is merged in ABBYY 10.5

  • Last Post 11 May 2017
Anurag Singh posted this 12 April 2017

I have a table that is being read as text data by the ABBYY engine. Item (say 1001) and material code (20015689) are two separate columns. When ABBYY extracts the data, it merges both values into a single cell, as shown below:

 

<cell colSpan="2" leftBorder="White" rightBorder="White" bottomBorder="White" width="91" height="19">
  <text>
    <par>
      <line baseline="328" l="38" t="321" r="112" b="329"><formatting lang="EnglishUnitedKingdom">1001  20015689</formatting></line>
    </par>
  </text>
</cell>
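
As a stopgap until the table layout is detected correctly, the merged value could be split after recognition. The following is only a minimal sketch in plain Java, not part of the FineReader Engine API. It assumes that both columns are purely numeric and separated by whitespace, as in the cell above; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergedCellSplitter {
    // Assumption: item and material code are both digit-only values
    // separated by one or more whitespace characters.
    private static final Pattern ITEM_AND_CODE =
            Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s*$");

    /** Returns {item, materialCode}, or null if the text does not match. */
    public static String[] split(String cellText) {
        Matcher m = ITEM_AND_CODE.matcher(cellText);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Text taken from the <formatting> element of the merged cell.
        String[] parts = split("1001  20015689");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

This only works when the column contents follow a predictable pattern. It does not fix the underlying layout detection.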

I have seen some suggestions on the Internet (http://knowledgebase.abbyy.com/article/698), which I tried to apply, but I was unable to get them to work.

I am using Java to call ABBYY OCR.

 

Can you please help me with this?

Thanks a lot.

Oksana Serdyuk posted this 13 April 2017

Hi,

please try to enable the AggressiveTableDetection property of the PageAnalysisParams object. If you set it to TRUE, FineReader Engine tries to find as many tables as possible on the page.

If this setting does not help, please send:

  • your serial number
  • the build number of FineReader Engine 10.5
  • the image illustrating the issue

to your regional Technical Support. You can find all ABBYY contacts here: https://www.abbyy.com/contacts/

Anurag Singh posted this 20 April 2017

Hi Oksana,

Thanks for your reply.

I will have to get permission from my client before providing you with the sample and license. Once I have it, I will send you the sample and serial number.

I have tried the AggressiveTableDetection property, but it does not work either, because the table has no vertical lines separating the cells. If the data in two cells is very close together, ABBYY merges them.

Do you have any suggestions for this?

Just to let you know: when I used the Visual Components and added vertical lines to separate the table cells, all the data was extracted perfectly.

I have two questions:

1. Are ABBYY Visual Components and ABBYY Engine used together for good extraction?

2. Do companies use both of them together for extraction, for example by first pre-processing the image with Visual Components and then passing it to ABBYY Engine, or can the same result be achieved programmatically using ABBYY Engine alone?

 

Thanks a lot.

Regards,

Anurag

 

 

 

 

Anna Fedyushkina posted this 11 May 2017

Hi,

1. You may use visual components to get better recognition results, for example to get your table extracted. However, there are also other purposes for the visual components: viewing the list of document pages, editing images, editing or validating recognized text. Please check more information about visual components in Help → Visual Components Reference.

2. As each company has its own scenario, the methods they use to recognize documents are usually very different.

You may achieve the same results as with visual components by using only ABBYY engine programmatically, by adding a table block to the layout. However, you will need to know the exact region of the table on your page. Please find more information about this method in Help → Guided Tour → Advanced Techniques → Working with Layout and Blocks.
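
If the table region is known but the table has no ruling lines, the column separators for such a table block could be estimated from the horizontal gaps between the words in a row. The following is only a rough sketch in plain Java and does not use the FineReader Engine API. The class names, the word coordinates and the gap threshold are made-up values for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSeparatorEstimator {
    /** Horizontal extent of one recognized word, in pixels. */
    public static final class Span {
        final int left;
        final int right;

        public Span(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Returns x positions midway between neighbouring words whose gap
     * is at least minGap pixels. Words must be sorted left to right.
     */
    public static List<Integer> separators(List<Span> words, int minGap) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < words.size(); i++) {
            int gap = words.get(i).left - words.get(i - 1).right;
            if (gap >= minGap) {
                result.add(words.get(i - 1).right + gap / 2);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical word boxes for one row containing "1001" and "20015689".
        List<Span> row = new ArrayList<>();
        row.add(new Span(38, 62));
        row.add(new Span(70, 112));
        System.out.println(separators(row, 6));
    }
}
```

The resulting x positions would then be used as separator coordinates when defining the table block. The threshold needs tuning per document, since a gap between two columns can be as narrow as an ordinary word space.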
