Extract tabular data from PDF image

  • Last Post 31 August 2016
priyanka posted this 30 August 2016

Do you have a function that extracts only the tabular data from a PDF image?

Oksana Serdyuk posted this 31 August 2016

ABBYY Cloud OCR SDK does not support such a function, but you can try to implement it yourself. You can process your documents using the processImage method and export the result to XML format. The recognized text is presented as a hierarchy: document > page > block > region > etc. The block tag has a blockType attribute that denotes the type of the block: Text, Table, Picture, Barcode, Separator, or SeparatorsBox. So you can extract all the text from the blocks of type Table and build your own table from it.
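A minimal sketch of that approach, using only Python's standard library, might look like the code below. It assumes that a Table block contains `row` elements, each holding `cell` elements whose text sits in `line` > `formatting` (or per-character `charParams`) elements; please check this against the XML you actually receive. The namespace URI in the sample is only illustrative: it differs between schema versions, so the code ignores namespaces entirely.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Strip the "{namespace}" prefix, since the schema URI varies by version
    return tag.rsplit("}", 1)[-1]


def line_text(line):
    # With character attributes enabled, each <charParams> holds one character
    chars = [el.text or "" for el in line.iter() if local(el.tag) == "charParams"]
    if chars:
        return "".join(chars)
    # Otherwise the text sits directly inside <formatting>
    return "".join(el.text or "" for el in line.iter() if local(el.tag) == "formatting")


def extract_tables(xml_text):
    """Return every Table block as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for block in root.iter():
        if local(block.tag) != "block" or block.get("blockType") != "Table":
            continue  # skip Text, Picture, Barcode, Separator, ... blocks
        rows = []
        for row in block:
            if local(row.tag) != "row":
                continue  # e.g. the <region> element describing the block's area
            cells = []
            for cell in row:
                if local(cell.tag) != "cell":
                    continue
                lines = [line_text(l).strip() for l in cell.iter() if local(l.tag) == "line"]
                cells.append(" ".join(t for t in lines if t))
            rows.append(cells)
        tables.append(rows)
    return tables


# Hand-written sample in the assumed layout (not real SDK output)
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
  <page>
    <block blockType="Text"><text><par><line><formatting>Invoice</formatting></line></par></text></block>
    <block blockType="Table">
      <row>
        <cell><text><par><line><formatting>Name</formatting></line></par></text></cell>
        <cell><text><par><line><formatting>Qty</formatting></line></par></text></cell>
      </row>
      <row>
        <cell><text><par><line><formatting>Apple</formatting></line></par></text></cell>
        <cell><text><par><line><formatting>3</formatting></line></par></text></cell>
      </row>
    </block>
  </page>
</document>"""

print(extract_tables(SAMPLE))  # [[['Name', 'Qty'], ['Apple', '3']]]
```

From there each table can be written out as CSV or loaded into whatever structure you need; the Text block ("Invoice") is simply skipped.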

You could also try our offline OCR SDK, ABBYY FineReader Engine 11, for your scenario. It is our main SDK and gives you the tools to integrate OCR technology into your applications. Moreover, FineReader Engine provides access to block content through the Block object before the export stage. So you can delete the text you do not need in the result, keep only the table blocks, and then export the recognition result to any supported export format.
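The filtering step can be sketched as follows. The `Block` and `Page` classes here are hypothetical stand-ins, not the FineReader Engine object model, which has its own names and methods; the sketch only shows the idea of removing every non-table block before export. Blocks are deleted in reverse order so that removing one by index does not shift the blocks that have not been checked yet.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the engine's layout objects (not the real API)
@dataclass
class Block:
    block_type: str  # e.g. "Text", "Table", "Picture"
    content: str = ""


@dataclass
class Page:
    blocks: list = field(default_factory=list)


def keep_only_tables(pages):
    """Delete every block that is not a table, in place, before export."""
    for page in pages:
        # Walk backwards so deleting by index never skips a block
        for i in range(len(page.blocks) - 1, -1, -1):
            if page.blocks[i].block_type != "Table":
                del page.blocks[i]
    return pages


pages = [Page([Block("Text", "Header"), Block("Table", "a | b"), Block("Picture")])]
keep_only_tables(pages)
print([b.block_type for b in pages[0].blocks])  # ['Table']
```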

If you would like to try our offline SDK, please contact your regional sales manager or simply fill in the form on our site.