Do you have a function that extracts only the tabular data from a PDF image?

asked 30 Aug '16, 15:57

priyanka


ABBYY Cloud OCR SDK does not provide such a function, but you can implement it yourself. Process your documents with the processImage method and export the result to XML. The recognized text is presented in a hierarchy: document > page > block > region > etc. Each block tag has a blockType attribute that denotes the type of the block: Text, Table, Picture, Barcode, Separator, or SeparatorsBox. You can therefore extract the text from the blocks of type Table and build your own table from it.
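The filtering step described above could be sketched in Python roughly as follows. This is only an illustration, not part of the SDK: the answer names the block tag and its blockType attribute, but the `row`, `cell`, and `line` element names used inside a table block are assumptions about the exported XML, so check them against your own output. Namespaces are stripped so the sketch does not depend on a particular schema URL.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def leaf_text(elem):
    """Concatenate the text of leaf elements only, which skips indentation."""
    return "".join(e.text or "" for e in elem.iter() if len(e) == 0)


def extract_tables(xml_data):
    """Return a list of tables; each table is a list of rows of cell strings."""
    root = ET.fromstring(xml_data)
    tables = []
    for block in root.iter():
        # Keep only blocks whose blockType attribute is "Table".
        if local(block.tag) != "block" or block.get("blockType") != "Table":
            continue
        rows = []
        # ASSUMPTION: a table block contains row > cell > ... > line elements.
        for row in block.iter():
            if local(row.tag) != "row":
                continue
            cells = []
            for cell in row.iter():
                if local(cell.tag) != "cell":
                    continue
                lines = [leaf_text(ln).strip()
                         for ln in cell.iter() if local(ln.tag) == "line"]
                cells.append(" ".join(lines))
            rows.append(cells)
        tables.append(rows)
    return tables
```

To use it, read the XML file returned for your task and pass its contents to `extract_tables`; each returned table can then be written out with the standard `csv` module, for example. Nested tables are not handled in this sketch.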

You can also try our offline OCR SDK, ABBYY FineReader Engine 11, for your scenario. It is our main SDK, and it gives you the tools to integrate OCR technologies into your applications. In addition, FineReader Engine provides access to block content via the Block object before the export stage, so you can delete the text you do not need, leave only the table blocks, and then export the recognition result to any supported export format.

If you would like to try our offline SDK, please contact your regional sales manager or fill in the form on our site.


answered 31 Aug '16, 13:13


Oksana Serdyuk ♦♦



