pdf extract table programmatically

  • Last Post 07 April 2017
lauberstefan posted this 27 March 2017

Hello,

I am searching for a possibility to extract tables from PDF documents and save them as worksheets in Excel. This should be automated and ideally work with documents of any structure.

PDF Transformer seems to be the right choice for this task, but it cannot be accessed from another application. Correct?

Is it possible to use FlexiCapture or FineReader Engine instead?

Any help much appreciated

Best Regards, Stefan Lauber

Oksana Serdyuk posted this 28 March 2017

ABBYY PDF Transformer is a desktop application and cannot be used as an SDK. However, we have the following SDK solutions that can be used for your scenario:

  • ABBYY FineReader Engine (for offline recognition): our SDK, which gives you the tools to integrate optical text recognition technologies into your applications. If you would like to try it, please fill in the following form on our site to get a trial version.
  • ABBYY Cloud OCR SDK (for online recognition): provides the same recognition quality as FineReader Engine, since both products are based on the same recognition technologies, but it has a more limited set of tuning options. If you are interested in Cloud OCR SDK, we recommend reading How to Work with Cloud OCR SDK to get started.

You can also contact your regional sales office to clarify details about our products.

lauberstefan posted this 29 March 2017

Hi Oksana, thanks for your support! Is it also possible to use FlexiCapture Engine for this task? Do you have any sample code for my use case? Regards

Oksana Serdyuk posted this 30 March 2017

Yes, it is also possible to use ABBYY FlexiCapture Engine to extract tables from documents, but your documents should be well structured and have a known layout with similarly structured tables, so that you can create a single document definition for them. If this is true for your documents, you can try ABBYY FlexiCapture Engine. To get a trial version of ABBYY FlexiCapture Engine, you can fill in the following form on our site.

 

Aleksei posted this 30 March 2017

Hi, do I understand correctly that FC cannot determine the number of rows in a table? Does it work strictly according to the prepared rule?

Oksana Serdyuk posted this 30 March 2017

FlexiCapture can determine the number of rows by itself, but you have to set a rule so that the program can find the table in your document. Questions about adjusting document definitions are better addressed to the ABBYY Data Capture Community at http://www.capturedocs.com/.

Aleksei posted this 30 March 2017

I mean a bit more than that. I need to get all the rows that contain data. Suppose, for example, that the rule was set up on a table with 6 rows, but the document to be recognized has 10. Will all 10 rows of data be extracted in this case?

Oksana Serdyuk posted this 30 March 2017

Yes, the program should find all rows automatically. First, the program detects the columns, and then it starts to look for rows. It can detect rows automatically by relying on the black separators and white gaps.

There are several methods of dividing a table into rows depending on a type of the table. You can select a method of dividing a table into rows and specify the properties of the rows in the Properties dialog box of the Table element (the Rows tab) in the FlexiLayout Studio while creating your document definition.
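To illustrate the white-gap idea mentioned above, here is a minimal Python sketch. It is not FlexiCapture code and does not use any ABBYY API; the function name `split_rows_by_white_gaps`, the `min_gap` parameter, and the bitmap representation are assumptions made purely for illustration. It treats one table column as a binary bitmap and starts a new row whenever it sees a sufficiently tall run of blank scanlines, so the number of rows does not need to be known in advance. Detection by black separator lines is not covered here.

```python
def split_rows_by_white_gaps(bitmap, min_gap=2):
    """Return (top, bottom) scanline indices of each row in a binary bitmap.

    bitmap:  list of equal-length lists, 1 = dark pixel, 0 = white pixel.
    min_gap: minimum number of consecutive blank scanlines that is treated
             as a gap between two rows (hypothetical tuning parameter).
    """
    rows = []
    start = None     # first scanline of the row currently being built
    last_ink = None  # most recent scanline that contained a dark pixel
    for y, line in enumerate(bitmap):
        if not any(line):
            continue  # blank scanline: part of a gap (or a margin)
        if start is None:
            start = y
        elif y - last_ink - 1 >= min_gap:
            # The blank run before this scanline is tall enough: close the row.
            rows.append((start, last_ink))
            start = y
        last_ink = y
    if start is not None:
        rows.append((start, last_ink))
    return rows
```

A narrow blank run (shorter than `min_gap`) is kept inside the same row, which roughly corresponds to line spacing within a multi-line cell. Real products use more elaborate heuristics than this.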

Aleksei posted this 06 April 2017

Hi Oksana, thanks for the reply. I learned how to create a document definition using FlexiLayout Studio. Could you suggest how to do the same using only FCEngine, if that is possible? Where can I read about it?

Oksana Serdyuk posted this 07 April 2017

Please read this information in the FlexiCapture Engine documentation: Developer’s Help → Guided Tour → Advanced Techniques → Creating Document Definitions.
