pdf to excel conversion error

  • 366 Views
  • Last Post 02 September 2019
  • Topic Is Solved
SamAshwinPaul posted this 25 September 2017

  Hi,

I have been using ABBYY's OCR SDK (pdf to excel conversion)for a while testing it with different documents and most of the documents come out perfectly except this document(attached : Written_over_Redacted.pdf). The output (attached : error.png) comes out distorted. Is there anything I can do to improve the quality ?

 

 

Attached Files

Order By: Standard | Newest | Votes
Nikolay Krivchanskiy posted this 27 September 2017

Hi,

The issue is hidden in fact that the table in your file has no vertical separators. Therefore table is not detected and the output gets split line by line. ABBYY Cloud OCR SDK does not provide you ways to change these settings. I can suggest one of the following workarounds:

• Export to .txt. Txt by default retains the layout of file, therefore the output is formatted like original.

• Export to .xml and parse .xml text coordinates manually. ABBYY does not provide .xml parser for data extraction, but you can implement it yourself.

If you are willing to extract data from such tables please consider using FineReader Engine, which will give you access to a wider specter settings. In particular, it will allow you to extract the data from similar files in the form of tables.

SamAshwinPaul posted this 28 September 2017

Hi,

Thanks. I'll try this. Is the FineReader Engine different from the Cloud OCR SDK? When I try to access the Engine I get redirected to the Cloud OCR SDK. Can you please give me the link to it.

Nikolay Krivchanskiy posted this 02 October 2017

The FineReader Engine is a separate SDK, that is not located in cloud. The FineReader Engine allows you to run recognition process on your local machine (or several machines) without sending images to our server. 

Here you can find its description on ABBYY official website: www.abbyy.com/ocr-sdk/.

Bond James posted this 02 September 2019

I will recommend you can also try PDF to Excel Converter to bulk export PDF to XLSX format. It easily saves multiple PDF files into Excel format to save PDF data into XLSX format without any data loss.

Read More Information About This Software:-  http://www.stillbonsoftware.com/pdf-to-excel/

 

Close