com.abbyy.FREngine.EngineException: The PDF file has unsupported format and cannot be opened.

  • Last Post 11 July 2019
Koen de Leijer posted this 14 June 2019

Hi

When we try to process one specific PDF with the ABBYY FREngine Java API, we get the following error message:

com.abbyy.FREngine.EngineException: The PDF file `invoice.pdf` has unsupported format and cannot be opened.

at com.abbyy.FREngine.IFRDocument.AddImageFile(Native Method)

Details about our installation of ABBYY FineReader Engine:
- Debian 8.11 (64-bit)
- Java 1.8.0_201 (64-bit)
- FineReader Engine 11.1.14.707470

Java snippet that we use to process the PDFs:

            // Create document
            IFRDocument document = engine.CreateFRDocument();

            /*
                If orientation detection is performed during document processing
                (IPagePreprocessingParams::CorrectOrientation property is TRUE), you can select fast
                orientation detection mode: set the OrientationDetectionMode property of the
                OrientationDetectionParams object to ODM_Fast.
             */
            IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();   
            dpp.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
           
            try {
                // Add image file to document
                document.AddImageFile( imagePath, null, null );

                // Process the full document
                document.Process(dpp);

                // Save results to pdf using 'balanced' scenario
                IPDFExportParams pdfParams = engine.CreatePDFExportParams();
                pdfParams.setScenario( PDFExportScenarioEnum.PES_Balanced );

                String pdfExportPath = inputfilename + "_ocrred.pdf";
                document.Export( pdfExportPath, FileExportFormatEnum.FEF_PDF, pdfParams );
               
            } finally {
                // Close document
                document.Close();
            }

Other PDFs are processed successfully; this specific one is not.
Any suggestions?

Best regards

Koen de Leijer

Koen de Leijer posted this 11 July 2019

Hi

We've found out that the PDFs that are rejected by ABBYY FineReader Engine have one thing in common:
they all have "PDF Producer" => "Adobe XML Form Library".

According to the Adobe forum, these PDFs are XML documents wrapped inside a PDF: https://forums.adobe.com/thread/391837
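
Since the rejected files share this producer string, one possible workaround is to detect them before calling AddImageFile and route them elsewhere. The following is only a rough sketch under several assumptions: the class name XfaPdfCheck and the method looksLikeXfaPdf are made up for illustration, the check is a plain byte scan rather than a real PDF parse, and treating the "/XFA" dictionary key as a second indicator is our own guess. It will miss files where the metadata is stored in compressed object streams, encrypted, or encoded as UTF-16.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the FineReader Engine API.
final class XfaPdfCheck {

    // Producer string observed in the rejected PDFs.
    private static final String PRODUCER_MARKER = "Adobe XML Form Library";

    // Assumption: the AcroForm "/XFA" key is treated as an additional hint.
    private static final String XFA_KEY = "/XFA";

    // Heuristic: returns true if either marker occurs anywhere in the raw bytes.
    static boolean looksLikeXfaPdf(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes without loss.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        return raw.contains(PRODUCER_MARKER) || raw.contains(XFA_KEY);
    }

    // Convenience overload that reads the whole file into memory first.
    static boolean looksLikeXfaPdf(Path pdf) throws IOException {
        return looksLikeXfaPdf(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path pdf = Paths.get(arg);
            String verdict = looksLikeXfaPdf(pdf)
                    ? "XFA suspected, skip or convert before OCR"
                    : "no XFA marker found";
            System.out.println(pdf + ": " + verdict);
        }
    }
}
```

In our processing loop this could be called with the input path before document.AddImageFile( imagePath, null, null ), so that suspected XFA files are logged or converted instead of raising the EngineException.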

We need to OCR these PDFs because valuable information is sometimes contained in the company logo or in an image in the PDF footer.

Thanks in advance

Koen de Leijer

 
