com.abbyy.FREngine.EngineException: The PDF file has unsupported format and cannot be opened.

Koen de Leijer posted this 14 June 2019

Hi

When we try to process a specific PDF with the ABBYY FREngine Java API, we get the following error message:

com.abbyy.FREngine.EngineException: The PDF file `invoice.pdf` has unsupported format and cannot be opened.

at com.abbyy.FREngine.IFRDocument.AddImageFile(Native Method)

Details about our installation of ABBYY FineReader Engine:
- Debian 8.11 (64-bit)
- Java 1.8.0_201 (64-bit)
- FineReader Engine 11.1.14.707470

Java snippet that we use to process the PDFs:

            // Create document
            IFRDocument document = engine.CreateFRDocument();

            /*
                Enable orientation correction during page preprocessing
                (IPagePreprocessingParams::CorrectOrientation property is TRUE).
                According to the documentation, a fast orientation detection mode can
                then be selected by setting the OrientationDetectionMode property of the
                OrientationDetectionParams object to ODM_Fast; that is not done here.
             */
            IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();   
            dpp.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
           
            try {
                // Add image file to document
                document.AddImageFile( imagePath, null, null );

                // Process full document
                document.Process(dpp);

                // Save results to pdf using 'balanced' scenario
                IPDFExportParams pdfParams = engine.CreatePDFExportParams();
                pdfParams.setScenario( PDFExportScenarioEnum.PES_Balanced );

                String pdfExportPath = inputfilename + "_ocrred.pdf";
                document.Export( pdfExportPath, FileExportFormatEnum.FEF_PDF, pdfParams );
               
            } finally {
                // Close document
                document.Close();
            }

Other PDFs are processed successfully; only this specific one is not.
Any suggestions?

Best regards

Koen de Leijer


Koen de Leijer posted this 1 week ago

Hi

We've found out that the PDFs rejected by ABBYY FineReader have one thing in common:
they all have "PDF Producer" => "Adobe XML Form Library".

According to the Adobe forum, these PDFs are XML wrapped inside a PDF: https://forums.adobe.com/thread/391837

We still need to OCR these PDFs because valuable information is sometimes contained in the company logo or in an image in the PDF footer.
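
For anyone who wants to detect these files before handing them to the engine, here is a minimal sketch of a pre-check in plain Java (no ABBYY calls). It scans the raw bytes of the file for the producer string mentioned above and for the "/XFA" key that XFA-based forms normally carry in their form dictionary. The class name XfaPdfCheck and its methods are hypothetical names used only for illustration. This is a heuristic and an assumption on our side, not a confirmed explanation of the error: the markers can be hidden inside compressed object streams or stored in another text encoding, so a negative result does not prove the file is a regular PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: flags PDFs that look like XFA forms ("Adobe XML Form Library").
public class XfaPdfCheck {

    // Markers assumed to indicate an XFA-based PDF (heuristic, not exhaustive).
    private static final String[] MARKERS = { "Adobe XML Form Library", "/XFA" };

    // Returns true if any marker occurs in the raw, uncompressed bytes of the file.
    public static boolean looksLikeXfaForm(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is preserved.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            if (raw.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    // Convenience overload that reads the whole file into memory first.
    public static boolean looksLikeXfaForm(Path pdf) throws IOException {
        return looksLikeXfaForm(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path pdf = Paths.get(arg);
            String verdict = looksLikeXfaForm(pdf)
                    ? "XFA marker found, handle separately"
                    : "no XFA marker found";
            System.out.println(pdf + ": " + verdict);
        }
    }
}
```

In our batch job a check like this could run before document.AddImageFile( ... ), so that flagged files are logged or routed to a separate conversion step instead of aborting with an EngineException.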

Thanks in advance

Koen de Leijer

 
