Hi
We are currently using ABBYY FineReader Engine 11.1.14.707470 on Linux with the Java API (com.abbyy.FREngine.jar).
Almost all PDFs are processed correctly, but when we OCR the attached PDF, the output becomes unreadable.
We use the following code to perform the OCR:
import com.abbyy.FREngine.Engine;
import com.abbyy.FREngine.FileExportFormatEnum;
import com.abbyy.FREngine.IDocumentProcessingParams;
import com.abbyy.FREngine.IEngine;
import com.abbyy.FREngine.IFRDocument;
import com.abbyy.FREngine.IFRPage;
import com.abbyy.FREngine.IFRPages;
import com.abbyy.FREngine.IPDFExportParams;
import com.abbyy.FREngine.PDFExportScenarioEnum;
public class ABBYY {
    public ABBYY() {}

    private IEngine engine = null;

    public void Run(String inputfilename, String dllFolder, String developerSn, String languages) throws Exception {
        // Load ABBYY FineReader Engine
        engine = Engine.GetEngineObject(dllFolder, developerSn);
        try {
            // Setup ABBYY FineReader Engine
            String profile = "DocumentConversion_Accuracy";
            engine.LoadPredefinedProfile(profile);
            // Process PDF
            processPDF(inputfilename, languages);
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            // Unload ABBYY FineReader Engine
            engine = null;
            Engine.DeinitializeEngine();
        }
    }

    private void processPDF(String inputfilename, String languages) {
        String imagePath = inputfilename;
        try {
            // Create document
            IFRDocument document = engine.CreateFRDocument();
            /*
            If orientation detection is performed during document processing
            (IPagePreprocessingParams::CorrectOrientation property is TRUE), you can select fast
            orientation detection mode: set the OrientationDetectionMode property of the
            OrientationDetectionParams object to ODM_Fast.
            */
            IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
            dpp.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
            // Aggressive text extraction
            dpp.getPageProcessingParams().getObjectsExtractionParams().setEnableAggressiveTextExtraction(true);
            dpp.getPageProcessingParams().getObjectsExtractionParams().setDetectTextOnPictures(true);
            // Set language
            dpp.getPageProcessingParams().getRecognizerParams().SetPredefinedTextLanguage(languages);
            dpp.getPageProcessingParams().getRecognizerParams().setLanguageDetectionMode(com.abbyy.FREngine.ThreeStatePropertyValueEnum.TSPV_Yes);
            try {
                // Add image file to document
                document.AddImageFile(imagePath, null, null);
                // Remove empty pages from the input file, iterating backwards
                // so that deleting a page does not shift the indices still to be visited
                boolean hasEmptyPages = false;
                IFRPages pages = document.getPages();
                for (int p = (pages.getCount() - 1); p >= 0; p--) {
                    IFRPage page = pages.getElement(p);
                    if (page.IsEmptyEx(null, null, null)) {
                        pages.DeleteAt(p);
                        hasEmptyPages = true;
                    }
                }
                if (hasEmptyPages) document.Synthesize(null);
                // Process document
                document.Process(dpp);
                // Save results to pdf using 'balanced' scenario
                IPDFExportParams pdfParams = engine.CreatePDFExportParams();
                pdfParams.setScenario(PDFExportScenarioEnum.PES_Balanced);
                /*
                Specifies whether a linearized PDF file should be created. Linearized PDF files have internal data
                arranged in a page order. A page of a linearized PDF file can be read in a web browser plug-in
                without waiting for the whole file to be downloaded. Non-linearized PDFs have the data
                necessary to assemble a document page scattered through the whole file. Non-linearized
                PDF files are smaller, but they are slower to access.
                Note: This property makes sense only for multipage PDF files. If the property is set to TRUE and
                a one-page document is exported, a nonlinearized file is created.
                This property is FALSE by default.
                */
                pdfParams.getPDFFeatures().setEnableLinearization(true);
                String pdfExportPath = inputfilename + "_ocrred.pdf";
                document.Export(pdfExportPath, FileExportFormatEnum.FEF_PDF, pdfParams);
            } finally {
                // Close document
                document.Close();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
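In case it helps narrow things down: since the export above turns on EnableLinearization, one thing that can be checked independently of FREngine is whether the exported file was actually written as a linearized PDF. The following is only a minimal sketch in plain Java. The class and method names (LinearizationCheck, looksLinearized) are hypothetical and not part of the FREngine API. It relies on the PDF convention that a linearized file carries a /Linearized dictionary within its first 1024 bytes, and it does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; it does not use or depend on FREngine.
public class LinearizationCheck {

    // Returns true if the file starts with a PDF header and the first
    // 1024 bytes contain the /Linearized key.
    static boolean looksLinearized(Path pdf) throws IOException {
        byte[] head = new byte[1024];
        int total = 0;
        try (InputStream in = Files.newInputStream(pdf)) {
            int n;
            // Keep reading until the buffer is full or the file ends.
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        }
        // ISO-8859-1 maps every byte to one char, so binary data is harmless here.
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        return text.startsWith("%PDF-") && text.contains("/Linearized");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LinearizationCheck <file.pdf>");
            return;
        }
        Path pdf = Paths.get(args[0]);
        System.out.println(pdf + " looks linearized: " + looksLinearized(pdf));
    }
}
```

Running this against the exported file (inputfilename + "_ocrred.pdf") and against an export made with setEnableLinearization(false) would show whether that setting has any effect on the problem file. Per the documentation quoted in the code comment, a one-page export is expected to report false either way.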
Which parameters do we need to set in our Java code to prevent this issue?
Are there any settings within FREngine itself that you would suggest changing?
Or is this a known issue in FREngine 11 that has been, or will be, fixed in a more recent version?
Many thanks in advance
Koen de Leijer