No table recognition

  • Last Post 22 February 2018
  • Topic Is Solved
KieTo posted this 07 February 2018

Hello,

I have a problem with my parameters in FineReader Engine. The Engine doesn't detect tables in our files, only the text, but I can't find the problem in the parameters. We don't use a profile.

Our parameters are:

private FREngine.DocumentProcessingParams setDPP() {
    FREngine.DocumentProcessingParams processingParams = engineLoader.Engine.CreateDocumentProcessingParams();

    // Engine-wide setting: run recognition in parallel, one process per logical processor
    engineLoader.Engine.MultiProcessingParams.MultiProcessingMode = FREngine.MultiProcessingModeEnum.MPM_Parallel;
    engineLoader.Engine.MultiProcessingParams.RecognitionProcessesCount = Environment.ProcessorCount;

    // Recognition settings
    processingParams.PageProcessingParams.RecognizerParams.LowResolutionMode = true;
    processingParams.PageProcessingParams.RecognizerParams.OneWordPerLine = true;
    processingParams.PageProcessingParams.RecognizerParams.DetectLanguage = false;
    processingParams.PageProcessingParams.RecognizerParams.TextLanguage = fillLangDatabase();

    // Page analysis (layout) settings, with table detection switched on
    processingParams.PageProcessingParams.PageAnalysisParams.DetectPictures = false;
    processingParams.PageProcessingParams.PageAnalysisParams.DetectVectorGraphics = false;
    processingParams.PageProcessingParams.PageAnalysisParams.DetectVerticalEuropeanText = true;
    processingParams.PageProcessingParams.PageAnalysisParams.EnableTextExtractionMode = false;
    processingParams.PageProcessingParams.PageAnalysisParams.DetectTables = true;
    processingParams.PageProcessingParams.PageAnalysisParams.AggressiveTableDetection = true;

    // Image preprocessing
    processingParams.PageProcessingParams.PagePreprocessingParams.CorrectShadowsAndHighlights = FREngine.ThreeStatePropertyValueEnum.TSPV_Yes;

    // Object extraction
    processingParams.PageProcessingParams.ObjectsExtractionParams.RemoveGarbage = true;
    //processingParams.PageProcessingParams.ObjectsExtractionParams.EnableAggressiveTextExtraction = true;
    //processingParams.PageProcessingParams.ObjectsExtractionParams.DetectTextOnPictures = true;

    return processingParams;
}

...

Document.Process(setDPP());

Koen de Leijer posted this 08 February 2018

Hi

My only suggestion is that it could have something to do with the OneWordPerLine parameter.
What's the result when you set it to false?

From the help:
OneWordPerLine  VARIANT_BOOL
This property set to TRUE tells ABBYY FineReader Engine to presume that no text line may contain more than one word, so the lines of text will be recognized as a single word. By default this property is FALSE.

Otherwise, try narrowing it down by removing the properties one at a time; each property you remove falls back to its default value.

Best regards
Koen de Leijer


Oksana Serdyuk posted this 09 February 2018

Please also try using only the DocumentConversion_Accuracy predefined profile, without any of your own settings. Is the table block detected then? If not, please send your source image to SDK_Support@abbyy.com so we can test it on our side. Thank you!

KieTo posted this 15 February 2018

Hello, everyone,

Thank you for your help; unfortunately, that didn't help either. The current situation is very curious:

Original PDF file -> tables are detected

PDF converted to BMP -> no tables are recognized in the BMP

A section of the BMP cut out with Paint -> tables are recognized

Is there a maximum pixel size or file size for images?

Br

Tobi

Oksana Serdyuk posted this 19 February 2018

This is possible: when converting from PDF to BMP, the resolution of your document may have changed, and as a result you get different recognition results. You can try to change the resolution of your BMP file via the API: see Developer's Help → API Reference → Image-Related Objects → PrepareImageMode, section "Resolution overwriting". If you would like additional recommendations, kindly send your PDF and BMP files to the ABBYY Technical Support team for your region so they can investigate the issue further.
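To put numbers on that resolution change, here is a small C# sketch of the arithmetic: pixels along a page edge equal its physical length in inches times the resolution in dpi. The page size (US Letter, 8.5 x 11 in) is an assumption chosen only for round numbers; the thread does not say what size the pages are, and `DpiMath`/`Pixels` are hypothetical names. For the assumed page it prints 2550 x 3300 px (8415000 pixels) at 300 dpi and 850 x 1100 px (935000 pixels) at 100 dpi.

```csharp
using System;

static class DpiMath
{
    // Hypothetical helper: pixels along one page edge = length in inches * resolution in dpi.
    public static int Pixels(double inches, int dpi) => (int)Math.Round(inches * dpi);

    static void Main()
    {
        // Assumption: a US Letter page (8.5 x 11 in), used only because it gives round numbers.
        const double width = 8.5, height = 11.0;
        foreach (int dpi in new[] { 300, 100 })
        {
            int w = Pixels(width, dpi), h = Pixels(height, dpi);
            Console.WriteLine($"{dpi} dpi: {w} x {h} px ({(long)w * h} pixels)");
        }
    }
}
```

Going from 300 to 100 dpi keeps a third of the pixels along each edge and about a ninth of the pixels overall, so fine details such as thin table ruling lines become much coarser, which plausibly affects table detection.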

KieTo posted this 20 February 2018

That's it! Thanks for the solution. 

But is there another way to create an image from the PDF? Or can I increase the resolution? I get an image resolution of ~100 dpi from a 300 dpi PDF.

My code:

FREngine.ImageModification imgMod = engineLoader.Engine.CreateImageModification();
FREngine.IHandle hBitmap = ImgDocument.ColorImage.GetBitmap(imgMod);
Bitmap image = Bitmap.FromHbitmap(hBitmap.Handle);
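A hedged guess at where the ~100 dpi comes from (an assumption, not confirmed in this thread): a GDI+ bitmap created with FromHbitmap is probably tagged with the screen resolution, usually 96 dpi, rather than the resolution the page was rendered at. If that is the case and the pixel dimensions still match a 300 dpi page, the System.Drawing method Bitmap.SetResolution can restore the correct tag. The sketch assumes Windows with System.Drawing available; `ResolutionFix` and `Retag` are hypothetical names, and a blank bitmap stands in for the one returned by FromHbitmap.

```csharp
using System;
using System.Drawing;

static class ResolutionFix
{
    // Hypothetical helper: re-tags a bitmap with the resolution it was actually rendered at.
    // Bitmap.SetResolution changes only the dpi metadata; no pixels are resampled.
    public static Bitmap Retag(Bitmap image, float dpi)
    {
        image.SetResolution(dpi, dpi);
        return image;
    }

    static void Main()
    {
        // Stand-in for the bitmap returned by Bitmap.FromHbitmap(...) in the code above.
        using (var image = new Bitmap(100, 100))
        {
            Console.WriteLine($"before: {image.HorizontalResolution} x {image.VerticalResolution} dpi");
            // Assumption: the PDF page was rendered at 300 dpi, as stated in the question.
            Retag(image, 300f);
            Console.WriteLine($"after:  {image.HorizontalResolution} x {image.VerticalResolution} dpi");
        }
    }
}
```

In the original code this amounts to calling `image.SetResolution(300f, 300f)` right after FromHbitmap. If the pixel data itself was downsampled, changing the tag does not recover the lost detail, and the image has to be produced at a higher resolution instead.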

Oksana Serdyuk posted this 22 February 2018

Hi Tobi,

We have successfully downloaded your documents, thank you!

You can convert your PDF file to an image format with the FREngine API, using the WriteToFile method of the Image object, as in the C# snippet below:

...
// Add image file to document
document.AddImageFile( imagePath, null, null );
// Save the color image of the first page as an uncompressed BMP
document.Pages.Item(0).ImageDocument.ColorImage.WriteToFile(imagePath + ".bmp", FREngine.ImageFileFormatEnum.IFF_BmpColorUncompressed, null, null);

You can choose any image format listed among the ImageFileFormatEnum values.

It is also possible to change the resolution of the image during the preprocessing stage, before recognition. This is likewise done via the FREngine API, using additional PrepareImageMode settings:

...
// Create image preparation settings that overwrite the image resolution with 300 dpi
FREngine.PrepareImageMode pim = engineLoader.Engine.CreatePrepareImageMode();
pim.AutoOverwriteResolution = false;
pim.OverwriteResolution = true;
pim.XResolutionToOverwrite = 300;
pim.YResolutionToOverwrite = 300;
// Add image file to document, passing the settings so that they take effect
document.AddImageFile( imagePath, pim, null );
