Export RasterPictureBlock to image

  • Last Post 13 February 2018
Alex K posted this 19 January 2018

Hello,

I need to OCR an image that contains several separate pictures (components on the page, each at a different angle; for example, a few paper documents scanned at one time). ABBYY Engine SDK is not able to process all of them correctly. According to my tests and assumptions, ABBYY Engine SDK is only able to correct the skew for the first component and does not do this for the remaining documents in the image.

Now my idea is to divide the OCR process for such images into separate stages:

1. OCR the image without any automatic skew correction. At this stage, only content that is already horizontally oriented should be processed. After this stage, I'll extract all IBlocks, and where block.getType() == BlockTypeEnum.BT_RasterPicture I'll store those blocks as new image files and process them again at stage #2.

2. OCR the new images from stage #1 with ABBYY Engine SDK once more, with CorrectSkew=true and OrientationDetectionMode=ODM_Thorough.

3. Reconstruct the original document from all the information received at stage #2 (place the OCRed text from the stage #2 images where those images appear in the original image from stage #1).
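For stage 3, the bookkeeping mostly comes down to remembering where each crop sat on the original page and shifting anything found inside the crop by that offset. Below is a minimal sketch of that translation in plain Java. `CropCoordinateMapper` and `toPageCoordinates` are hypothetical names, not part of ABBYY Engine SDK, and the sketch assumes the stage #2 coordinates refer to the crop before any rotation. If skew correction at stage #2 rotates the crop, the inverse rotation would have to be applied first, which is not shown here.

```java
import java.awt.Rectangle;

/**
 * Hypothetical helper (not an ABBYY API): maps a rectangle measured inside
 * a cropped image back into the coordinate system of the original page.
 */
public class CropCoordinateMapper {

    /**
     * @param cropOnPage   where the crop was taken from on the original page
     * @param regionInCrop a rectangle expressed relative to the crop's top-left corner
     * @return the same rectangle expressed in page coordinates
     */
    public static Rectangle toPageCoordinates(Rectangle cropOnPage, Rectangle regionInCrop) {
        // Pure translation: only the origin moves, the size stays the same.
        return new Rectangle(
                cropOnPage.x + regionInCrop.x,
                cropOnPage.y + regionInCrop.y,
                regionInCrop.width,
                regionInCrop.height);
    }
}
```

For example, a text line at (20, 35) inside a crop taken at (400, 250) on the page maps back to (420, 285) with its width and height unchanged.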

First of all, please let me know whether my approach is correct and I'm moving in the right direction, or whether there is a better way to achieve this with ABBYY Engine SDK.

In case my solution looks good, please help me with a few technical questions:

1. Do I need to export the rasterPictureBlock (IRasterPictureBlock rasterPictureBlock = block.GetAsRasterPictureBlock();) into a new image file (for example JPG or something similar) in order to OCR it, or is it possible to automatically detect the IRasterPictureBlock rotation angle, correct the skew, and OCR it again?

2. Is there a way to configure ABBYY not to recognize slightly rotated blocks and to return them as IRasterPictureBlock instead (for OCR stage #2 mentioned above)? Right now, even with CorrectSkew=false, ABBYY tries to recognize such blocks and the results are not very good. ABBYY Engine SDK currently returns blocks as IRasterPictureBlock only when they are rotated at a large angle.

Thanks,

Alex

Anna Borina posted this 13 February 2018

Hi Alex!

Sorry for the long silence.

1. Yes, you will need to export the parts of the image into different files.
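As an illustration of the export step, here is a minimal sketch that cuts a rectangle out of the page image and writes it to its own file using only the standard Java imaging classes. `BlockRegionExporter`, `crop` and `export` are hypothetical names, not ABBYY API. The rectangle is assumed to be the picture block's bounding box in pixel coordinates of the original image, and obtaining it from the Engine layout is not shown. PNG is used here as an assumption, because it is lossless and avoids adding compression artifacts before the second OCR pass.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper (not an ABBYY API): saves one region of a page image as a separate file. */
public class BlockRegionExporter {

    /** Returns an independent copy of the given region, clipped to the page bounds. */
    public static BufferedImage crop(BufferedImage page, Rectangle region) {
        Rectangle pageBounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        Rectangle clipped = region.intersection(pageBounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Region lies outside the page image: " + region);
        }
        // getSubimage shares pixel data with the page, so draw it into a fresh image.
        BufferedImage copy = new BufferedImage(clipped.width, clipped.height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height), 0, 0, null);
        } finally {
            g.dispose();
        }
        return copy;
    }

    /** Crops the region and writes it to the target file as PNG. */
    public static File export(BufferedImage page, Rectangle region, File target) throws IOException {
        if (!ImageIO.write(crop(page, region), "png", target)) {
            throw new IOException("No PNG writer available");
        }
        return target;
    }
}
```

A call such as `BlockRegionExporter.export(ImageIO.read(new File("scan.png")), blockBounds, new File("block-1.png"))` would then produce one file per picture block (the file names and `blockBounds` are placeholders). Keeping `blockBounds` alongside each file makes it possible to put the recognized text back in the right place later.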

2. There is no special option in FRE to tell the Engine to skip slightly rotated blocks while recognizing the others. However, you can use the PageAnalysisParams object to let the Engine know that you do not want to detect some types of blocks (for example, by setting DetectText to false).
