OCR SDK image to text extraction settings code sample details required

  • Last Post 09 August 2016
john_buildinglink posted this 02 August 2016

I am looking at the VisualComponents C# sample (C:\ProgramData\ABBYY\SDK\11\FineReader Engine\Samples\SamplesBrowsingTools\pages\sample_VisualComponents.html).

What does the "Reduce ISO Noise" setting do? Is it the same as "Remove Isolated Dots"? If not, do you have a "Remove Isolated Dots" setting?

Please let me know if you have any of these settings (if the names differ, please provide detailed specifics):

AutoDeskew, Remove Isolated Dots, Binarize, Brightness, 2D Deskew, 3D Deskew, AutoRotate, Invert, Despeckle 3x3, Despeckle 5x5

Oksana Serdyuk posted this 04 August 2016

The MI_ReduceISONoise command corresponds to the RemoveNoise method of the ImageDocument object. When pictures are taken in low light, the camera increases the sensitivity of its image sensor. This increases the so-called ISO noise (red, green and blue pixel noise). This “digital dirt” can then result in a low-quality binary image.

john_buildinglink posted this 04 August 2016

Thanks, but I would like to understand how the ABBYY settings map to the settings I am currently using in my OCR products. Please answer my questions specifically. Thanks in advance.

Oksana Serdyuk posted this 09 August 2016

Sorry for the delay in responding.

Most of these settings are supported in ABBYY FineReader Engine 11:

  • 'AutoDeskew' corresponds to the CorrectSkew property of the PrepareImageMode object. By default, the property is enabled.
  • '2D Deskew' and '3D Deskew' – you can choose an appropriate mode using the CorrectSkewMode property of the PrepareImageMode object.
  • 'Remove Isolated Dots' corresponds to the RemoveNoise method of the ImageDocument object. It allows you to specify the model of the expected noise: either white noise or correlated noise.
  • 'Binarize' – binarization is performed automatically when an image is opened. ABBYY technology uses so-called "adaptive binarization"; you can find more information about it here: Adaptive Binarization and Background Filtering
  • 'Brightness' – the brightness of each image fragment is adjusted automatically during binarization.
  • 'AutoRotate' corresponds to the CorrectOrientation property of the PagePreprocessingParams object. By default, the property is disabled.
  • 'Invert' is the same as the InvertImage property of the PrepareImageMode object. By default, this property is set to FALSE.
  • 'Despeckle 3x3' and 'Despeckle 5x5' – you can do this using the RemoveGarbage method of the ImageDocument object.

Please see the detailed description of these properties and methods in the Developer's Help file. After you install the program, the file is available in <Installation folder>\ABBYY SDK\11\FineReader Engine\Help.

john_buildinglink posted this 09 August 2016

Are you saying that binarization happens when I open the image? Binarization is the process of converting a color image to black and white. Am I missing something? Also, "'2D Deskew' and '3D Deskew' – you can choose an appropriate mode using the CorrectSkewMode property of the PrepareImageMode object." "'Despeckle 3x3' and 'Despeckle 5x5' – you can do it using the RemoveGarbage method of the ImageDocument object." Really? Can you be more specific, please? A code sample for each one would be nice. FREngine11.chm does not display content. The help files do not contain the code I need.

Oksana Serdyuk posted this 10 August 2016

Please find more detailed answers below:

Binarization

When you open an image in FineReader Engine 11 (for example, using the IFRDocument::AddImageFile method), the image is converted into the so-called “internal format”. The opened image file is represented by an ImageDocument object, which contains several image planes, each represented by an Image object. These planes are full-size black-and-white (binarized), gray, and color copies of the deskewed image, plus a small color preview. To see the binarized copy of the image, save it to a file; in C#, for example, you can do this via the IFRPage::ImageDocument::BlackWhiteImage::WriteToFile() method.

Please read more in the Developer’s Help file -> Guided Tour -> Working with Images.
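
For intuition about why there is no separate Binarize or Brightness setting, the sketch below shows a generic local-mean threshold in plain C#. It is not ABBYY's algorithm, whose details are not given in this thread; the class name, the radius and the offset are illustrative assumptions. Each pixel is compared with the average brightness of its own neighbourhood, so dark and bright regions of a page are effectively thresholded separately.

```csharp
using System;

// Hypothetical helper for illustration only; not part of FineReader Engine.
static class AdaptiveThresholdSketch
{
    // gray[y, x] holds brightness 0..255. Returns true where the pixel becomes black.
    // A pixel is black if it is darker than its local mean by more than `offset`.
    public static bool[,] Binarize(byte[,] gray, int radius, int offset)
    {
        int height = gray.GetLength(0);
        int width = gray.GetLength(1);
        var black = new bool[height, width];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // Average the neighbourhood, clipping the window at the image borders.
                int sum = 0, count = 0;
                for (int dy = -radius; dy <= radius; dy++)
                {
                    for (int dx = -radius; dx <= radius; dx++)
                    {
                        int yy = y + dy, xx = x + dx;
                        if (yy < 0 || yy >= height || xx < 0 || xx >= width) continue;
                        sum += gray[yy, xx];
                        count++;
                    }
                }
                black[y, x] = gray[y, x] < sum / count - offset;
            }
        }
        return black;
    }
}
```

A single global threshold would need a manual brightness adjustment for unevenly lit scans; a local threshold of this kind does not, which is the general idea behind doing both steps automatically at load time.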

Image despeckling

Sometimes source images are very noisy, with lots of dots or speckles on them. When these speckles appear close to letters or numbers, they may affect the quality of OCR. The size of the speckles to be removed can be specified by the user. For example, you can use the IImageDocument::RemoveGarbage method and pass, as its second parameter (GarbageSize), the maximum area in pixels of black dots that are to be considered garbage.

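…
// Add image file to document
document.AddImageFile( imagePath, null, null );

// Remove small specks from every page, then save the black-and-white plane
int pagesCount = document.Pages.Count;
for (int i = 0; i < pagesCount; i++)
{
    FREngine.FRPage page = document.Pages.Item(i);
    // Second argument is GarbageSize: the maximum area, in pixels, of a dot treated as garbage
    page.ImageDocument.RemoveGarbage( null, 3 );
    // Note: every page is written to the same file, so the last page overwrites earlier ones
    page.ImageDocument.BlackWhiteImage.WriteToFile(
        Path.Combine(FreConfig.GetSamplesFolder(), @"SampleImages\DespeckledImage.png"),
        FREngine.ImageFileFormatEnum.IFF_PngBwPng, null, null);
}
…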
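
As a rough aid for mapping 'Despeckle 3x3' and 'Despeckle 5x5' onto GarbageSize: the parameter is described above as a maximum area in pixels, and a speck that fills an N x N window covers N * N pixels. The helper below is hypothetical (it is not part of FineReader Engine), and the assumption that an N x N despeckle filter targets specks of up to N * N pixels should be checked against the filter you use today.

```csharp
using System;

// Hypothetical helper for illustration only; not part of FineReader Engine.
static class DespeckleMapping
{
    // Assumption: an N x N despeckle filter removes specks that fit inside an
    // N x N pixel window, so the largest such speck has an area of N * N pixels.
    public static int GarbageSizeForWindow(int windowSide)
    {
        if (windowSide <= 0)
        {
            throw new ArgumentOutOfRangeException("windowSide");
        }
        return windowSide * windowSide;
    }
}
```

Under that assumption, page.ImageDocument.RemoveGarbage( null, DespeckleMapping.GarbageSizeForWindow(3) ) would pass 9, and GarbageSizeForWindow(5) would pass 25. By the same definition, the value 3 in the sample above is an area of 3 pixels, not a 3x3 window.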

Image deskewing

When you use auto-deskewing, you can choose from several deskewing modes: by pairs of black squares, by lines, or by lines of text. Please see all possible values for the IPrepareImageMode::CorrectSkewMode property in the Developer's Help file.

…
// Correct skew during loading
FREngine.PrepareImageMode pim = engineLoader.Engine.CreatePrepareImageMode();
pim.CorrectSkew = true;
pim.CorrectSkewMode = (int)(FREngine.CorrectSkewModeEnum.CSM_CorrectSkewByHorizontalText |
                    FREngine.CorrectSkewModeEnum.CSM_CorrectSkewByVerticalText);

document.AddImageFile( imagePath, pim, null );
…
