Text is ignored half the time

  • Last Post 10 October 2017
  • Topic Is Solved
prw56 posted this 06 October 2017


I run the following code, using the EngineLoader from the sample code:


//create document
FR.FRDocument document = engineLoader.Engine.CreateFRDocument();

//get and add screenshot
System.Drawing.Image screenShot = this.GetScreenShot();

//braces keep m in scope for all three statements
using (MemoryStream m = new MemoryStream())
{
    screenShot.Save(m, System.Drawing.Imaging.ImageFormat.Png);
    m.Position = 0;
    document.AddImageFileFromStream(new ABBYReadStream(m));
}

//process and synthesize

//find the text
int posX = 0;
int posY = 0;
for (int x = 0; x < document.Pages.Count; x++)
{
    FR.LayoutBlocks blocks = document.Pages[x].Layout.Blocks;
    for (int y = 0; y < blocks.Count; y++)
    {
        FR.IBlock block = blocks[y];
        if (block.Type == FR.BlockTypeEnum.BT_Text)
        {
            FR.TextBlock textBlock = block.GetAsTextBlock();
            for (int z = 0; z < textBlock.Text.Paragraphs.Count; z++)
            {
                //need to use the options & regex in UIAutomationHelper
                FR.Paragraph paragraph = textBlock.Text.Paragraphs[z];
                if (paragraph.Text != text)
                    continue; //skip paragraphs that don't match

                //find middle point of text
                posX = paragraph.Left + (paragraph.Right - paragraph.Left) / 2;
                posY = paragraph.Top + (paragraph.Bottom - paragraph.Top) / 2;
            }
        }
    }
}
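The comment inside the loop notes that the exact `!=` check should eventually use looser matching, since recognized text can differ from the expected string in spacing or letter case. Below is a minimal sketch of such a comparison together with the midpoint arithmetic, assuming the coordinates are plain `int` pixel values as in the loop. `TextMatcher`, `Normalize`, `Matches` and `Center` are hypothetical names, not part of the FineReader Engine API or of UIAutomationHelper.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: NOT part of the FineReader Engine API.
public static class TextMatcher
{
    // Collapses every run of whitespace (spaces, tabs, line breaks) into a
    // single space and trims the ends, so "Save  As\r\n" becomes "Save As".
    public static string Normalize(string s) =>
        Regex.Replace(s ?? string.Empty, @"\s+", " ").Trim();

    // Compares the normalized strings, ignoring letter case.
    public static bool Matches(string recognized, string expected) =>
        string.Equals(Normalize(recognized), Normalize(expected),
                      StringComparison.OrdinalIgnoreCase);

    // Midpoint of a bounding box: the same arithmetic as the
    // posX/posY lines in the loop.
    public static (int X, int Y) Center(int left, int top, int right, int bottom) =>
        (left + (right - left) / 2, top + (bottom - top) / 2);
}
```

In the loop, `if (!TextMatcher.Matches(paragraph.Text, text)) continue;` would replace the exact comparison. Collapsing whitespace and ignoring case is only one possible choice; a caller-supplied regex would be the next step if the expected text varies more than that.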

The image I add to the document is the attached screenshot, but half the time the parts circled in red are missing from the recognized text after document synthesis. I have also tried the "Default" engine profile.

Any ideas why these parts of the image are sometimes ignored?

Edit: I have also verified that the whole image is added to the document by exporting the document as a PDF afterwards, so it is not cutting off part of the image.

Koen de Leijer posted this 07 October 2017

This looks like the same question I asked recently.

Have you already checked the answer here: https://forum.ocrsdk.com/thread/some-parts-of-a-specific-pdf-are-not-ocr-ed-by-abbyy-finereader-engine/

prw56 posted this 10 October 2017

I had not seen that answer; thank you for pointing it out. Switching to a profile in which the parameters mentioned in that thread are set to true (in my case, DocumentArchiving_Accuracy) solved the issue.