[FR ENGINE 10] Very long text extraction time after image recognition

  • Last Post 19 November 2013
Alexander Smirnov posted this 14 November 2013

Hi, I need to represent the text from a recognized image in a specific structure: lines that store words, where each word stores its own rectangle coordinates. I'm using a method ported from a previous version of FRE. Could you please give me a hint on how I can improve it or achieve similar functionality? Processing the results takes 20 seconds, while the image recognition itself completes in 5 seconds.

All FRE procedures run in a single STA thread.

    private DocPage ExtractData(FREngine.FRPage page)
    {
        var docPage = new DocPage();
        var layout = page.Layout;
        var cp = engine.CreateCharParams();
        var cp2 = engine.CreateCharParams();

        for (var blocksCounter = 0; blocksCounter < layout.Blocks.Count; blocksCounter++)
        {
            var currentBlock = layout.Blocks[blocksCounter];
            var textblock = currentBlock.GetAsTextBlock();
            if (textblock != null)
                for (var paragraphCounter = 0;
                    paragraphCounter < textblock.Text.Paragraphs.Count; paragraphCounter++)
                {
                    var currentParagraph = textblock.Text.Paragraphs[paragraphCounter];
                    var linesFirstChars = new int[currentParagraph.Lines.Count];
                    var wordsFirstChars = new int[currentParagraph.Words.Count];
                    for (int linesCounter = 0; linesCounter < currentParagraph.Lines.Count; linesCounter++)
                        linesFirstChars[linesCounter] = currentParagraph.Lines[linesCounter].FirstCharIndex;
                    for (int wordsCounter = 0; wordsCounter < currentParagraph.Words.Count; wordsCounter++)
                        wordsFirstChars[wordsCounter] = currentParagraph.Words[wordsCounter].FirstSymbolPosition;
                    DocLine currentLine = null;
                    DocWord currentWord = null;
                    for (int linesCounter = 0, wordsCounter = 0, charCounter = 0;
                        charCounter < currentParagraph.Text.Length; charCounter++)
                    {
                        if (linesFirstChars.Length > linesCounter &&
                            charCounter >= linesFirstChars[linesCounter])
                        {
                            if (currentLine != null)
                            {
                                var rec = currentLine.Words[0].Rectangle;
                                var left = rec.Left;
                                var top = rec.Top;
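                                // The line's right edge comes from its last word, not from the first word's bottom.
                                var right = currentLine.Words[currentLine.Words.Count - 1].Rectangle.Right;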
                                var bottom = currentLine.Words[currentLine.Words.Count - 1].Rectangle.Bottom;
                                currentLine.Rectangle = new System.Drawing.Rectangle(left, top, right - left, bottom - top);
                            }
                            linesCounter++;
                            docPage.Lines.Add(new DocLine());
                            currentLine = docPage.Lines.Last();
                        }

                        if (wordsFirstChars.Length > wordsCounter &&
                            charCounter >= wordsFirstChars[wordsCounter])
                        {

                            currentLine.Words.Add(new DocWord());
                            currentWord = currentLine.Words.Last();
                            currentWord.Text = currentParagraph.Words[wordsCounter].Text;
                            currentParagraph.GetCharParams(charCounter, cp);
                            int len = charCounter + currentWord.Text.Length - 1;
                            // Clamp to the last valid character index of the paragraph.
                            if (len >= currentParagraph.Length) len = currentParagraph.Length - 1;
                            currentParagraph.GetCharParams(len, cp2);
                            currentWord.Rectangle = new System.Drawing.Rectangle(cp.Left, cp.Top, cp2.Right - cp.Left, cp.Bottom - cp.Top);
                            wordsCounter++;
                        }
                    }
                    currentWord = null;
                    currentLine = null;
                    currentParagraph = null;
                }
            currentBlock = null;
        }
        return docPage;
    }

Julia Anikushina posted this 19 November 2013

We processed our standard Demo.tif with all the methods you use. The slowdown was not reproduced on our side: processing the results takes 2 seconds.

Generally, the performance depends on image structure and quality, settings and machine configuration. You can influence performance by selecting appropriate settings. To increase processing speed please refer to the Developer’s Help → Guided Tour → Best Practices → Increasing Processing Speed and to the following article of our Knowledge Base: http://knowledgebase.ocrsdk.com/article/1222.

If these recommendations do not help, please send a simple sample project and an AInfo report to SDK_support@abbyy.com so that we can take a closer look at the issue and give you appropriate recommendations.
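
One thing that may be worth checking first, as an assumption that is not confirmed anywhere in this thread: the loop conditions in ExtractData re-read `layout.Blocks.Count`, `textblock.Text.Paragraphs.Count` and `currentParagraph.Text.Length` on every iteration. If those properties are served through COM interop, each read is a separate call, and `Text` may copy the whole paragraph string each time. The sketch below does not use FREngine at all. `CountingParagraph` is a hypothetical stand-in that only counts how often its `Text` property is read, to show how reading the value once before the loop changes the number of calls.

```csharp
using System;

// Hypothetical stand-in for an interop-backed paragraph object.
// It is NOT an FREngine type; it only counts reads of its Text property.
class CountingParagraph
{
    private readonly string text;
    public int TextReads { get; private set; }

    public CountingParagraph(string text) { this.text = text; }

    // Each read models one (assumed) interop call that returns the whole string.
    public string Text
    {
        get { TextReads++; return text; }
    }
}

static class HoistDemo
{
    // Text is read in the loop condition, so it is fetched on every check.
    public static int ReadsWhenInCondition(string content)
    {
        var paragraph = new CountingParagraph(content);
        for (int i = 0; i < paragraph.Text.Length; i++) { }
        return paragraph.TextReads;
    }

    // Text is read once before the loop and the cached length is reused.
    public static int ReadsWhenHoisted(string content)
    {
        var paragraph = new CountingParagraph(content);
        int length = paragraph.Text.Length;
        for (int i = 0; i < length; i++) { }
        return paragraph.TextReads;
    }

    static void Main()
    {
        string content = new string('x', 1000);
        Console.WriteLine("Reads with Text in the condition: " + ReadsWhenInCondition(content));
        Console.WriteLine("Reads with the length hoisted:    " + ReadsWhenHoisted(content));
    }
}
```

For a 1000-character string the first variant reads `Text` once per condition check, the second reads it once in total. Whether this accounts for the 20 seconds reported above can only be confirmed by timing the real code, for example with `System.Diagnostics.Stopwatch` around the extraction call.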
