Recognizing the first X pages of a document

  • Topic Is Solved
erion posted this 4 days ago

Dear ABBYY,

While using the ProcessPages function of the FREngine API, I noticed that even though I specified the pageIndices parameter (by adding the page indices to the appropriate collection), I was told on export that document synthesis must first be performed on the entire document. This is also shown in this thread.

According to the same thread, I can set the SourceContentReuseMode property on my extractionParameters object. If I am not mistaken, however, this would also include the visible text layer in my export.

Thus, I have two questions:

1. Is there a way to recognize only the specified pages? Even a few lines of code or a short explanation would be incredibly helpful.

2. If I am right and the document does need to be processed twice, how does that affect the number of pages counted as recognized? We are planning to switch to a CloudSDK license, where the recognized page count matters.

 

Thank you very much in advance.

Nadezhda A. Solovyeva posted this 3 days ago

The answer to your questions depends on your recognition scenario. Generally, if you would like to process only certain pages of your document, such as the 2nd and 3rd, you can pass their indices when loading the file:

    FREngine.FRDocument document = engineLoader.Engine.CreateFRDocument();

    // Add only the selected pages of the image file to the document
    displayMessage( "Loading image..." );
    FREngine.IIntsCollection pageIndices = engineLoader.Engine.CreateIntsCollection();
    pageIndices.Add( 2 );
    pageIndices.Add( 3 );
    document.AddImageFile( imagePath, null, pageIndices );

    // Recognize the loaded pages
    displayMessage( "Recognizing..." );
    document.Process( null );

With this approach, only 2 license counter units are consumed.

If you have a more specific processing scenario, could you please describe it?

erion posted this 3 days ago

Thank you very much for your answer.

I would like to specify the pages in the Process method because I want to count the pages first, and recognize only the first few if the page count reaches a certain threshold.

For this, as far as I know, I need to call document.AddImageFile(file, null, null) and specify the page indices in the Process method.

If I do, however, then according to the message I get and the thread I linked to, I need to synthesize all pages in addition to processing the ones I need. I would like to include only the processed pages in my export, but my concern is that the visible text layer, e.g. text already present in a PDF document or in an image, might also be included. Hence my two questions.

 

Nadezhda A. Solovyeva posted this 11 hours ago

You may simply delete the unwanted pages using Pages.DeleteAt(...) after loading the document. Pages that are not used in subsequent processing do not consume license counter units.

    document.AddImageFile( imagePath, null, null );
    MessageBox.Show( "Document contains " + document.Pages.Count + " pages!" );
    document.Pages.DeleteAt( 1 );
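
For the scenario described earlier in the thread (count the pages first, then keep only the first few once a threshold is exceeded), the deletion step could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not ABBYY's code: IPageList, FakePages, PageTrimmer and KeepFirstPages are hypothetical names introduced here, and the interface only mirrors the two members of document.Pages used above (Count and DeleteAt) so the logic can run without the Engine. It deletes from the end of the collection so that the indices of the remaining pages do not shift.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two members of document.Pages used in this
// thread (Count and DeleteAt). It is not part of the FREngine API.
public interface IPageList
{
    int Count { get; }
    void DeleteAt(int index);
}

// In-memory fake so the sketch can be run without the Engine.
public sealed class FakePages : IPageList
{
    private readonly List<string> pages = new List<string>();

    public FakePages(int count)
    {
        for (int i = 0; i < count; i++)
        {
            pages.Add("page " + i);
        }
    }

    public int Count { get { return pages.Count; } }

    public void DeleteAt(int index) { pages.RemoveAt(index); }
}

public static class PageTrimmer
{
    // If the collection holds more than `threshold` pages, delete everything
    // after the first `keep` pages. Returns the number of pages deleted.
    public static int KeepFirstPages(IPageList pages, int threshold, int keep)
    {
        if (keep < 0)
        {
            throw new ArgumentOutOfRangeException("keep");
        }
        if (pages.Count <= threshold)
        {
            return 0; // small document: leave every page in place
        }

        int removed = 0;
        // Walk backwards so that deleting a page never shifts an index
        // that is still to be visited.
        for (int i = pages.Count - 1; i >= keep; i--)
        {
            pages.DeleteAt(i);
            removed++;
        }
        return removed;
    }
}
```

With a threshold of 10 and keep set to 3, a 12-page FakePages ends up with 3 pages, while a 5-page one is left untouched. With a real document, the same backward loop would presumably call document.Pages.DeleteAt( i ) directly after AddImageFile and before document.Process( null ), so that only the remaining pages are recognized.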

erion posted this 8 hours ago

Thank you very much, this solves my question.
