Can FR Engine 10 OCR only the pages of a PDF file that have not been OCR-ed yet?

aravindb posted this 12 September 2017

Hi,  
Can we OCR only the non-OCR-ed pages in a PDF file?
For example, if we have a 10-page PDF file in which pages 4, 7 and 9 are not OCR-ed and the others are already OCR-ed, and we pass the whole file to FR Engine, will it detect the non-OCR-ed pages and OCR only those, automatically skipping the other pages?

Or can we pass page numbers to tell it which pages of the PDF file to OCR? Is there a parameter for this?

I ask because if we OCR already OCR-ed files again and again, the file quality degrades and the words become blurry.

Note: I am using VB code.

Regards,
Aravind

Nikolay Krivchanskiy posted this 18 September 2017

Hi Aravind,

If you know beforehand which pages you need to recognize, you can use FRDocument::Pages to access the collection of pages in the document.

By calling the FRPage::Recognize method for individual pages, rather than for the document as a whole, you can recognize only certain pages.
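To illustrate the per-page approach without the SDK, here is a minimal C++ sketch. MockPage and RecognizeSelectedPages are hypothetical stand-ins, not FineReader Engine types; in real code each call would go to FRPage::Recognize on a page taken from FRDocument::Pages, with whatever parameters that method requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an FRE page. The real FRPage::Recognize
// takes engine-specific parameters, which are omitted here.
struct MockPage {
    bool recognized = false;
    void Recognize() { recognized = true; }  // stands in for FRPage::Recognize
};

// Recognize only the pages whose zero-based indices are listed,
// instead of processing the whole document at once.
void RecognizeSelectedPages(std::vector<MockPage>& pages,
                            const std::vector<std::size_t>& indices) {
    for (std::size_t index : indices) {
        assert(index < pages.size());  // index must refer to an existing page
        pages[index].Recognize();
    }
}
```

Calling it with the indices 3, 6 and 8 on a 10-page document touches only pages 4, 7 and 9 and leaves the rest alone.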

Also, when you first add a file to the FRDocument object via the FRDocument::AddImageFile method, you can pass an IntsCollection as the third parameter; it contains the indices of the pages you are going to add to the FRDocument.
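Whether the indices go to AddImageFile or later to ProcessPages, the page numbers a person reads (4, 7 and 9 in the question) have to be turned into indices first. A small helper for that might look like this, assuming the engine counts pages from zero (worth checking in the Help for your version); ToPageIndices is a hypothetical name, not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Converts 1-based page numbers (as a person reads them, e.g. 4, 7, 9)
// into sorted, de-duplicated zero-based indices.
// Assumption: the engine expects zero-based indices in the IntsCollection.
std::vector<int> ToPageIndices(const std::vector<int>& pageNumbers, int pageCount) {
    assert(pageCount >= 0);
    std::vector<int> indices;
    indices.reserve(pageNumbers.size());
    for (int number : pageNumbers) {
        if (number < 1 || number > pageCount) {
            throw std::out_of_range("page number outside the document");
        }
        indices.push_back(number - 1);
    }
    std::sort(indices.begin(), indices.end());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
    return indices;
}
```

The returned values can then be added one by one to the IntsCollection.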

Please also note that both FRDocument and FRPage objects can be saved and loaded again later.

For more information about FRDocument, please refer to Help → API Reference → Document-Related Objects → Document Organization → FRDocument.

 

aravindb posted this 19 September 2017

Hi,
If we pass a 10-page PDF file into FRDocument, will ABBYY skip the already OCR-ed pages and OCR only the non-OCR-ed ones?
I mean: if we append non-OCR-ed pages to an already OCR-ed PDF file and then send it to ABBYY, will it skip the already OCR-ed pages, or will it still OCR every page in that PDF file?

Regards,
Aravind

Nikolay Krivchanskiy posted this 26 September 2017

Hi Aravind,

No, ABBYY Cloud SDK will not detect that the PDF you sent was already OCR-ed by it. Even if the PDF has a text layer, that layer is ignored and the page is OCR-ed again.
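Since the text layer is not used to skip pages, the application has to remember which pages are new itself. For the append scenario in the question, a minimal sketch, assuming the new pages are appended at the end of the file and page indices are zero-based, is shown below; AppendedPageIndices is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// If non-OCR-ed pages were appended to the end of a PDF that already had
// originalPageCount OCR-ed pages, the new pages occupy the zero-based
// indices [originalPageCount, totalPageCount).
std::vector<int> AppendedPageIndices(int originalPageCount, int totalPageCount) {
    assert(0 <= originalPageCount && originalPageCount <= totalPageCount);
    std::vector<int> indices;
    for (int i = originalPageCount; i < totalPageCount; ++i) {
        indices.push_back(i);
    }
    return indices;
}
```

For a file that had 7 OCR-ed pages and now has 10, this yields the indices 7, 8 and 9.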

 

aravindb posted this 27 September 2017

Hi, if we know which pages need OCR, can we pass those page numbers to FR Engine?

I mean, we know the numbers of the pages that are not OCR-ed, so can we pass just those page numbers when processing the whole document?



Regards,
Aravind

Nikolay Krivchanskiy posted this 27 September 2017

Hi,

In FineReader Engine you can do this easily, using the FRPage::Process method or FRDocument::ProcessPages. The first parameter of ProcessPages is an IntsCollection object that holds the indices of the pages you want to process.

In C++, assuming we already have an FRDocument object containing all pages of the file in question and want to process it with the default parameters, the code looks like this:

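CSafePtr<IIntsCollection> pageIndices;
Engine->CreateIntsCollection( &pageIndices );  // Engine: the initialized IEngine object
// Fill pageIndices with the zero-based indices of the pages to OCR,
// e.g. pages 4, 7 and 9 of the file:
pageIndices->Add( 3 );
pageIndices->Add( 6 );
pageIndices->Add( 8 );
// 0 = use the default processing parameters
frDocument->ProcessPages( pageIndices, 0 );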

You can find more information about this method in Help → API Reference → Document-Related Objects → Document Organization → FRDocument.
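If your application records which pages were already OCR-ed rather than which ones were not, the list for ProcessPages is simply the complement. A hypothetical helper (PagesStillToOcr is not part of the SDK), again assuming zero-based page indices:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns the zero-based indices of pages that are NOT in alreadyOcred,
// i.e. the pages that still need OCR. The result could then be copied
// into the IntsCollection passed to FRDocument::ProcessPages.
std::vector<int> PagesStillToOcr(int pageCount, const std::set<int>& alreadyOcred) {
    assert(pageCount >= 0);
    std::vector<int> pending;
    for (int i = 0; i < pageCount; ++i) {
        if (alreadyOcred.count(i) == 0) {
            pending.push_back(i);
        }
    }
    return pending;
}
```

For the question's 10-page file, where every page except 4, 7 and 9 is already OCR-ed, this returns the indices 3, 6 and 8.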

 
