Multithreading a single document when the file is large

  • Last Post 17 June 2016
nayeemkhan posted this 15 June 2016

I have a scenario where I need to OCR a file that has more than 150 pages. To speed up the process, is there any method that can split the pages of a single file and process them in multiple threads?

Oksana Serdyuk posted this 15 June 2016

In your case we would suggest that you consider using parallel processing. This will allow you to recognize the pages of a document in parallel and thus decrease the overall processing time. For a detailed description of the possible ways to implement parallel processing in FineReader Engine 11, please refer to the Developer’s Help file, the article Guided Tour→Advanced Techniques→Parallel Processing.

You could also take a look at the MultiProcessingRecognition demo tool for an example of parallel processing in FREngine. This demo tool can be found at:

  • %ALLUSERSPROFILE%\Application Data\ABBYY\SDK\10\FineReader Engine\Samples\DemoTools — for Windows XP, Windows Server 2003;
  • %ProgramData%\ABBYY\SDK\10\FineReader Engine\Samples\DemoTools — for Windows Vista, Windows Server 2008, Windows 7, Windows 8, Windows Server 2012.

nayeemkhan posted this 16 June 2016

In our scenario we are processing a single multi-page file (around 150 pages). Is there any way to split the pages into batches of 20, run OCR on each batch, and then merge the results?

Oksana Serdyuk posted this 17 June 2016

The algorithm you are describing is performed automatically when multiprocessing is used. As recommended in the Help file, use the FRDocument object for parallel processing of multi-page documents. It is the easiest multiprocessing approach to code, because you do not have to implement any additional interfaces. Please find the usage details in the 'Processing with FRDocument object' section of the article Guided Tour→Advanced Techniques→Parallel Processing.
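For illustration only, the split, process in parallel, merge in order pattern from the question can be sketched generically in Python with the standard library. This is not FineReader Engine code and not how FRDocument works internally. `ocr_page` is a hypothetical placeholder for recognizing one page. The batch size of 20 comes from the question, and the worker count of 4 is an arbitrary assumption. Threads keep the sketch simple, but for CPU-bound work in pure Python a process pool would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_index):
    # HYPOTHETICAL placeholder: stands in for recognizing a single page.
    return f"text of page {page_index}"


def split_into_batches(page_count, batch_size):
    """Split page indices 0..page_count-1 into consecutive batches."""
    return [
        list(range(start, min(start + batch_size, page_count)))
        for start in range(0, page_count, batch_size)
    ]


def ocr_batch(batch):
    # Recognize the pages of one batch sequentially, keeping their order.
    return [ocr_page(i) for i in batch]


def ocr_document(page_count, batch_size=20, workers=4):
    batches = split_into_batches(page_count, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so flattening the
        # batch results merges the pages back in their original order.
        return [text for batch in pool.map(ocr_batch, batches) for text in batch]
```

With 150 pages and a batch size of 20, `split_into_batches(150, 20)` yields 8 batches: seven of 20 pages and a final one of 10. `ocr_document(150)` returns 150 results in page order, regardless of which batch finishes first.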

Oksana Serdyuk posted this 17 June 2016

Also, please note that to use multiprocessing, your license must allow at least 2 CPU cores (see the Productivity property → CPU cores).
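As a rough illustration of the licensing point, an application might cap its worker count by both the licensed cores and the machine's cores, and fall back to sequential processing below 2. This is a hypothetical helper, not part of FineReader Engine. `licensed_cores` is assumed to be the value from the license's Productivity → CPU cores; reading that value is not shown here.

```python
import os


def choose_worker_count(licensed_cores, available_cores=None):
    """Return how many recognition workers to start (1 means sequential).

    HYPOTHETICAL helper: licensed_cores is assumed to come from the
    license's Productivity -> CPU cores value.
    """
    if available_cores is None:
        # os.cpu_count() can return None, so default to a single core.
        available_cores = os.cpu_count() or 1
    workers = min(licensed_cores, available_cores)
    # Multiprocessing requires at least 2 cores; otherwise run sequentially.
    return workers if workers >= 2 else 1
```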