We need to optionally correct the orientation of the document (typically a simple business card supplied as a BITMAP which we add to the document) and then extract the text from one or more regions. Speed of execution is critical.
Without document orientation correction I can get it to run fast: first analyse each region, then recognise the document.
This works a treat and is fast once I turn off table detection etc.
Now for correcting the document orientation. The only way I have found is to do a PreprocessAnalyzeRecognize on the first page (we only have one page).
There are three problems with the above. Firstly, it is a bit slow; ideally we need it to be at least twice as fast. Secondly, it extracts all the text from the document and not just the text in our regions, as confirmed when I export the document to an external XML file. Thirdly, I cannot extract text from the desired regions. For example, when the card contains "2 Oak Lane" followed by "Fax: 0123456789", side by side at the same height, and I specify a rectangle that selects only "2 Oak Lane", the ABBYY engine treats the two as a single line ("2 Oak Lane Fax: 0123456789"), so I cannot extract just "2 Oak Lane". I know that the line overlaps my region, but I do not know how many of its characters lie inside my region.
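To illustrate the third problem, here is a minimal sketch of the behaviour I am after, written in plain Python rather than against the ABBYY API. It assumes the engine can report a bounding rectangle for every recognised character (including spaces), which I have not confirmed. The names Rect, Char, make_line and text_in_region are made up for this example, and the fixed character width is only there to build test data.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


class Char(NamedTuple):
    text: str
    box: Rect  # assumed: the OCR engine gives one rectangle per character


def centre_inside(box: Rect, region: Rect) -> bool:
    """True if the centre point of box lies within region."""
    cx = (box.left + box.right) / 2
    cy = (box.top + box.bottom) / 2
    return region.left <= cx <= region.right and region.top <= cy <= region.bottom


def text_in_region(chars: List[Char], region: Rect) -> str:
    """Keep only the characters of a recognised line whose centre is in region."""
    kept = "".join(c.text for c in chars if centre_inside(c.box, region))
    return kept.strip()


def make_line(text: str, left: float, top: float,
              char_width: float, char_height: float) -> List[Char]:
    """Build fake OCR output: fixed-width characters laid out left to right."""
    return [
        Char(ch, Rect(left + i * char_width, top,
                      left + (i + 1) * char_width, top + char_height))
        for i, ch in enumerate(text)
    ]


# The merged line from the business card example, 10 units per character.
line = make_line("2 Oak Lane Fax: 0123456789", 0, 0, 10, 20)
address_region = Rect(0, 0, 100, 20)   # covers only the first ten characters
print(text_in_region(line, address_region))
```

Testing the centre of each character, rather than requiring the whole rectangle to be inside the region, is a deliberate choice here so that a region drawn slightly too tight still keeps the characters on its edge.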
I tried modifying the fast approach (analyse each region, then recognise the document) by preprocessing the document first, with the CorrectOrientation flag set to true, but that did not work.
Incidentally I tried adding code in this message but the firewall blocked me!
Help would be appreciated. Thanks in advance.
asked 21 Oct '16, 13:14