I'm trying to find a way to use the API methods to search an image for certain signal words and get their positions, ideally with regular expressions. This is for extracting relevant data from invoices etc.
Example: I want to check whether the word "total" appears anywhere in the image and get its coordinates. Then I check the image for any occurrences of decimal values, get their coordinates, and select the one closest to "total". Any ideas?
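A minimal sketch of the "closest value to the keyword" step, assuming the OCR result has already been turned into a list of words with pixel bounding boxes. The Word class, the regexes and the sample coordinates are hypothetical and not part of the API. Plain Euclidean distance can pick an amount on the line above or below "total", so this sketch weights the vertical offset more heavily; the factor of 3 is an arbitrary heuristic you would tune on real invoices.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One recognized word and its bounding box in image pixels (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


# Whole-word "total" (so "Subtotal" is ignored) and amounts such as 100.00, 1,234.56 or 12,50.
KEYWORD = re.compile(r"^total:?$", re.IGNORECASE)
AMOUNT = re.compile(r"^\d+(?:[.,]\d{3})*[.,]\d{2}$")


def weighted_distance(a: Word, b: Word, vertical_weight: float = 3.0) -> float:
    """Distance between box centers; vertical offset counts more so same-line values win."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(bx - ax, (by - ay) * vertical_weight)


def value_near_keyword(words: List[Word]) -> Optional[Word]:
    """Return the amount closest to any occurrence of the keyword, or None."""
    keywords = [w for w in words if KEYWORD.match(w.text)]
    amounts = [w for w in words if AMOUNT.match(w.text)]
    if not keywords or not amounts:
        return None
    _, best = min(
        ((k, a) for k in keywords for a in amounts),
        key=lambda pair: weighted_distance(*pair),
    )
    return best


if __name__ == "__main__":
    words = [
        Word("Date", 50, 50, 100, 70),
        Word("12.08", 120, 50, 170, 70),
        Word("Subtotal", 50, 400, 150, 420),
        Word("90.00", 400, 400, 460, 420),
        Word("Total", 50, 450, 110, 470),
        Word("100.00", 400, 450, 470, 470),
    ]
    match = value_near_keyword(words)
    print(match.text if match else "no match")  # prints 100.00
```

With unweighted distance, "90.00" on the subtotal line would win in this sample, which is why the vertical weighting is there. Restricting candidates to the same line or to the right of the keyword would be an alternative heuristic.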
Of course I could parse the XML output of processImage myself in PHP with regular expressions and use the coordinates of the first and last character of each hit. But this wouldn't work if "total", for example, was recognized as "tota 1" or something similar, so I was thinking there might be a way to tell the OCR directly that it should look for "total", making it more likely to return "total" than "tota 1". I hope I described my problem understandably; I appreciate any thoughts! Cheers
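If you do parse the XML yourself, a tolerant regular expression can absorb some misrecognitions such as "tota 1" by allowing optional whitespace between letters and accepting 1 or i in place of the final l. The sketch below assumes the XML export of processImage follows the FineReader 10 schema, where each line element holds charParams elements carrying l, t, r, b pixel attributes and one recognized character each; please verify this against your actual output. The SAMPLE document and the TOTAL_FUZZY pattern are hand-written illustrations. The code rebuilds each line's text together with a parallel list of character boxes, so every regex match can be mapped back to the union of its characters' boxes. Python is used only for illustration; the same idea should carry over to PHP.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical tolerant pattern: "total" with optional whitespace between letters and
# l/1/i confusion at the end, so "Total" and "tota 1" match, but "Subtotal" and "Totals" do not.
TOTAL_FUZZY = re.compile(r"\bt\s*o\s*t\s*a\s*[l1i]\b", re.IGNORECASE)

# Hand-written sample in the assumed FineReader 10 XML layout.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page><block><text><par>
<line baseline="118" l="10" t="100" r="70" b="120"><formatting lang="English">
<charParams l="10" t="100" r="20" b="120">T</charParams>
<charParams l="20" t="104" r="30" b="120">o</charParams>
<charParams l="30" t="102" r="38" b="120">t</charParams>
<charParams l="38" t="104" r="48" b="120">a</charParams>
<charParams l="48" t="100" r="52" b="120"> </charParams>
<charParams l="52" t="100" r="60" b="120">1</charParams>
<charParams l="60" t="100" r="70" b="120">:</charParams>
</formatting></line>
</par></text></block></page></document>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def lines_with_boxes(xml_text: str) -> Iterator[Tuple[str, List[Box]]]:
    """Yield (line_text, boxes) where boxes[i] is the box of line_text[i]."""
    root = ET.fromstring(xml_text)
    for line in root.iter():
        if local_name(line.tag) != "line":
            continue
        chars: List[str] = []
        boxes: List[Box] = []
        for cp in line.iter():
            if local_name(cp.tag) != "charParams":
                continue
            box = tuple(int(cp.get(k)) for k in ("l", "t", "r", "b"))
            for ch in cp.text or " ":  # treat an empty element as a space (assumption)
                chars.append(ch)
                boxes.append(box)
        yield "".join(chars), boxes


def find_with_boxes(xml_text: str, pattern: re.Pattern) -> List[Tuple[str, Box]]:
    """Return (matched_text, bounding_box) for every regex hit, line by line."""
    hits = []
    for text, boxes in lines_with_boxes(xml_text):
        for m in pattern.finditer(text):
            span = boxes[m.start():m.end()]
            if not span:
                continue  # skip empty matches, which have no characters to box
            union = (min(b[0] for b in span), min(b[1] for b in span),
                     max(b[2] for b in span), max(b[3] for b in span))
            hits.append((m.group(), union))
    return hits


if __name__ == "__main__":
    print(find_with_boxes(SAMPLE, TOTAL_FUZZY))  # [('Tota 1', (10, 100, 60, 120))]
```

Note that this only post-processes text that has already been recognized; it does not make the OCR engine itself more likely to read "total". A looser pattern also raises the risk of false positives, so it is worth checking hits against the expected layout.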
asked 28 Aug '14, 17:39
Thank you for your good reviews of our OCR quality! In this case we would recommend parsing the output and writing your own algorithms to extract the relevant data. We'd also like to mention that ABBYY has a special product for extracting relevant data from forms and documents: ABBYY FlexiCapture Engine (http://www.abbyy.com/flexicapture_engine/). You can contact the ABBYY office serving your region for more information.
answered 05 Sep '14, 11:46