search for signal words

  • Last Post 03 September 2014
kofoapp posted this 28 August 2014

hi!

i'm trying to find a way to use the api methods to search an image for certain signal words and get their positions - ideally with regular expressions. this is for extracting relevant data from invoices etc.

example: i want to check if the word "total" appears somewhere within the image and get its coordinates. then i check the image for any occurrences of decimal values, get their coordinates and select the one that is closest to "total". any ideas?
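A minimal sketch of this idea in Python, assuming the OCR result has already been converted into a list of words with pixel bounding boxes. The `Word` class, the `closest_amount` function and the decimal pattern are hypothetical names for illustration, not part of any ABBYY API:

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one recognized word and its bounding box."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


# Assumed amount format: digits, optional thousands groups, two decimals.
DECIMAL = re.compile(r"^\d+(?:[.,]\d{3})*[.,]\d{2}$")


def distance(a, b):
    """Euclidean distance between the centers of two words."""
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)


def closest_amount(words, keyword="total"):
    """Return the decimal value nearest to any occurrence of the keyword."""
    anchors = [w for w in words if w.text.lower().strip(":") == keyword]
    amounts = [w for w in words if DECIMAL.match(w.text)]
    if not anchors or not amounts:
        return None
    pairs = ((a, m) for a in anchors for m in amounts)
    _, amount = min(pairs, key=lambda pair: distance(*pair))
    return amount
```

Plain center-to-center distance is only one possible weighting. On invoices it may work better to prefer values to the right of or below the keyword, but that depends on the layouts involved.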

of course i could parse the xml output of processImage myself in php with regular expressions and use the coordinates of the first and last character of each hit. but this wouldn't work if "total" for example was recognized as "tota 1" or something, so i was thinking there might be a way to tell the ocr directly that it should be looking for "total" and thus make it more likely to return "total" than "tota 1". hope i described my problem understandably, appreciate any thoughts! cheers
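The do-it-yourself parsing route could look roughly like the sketch below (in Python rather than PHP). It assumes the XML export contains `line` elements whose characters are `charParams` elements with `l`, `t`, `r`, `b` coordinate attributes. The sample document is a simplified stand-in rather than real service output, and `find_hits` is a hypothetical helper:

```python
import re
import xml.etree.ElementTree as ET

# Simplified, assumed structure; real output has more nesting and a namespace.
SAMPLE_XML = """<document><page><line>
<charParams l="100" t="560" r="110" b="580">T</charParams>
<charParams l="110" t="560" r="120" b="580">o</charParams>
<charParams l="120" t="560" r="130" b="580">t</charParams>
<charParams l="130" t="560" r="140" b="580">a</charParams>
<charParams l="140" t="560" r="150" b="580">l</charParams>
</line></page></document>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}line' -> 'line'."""
    return tag.rsplit("}", 1)[-1]


def find_hits(xml_text, pattern):
    """Return (matched text, (left, top, right, bottom)) for each regex hit."""
    hits = []
    for line in ET.fromstring(xml_text).iter():
        if local_name(line.tag) != "line":
            continue
        chars = [c for c in line.iter() if local_name(c.tag) == "charParams"]
        # One character per element keeps string indexes aligned with boxes.
        text = "".join((c.text or " ")[0] for c in chars)
        for match in re.finditer(pattern, text, re.IGNORECASE):
            span = chars[match.start():match.end()]
            if not span:
                continue  # skip empty matches
            box = (
                min(int(c.get("l")) for c in span),
                min(int(c.get("t")) for c in span),
                max(int(c.get("r")) for c in span),
                max(int(c.get("b")) for c in span),
            )
            hits.append((match.group(), box))
    return hits


print(find_hits(SAMPLE_XML, r"total"))
```

Since the regular expression runs on text rebuilt one character per element, the match offsets index straight into the character list, which gives the bounding box from the first to the last character of each hit.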

SDK_support posted this 02 September 2014

Hi,

Do you have any sample images in which words were recognized incorrectly (for example, "total" recognized as "tota 1")? If so, please send them to cloudocrsdk@abbyy.com and we will investigate the issue.

Why have you chosen Cloud OCR SDK among all ABBYY SDK products for your purposes?

kofoapp posted this 03 September 2014

hi! nope, actually don't have any samples and i'm quite happy with the accuracy of the ocr. just thought this might be a common case. are there any api methods though for my use case, or do you recommend parsing and writing my own weighting algorithms to identify and extract relevant data? cheers!

SDK_support posted this 05 September 2014

Thank you for the positive feedback on our OCR quality! In this case we recommend parsing the recognition output and writing your own algorithms to extract the relevant data. We would also like to mention that ABBYY has a dedicated product for extracting data from forms and documents: ABBYY FlexiCapture Engine (http://www.abbyy.com/flexicapture_engine/). You can contact the ABBYY office serving your region for more information.
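As one possible starting point for such an algorithm, the sketch below shows a keyword check that tolerates misreadings like "tota 1". It uses only Python's standard `difflib` module. The confusion table, the `fuzzy_contains` name and the 0.8 threshold are assumptions for illustration and would need tuning on real documents:

```python
from difflib import SequenceMatcher

# Assumed set of common OCR confusions, mapped to the letter they resemble.
CONFUSIONS = str.maketrans({"1": "l", "0": "o", "5": "s", "|": "l"})


def normalize(text):
    """Lowercase, undo likely digit/letter confusions, drop stray spaces."""
    return text.lower().translate(CONFUSIONS).replace(" ", "")


def fuzzy_contains(line_text, keyword, threshold=0.8):
    """Check whether some window of the line is similar enough to the keyword."""
    target = normalize(keyword)
    candidate = normalize(line_text)
    size = len(target)
    best = 0.0
    # Slide a keyword-sized window over the normalized line text.
    for start in range(max(1, len(candidate) - size + 1)):
        window = candidate[start:start + size]
        best = max(best, SequenceMatcher(None, window, target).ratio())
    return best >= threshold


print(fuzzy_contains("tota 1", "total"))      # True
print(fuzzy_contains("Net amount", "total"))  # False
```

The normalization is meant for keyword detection only. Applying it to the amounts themselves would corrupt the digits, so decimal values should be matched on the original text.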
