i'm trying to find a way to use the api methods to search an image for certain signal words and get their positions - ideally with regular expressions. this is for extracting relevant data from invoices etc.

example: i want to check if the word "total" appears somewhere within the image and get its coordinates. then i check the image for any occurrences of decimal values, get their coordinates and select the one that is closest to "total". any ideas?
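to make the idea concrete, here is a rough python sketch of the "closest decimal value" step. it assumes the recognized words are already available as (text, left, top, right, bottom) tuples; that tuple format, the sample boxes and the names `nearest_amount` and `DECIMAL` are made up for illustration and are not part of the api.

```python
import math
import re

# hypothetical word boxes: (text, left, top, right, bottom) in pixels
WORDS = [
    ("Subtotal", 100, 500, 220, 530),
    ("90.00", 600, 500, 700, 530),
    ("Total", 100, 560, 180, 590),
    ("107.10", 600, 560, 700, 590),
]

# a digit, then digits/separators, ending in a separator plus two decimals
DECIMAL = re.compile(r"^\d[\d.,]*[.,]\d{2}$")


def center(word):
    """Return the center point of a word box."""
    _, l, t, r, b = word
    return ((l + r) / 2, (t + b) / 2)


def nearest_amount(words, keyword="total"):
    """Return the decimal-looking word closest to the first keyword hit."""
    anchors = [w for w in words if w[0].lower() == keyword]
    amounts = [w for w in words if DECIMAL.match(w[0])]
    if not anchors or not amounts:
        return None
    ax, ay = center(anchors[0])

    def dist(w):
        x, y = center(w)
        return math.hypot(x - ax, y - ay)

    return min(amounts, key=dist)


print(nearest_amount(WORDS))
```

plain euclidean distance between box centers is the simplest choice here; on real invoices it may work better to weight vertical distance more heavily, since the amount usually sits on the same line as its label.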

of course i could parse the xml output of processImage myself in php with regular expressions and use the coordinates of the first and last character of each hit. but this wouldn't work if "total", for example, was recognized as "tota 1" or something similar, so i was thinking there might be a way to tell the ocr directly that it should be looking for "total" and thus make it more likely to return "total" than "tota 1". hope i described my problem understandably, appreciate any thoughts! cheers
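for reference, the kind of tolerant matching i mean could be sketched like this on the client side (python for brevity, standard library only). it drops whitespace, maps a few look-alike characters and then scores every window against the keyword. the `CONFUSIONS` table, the 0.8 threshold and the function names are assumptions for illustration, not anything the sdk provides.

```python
from difflib import SequenceMatcher

# assumed look-alike substitutions; extend for your fonts and documents
CONFUSIONS = str.maketrans({"1": "l", "|": "l", "0": "o"})


def _norm(ch):
    """Lowercase one character and map look-alikes onto letters."""
    return ch.lower().translate(CONFUSIONS)[:1]


def fuzzy_find(text, keyword, threshold=0.8):
    """Return (start, end, score) of the best fuzzy hit in text, or None.

    start/end index into the original text, so they can be mapped back
    to the coordinates of the first and last character of the hit.
    """
    # keep non-space characters and remember where each one came from
    kept = [(_norm(ch), i) for i, ch in enumerate(text) if not ch.isspace()]
    norm = "".join(ch for ch, _ in kept)
    key = "".join(_norm(ch) for ch in keyword if not ch.isspace())
    n = len(key)
    if n == 0 or len(norm) < n:
        return None

    best = None
    for start in range(len(norm) - n + 1):
        score = SequenceMatcher(None, key, norm[start:start + n]).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (kept[start][1], kept[start + n - 1][1] + 1, score)
    return best


hit = fuzzy_find("tota 1   107.10", "total")
print(hit)
```

only the single best window is returned, because the windows next to an exact hit also score fairly high and would otherwise show up as duplicates.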

asked 28 Aug '14, 17:39

kofoapp



Do you have any sample images where some words were not recognized correctly (as you describe, "total" recognized as "tota 1")? If so, please send these images to cloudocrsdk@abbyy.com and we'll investigate the issue.

Why did you choose Cloud OCR SDK among all ABBYY SDK products for your purposes?

(02 Sep '14, 16:05) SDK_support ♦♦

hi! nope, actually don't have any samples and i'm quite happy with the accuracy of the ocr. just thought this might be a common case. are there any api methods though for my use case, or do you recommend parsing and writing my own weighting algorithms to identify and extract relevant data? cheers!

(03 Sep '14, 12:21) kofoapp

Thank you for the kind words about our OCR quality! In this case we recommend parsing the output and writing your own algorithms to extract the relevant data. We'd also like to mention that ABBYY has a dedicated product for extracting relevant data from forms and documents: ABBYY FlexiCapture Engine (http://www.abbyy.com/flexicapture_engine/). You can contact the ABBYY office serving your region for more information.
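As a rough illustration of the parsing route (a sketch only, not an official sample), the per-character coordinates in the XML output can be merged into word-level bounding boxes. The snippet assumes each character arrives as a `charParams` element with `l`, `t`, `r`, `b` attributes inside a `line` element; the sample document, its namespace and the helper names are hand-made for illustration, so check them against the XML you actually receive.

```python
import xml.etree.ElementTree as ET

# Hand-made, abbreviated sample; the namespace here is a placeholder.
SAMPLE = """<document xmlns="urn:example:ocr">
<page><block><text><par><line><formatting>
<charParams l="10" t="5" r="18" b="20">T</charParams>
<charParams l="20" t="8" r="27" b="20">o</charParams>
<charParams l="29" t="5" r="34" b="20">t</charParams>
<charParams l="36" t="8" r="43" b="20">a</charParams>
<charParams l="45" t="4" r="48" b="20">l</charParams>
<charParams l="48" t="4" r="60" b="20"> </charParams>
<charParams l="60" t="6" r="68" b="20">9</charParams>
<charParams l="70" t="17" r="72" b="20">.</charParams>
<charParams l="74" t="6" r="82" b="20">5</charParams>
<charParams l="84" t="6" r="92" b="20">0</charParams>
</formatting></line></par></text></block></page>
</document>"""


def local(tag):
    """Strip an XML namespace prefix such as '{urn:...}line'."""
    return tag.rsplit("}", 1)[-1]


def _flush(chars, words):
    """Merge buffered characters into one (text, l, t, r, b) word."""
    if chars:
        words.append((
            "".join(c[0] for c in chars),
            min(c[1] for c in chars), min(c[2] for c in chars),
            max(c[3] for c in chars), max(c[4] for c in chars),
        ))


def words_with_boxes(xml_text):
    """Return a list of (text, left, top, right, bottom) per word."""
    words = []
    for line in ET.fromstring(xml_text).iter():
        if local(line.tag) != "line":
            continue
        current = []  # characters of the word being built
        for cp in line.iter():
            if local(cp.tag) != "charParams":
                continue
            ch = cp.text or " "
            if ch.isspace():
                _flush(current, words)
                current = []
            else:
                current.append((ch, *(int(cp.get(k)) for k in "ltrb")))
        _flush(current, words)
    return words


for word in words_with_boxes(SAMPLE):
    print(word)
```

Matching tags by local name keeps the sketch independent of the exact schema namespace, which may differ between output versions.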


answered 05 Sep '14, 11:46

SDK_support ♦♦

Last updated: 08 Sep '14, 14:47

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal