Best approach for Receipt field scanning

  • Last Post 12 November 2014
Jorge Velasco posted this 06 November 2014

I need to work with a 'receipt fields scanning' scenario. I was wondering which is the best approach for it:

1.- Use 'processTextField' method adding regExp param with the name of my fields to recognize.
2.- Use 'processImage' method doing the regular expression on my side.

Clearly the first option would be best for me, since the system does the processing. Is it possible? I've tried calling 'processTextField' with the 'regExp' parameter, but I haven't had any success with it.

Oksana Serdyuk posted this 12 November 2014

For receipt capture, we recommend either the special receipt API, which is currently in beta testing (my colleague Victoria has already emailed you about it), or the processImage method with the textExtraction profile. With the second option, keep in mind that the text order may sometimes be wrong, because during document analysis the text is grouped into blocks by columns. For this scenario we therefore recommend exporting to XML (words are exported with their coordinates) and then sorting the words by their coordinates on your side.
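
To illustrate the coordinate sorting described above, here is a minimal Python sketch. It does not parse the exported XML; it assumes you have already read each word and its bounding box into a simple record. The `Word` class, the `group_into_lines` function and the `tolerance` value are hypothetical names chosen for this example, not part of the SDK. The idea is to group words whose vertical centers are close together into one receipt line, then order each line from left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical record: one recognized word and its bounding box in pixels.
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_lines(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Rebuild reading order: cluster words into rows, then sort each row by x."""

    def center(w: Word) -> float:
        return (w.top + w.bottom) / 2

    rows: List[List[Word]] = []
    # Walk the words from top to bottom of the image.
    for word in sorted(words, key=center):
        height = word.bottom - word.top
        if rows:
            current = rows[-1]
            row_center = sum(center(w) for w in current) / len(current)
            # Same row if the vertical centers differ by less than
            # a fraction (tolerance) of the word height.
            if abs(center(word) - row_center) <= tolerance * height:
                current.append(word)
                continue
        rows.append([word])

    # Within each row, order the words from left to right.
    return [
        " ".join(w.text for w in sorted(row, key=lambda w: w.left))
        for row in rows
    ]


if __name__ == "__main__":
    # Words arrive column by column, as in the block-grouping problem.
    sample = [
        Word("Milk", 10, 50, 50, 70),
        Word("Total", 10, 100, 60, 120),
        Word("1.99", 300, 51, 340, 71),
        Word("12.50", 300, 102, 350, 122),
    ]
    for line in group_into_lines(sample):
        print(line)
```

Once the lines are rebuilt this way, regular expressions can be applied to each line on the client side. The tolerance would likely need tuning for skewed or crumpled receipts.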

As for the processTextField method, it is hardly suitable for this purpose: you must specify the exact region of the text field on the image, and that region must stay the same from one image to another.