Receipt scanning without paragraph/layout separation

  • Last Post 10 April 2013
  • Topic Is Solved
arithma posted this 09 April 2013

When a receipt is sent with columns that are far apart, the service scans each column on its own and then returns the columns one after the other.

The behaviour we'd like is for each horizontal line of the receipt to be read on its own, and for the lines to be returned in that order.


EROSKI/center
PERALTA
PERALTA 31350   IFK/CIF:    F-20033361
PRncuT q rnnp
09-04-2013 19:29 031 04 8927 IfrP099
SALVADO DE AVENA    2.4?
SALVADO DE AVENA
COPOS AVENA EROSKI
COPOS AVENA EROSKI
i
2',49
1.65
1.65
Ordaintzekoa / A pagar 1 8$
O, £.0
**XX*X****X*6013 I*
S.:01 SC:906305 A.: 942304
BEZ/IVA V
10,02 IVA OE 7.53   0,75
Le atondlo GARA?I(e)k atenditu ?aitu
GRACIAS POR SU VISITA

Andrey Isaev posted this 10 April 2013

The OCR engine works on all kinds of documents, and behavior that seems correct on one complicated layout may not be correct on others. The engine does not know in advance which reading order is correct for a particular document, so it has been tuned to keep a reasonable balance and work acceptably in most cases.

My recommendation would be to use XML output instead of TXT, and to use the text coordinate information when parsing the receipt. This way you will be able to decide for yourself what the correct reading order is.
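
As a rough illustration of that idea, here is a minimal Python sketch that rebuilds horizontal lines from word coordinates. It does not parse the service's XML. It assumes the words and their bounding boxes have already been extracted into a hypothetical `Word` record. The `Word` class, the `group_into_rows` function, the `tolerance` parameter, and the sample coordinates are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical record for one recognized text fragment and its box."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def group_into_rows(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Group fragments whose vertical centers are close, then read left to right.

    A fragment joins the current row when its vertical center lies within
    tolerance * (tallest box in the row) of the row's mean center.
    The tolerance value is an assumption and needs tuning on real receipts.
    """
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.center_y):
        if rows:
            current = rows[-1]
            row_center = sum(w.center_y for w in current) / len(current)
            row_height = max(w.bottom - w.top for w in current)
            if abs(word.center_y - row_center) <= tolerance * row_height:
                current.append(word)
                continue
        rows.append([word])
    # Within each row, order fragments by their left edge.
    return [" ".join(w.text for w in sorted(row, key=lambda w: w.left))
            for row in rows]


# Made-up fragments, listed column by column as the TXT output would return them.
sample = [
    Word("ITEM A", 10, 100, 150, 120),
    Word("ITEM B", 10, 130, 150, 150),
    Word("1.00", 300, 102, 340, 122),
    Word("2.00", 300, 131, 340, 151),
]
rows = group_into_rows(sample)
for row in rows:
    print(row)
```

With the sample data, each item name ends up on the same line as the price printed next to it. Skewed or curved receipts would probably need deskewing first or a looser tolerance.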
