When a receipt has columns that are far apart, the service scans each column on its own and then returns them one after the other.

The behaviour we would like instead is for each horizontal line to be parsed on its own, with the lines returned in top-to-bottom order.

EROSKI/center
PERALTA
PERALTA 31350   IFK/CIF:    F-20033361
PRncuT q rnnp
09-04-2013 19:29 031 04 8927 IfrP099
SALVADO DE AVENA    2.4?
SALVADO DE AVENA
COPOS AVENA EROSKI
COPOS AVENA EROSKI
i
2',49
1.65
1.65
Ordaintzekoa / A pagar 1 8$
O, £.0
**XX*X****X*6013 I*
S.:01 SC:906305 A.: 942304
BEZ/IVA V
10,02 IVA OE 7.53   0,75
Le atondlo GARA?I(e)k atenditu ?aitu
GRACIAS POR SU VISITA

asked 09 Apr '13, 22:25

arithma
132


The OCR engine works on all kinds of documents, and behavior that seems correct on one complicated layout may not be correct on others. The engine does not know in advance which reading order is correct for a particular document, so it has been tuned to strike a reasonable balance that works in most cases.

My recommendation would be to use XML output instead of TXT and to use the text coordinate information when parsing the receipt. This way you can decide for yourself what the correct reading order is.
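As a rough illustration of that idea, here is a minimal Python sketch that rebuilds horizontal rows from line coordinates. It rests on assumptions that should be checked against the actual XML export: that each text line is a `line` element with integer `l`, `t`, `r`, `b` attributes, and that its characters sit in `charParams` children. The helper names `read_lines` and `group_into_rows` and the `overlap` tolerance are made up for this example and are not part of any SDK.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}line' becomes 'line'."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Collect (top, bottom, left, text) for every <line> element.

    Assumption: <line> has integer l/t/r/b attributes and stores its
    characters in <charParams> children. Verify against your own export.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "line":
            continue
        text = "".join(
            child.text or ""
            for child in elem.iter()
            if local_name(child.tag) == "charParams"
        ).strip()
        if text:
            lines.append(
                (int(elem.get("t")), int(elem.get("b")), int(elem.get("l")), text)
            )
    return lines


def group_into_rows(lines, overlap=0.5):
    """Merge lines whose vertical centres are close into one receipt row.

    A line joins the current row when its centre lies within
    overlap * (its own height) of the centre of the row's first line.
    """
    rows = []
    for top, bottom, left, text in sorted(lines, key=lambda ln: (ln[0] + ln[1]) / 2):
        centre = (top + bottom) / 2
        height = bottom - top
        if rows and abs(centre - rows[-1]["centre"]) <= overlap * height:
            rows[-1]["cells"].append((left, text))
        else:
            rows.append({"centre": centre, "cells": [(left, text)]})
    # Inside each row, order the fragments from left to right.
    return ["  ".join(text for _, text in sorted(row["cells"])) for row in rows]
```

Calling `group_into_rows(read_lines(xml_text))` returns one string per reconstructed row. The `overlap` value is a judgment call: skewed or crumpled receipts may need a looser tolerance, or a comparison of baselines instead of box centres.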

answered 10 Apr '13, 10:10

Andrey Isaev ♦♦
2835

edited 10 Apr '13, 10:11


Last updated: 27 Aug '13, 00:28

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal