Image and resulting OCR text alignment do not match

Sully posted this 31 January 2013

I have passed the attached image (Image Link) through the OCR SDK. Although it did a good job of picking up most of the text, it has not kept the format and text alignment of the original document, so I am not able to parse the text as I want. Are there any settings I should be using? The resulting text file I got was:

DIET COKE SPORTS DRINK LUCOZADE SPORT SPORTS DRINK 0.45L T/HUG CADBURYS BOOST CADBURYS BOOST CADBURYS BOOST PETERS ROLLS PETERS ROLLS PATE CHEESE SLICES SNACKS NIK NAKS 6PK SUB-TOTAL TESCO EBBW VALE 0345 6779240 For recipes and a chance to WIN a ClOO $ift card visit' 1.87 0.75 0.75 0.75 2.50 0.59 0.59 0.59 1.00 1.00 0.65 1.50 1.88 1.88 16.30 MULTIBUY S£VINGSn 57 EASTER RANGE 3 F Cl.20 0 KP 6PACK 2 FOR 2.5 __ -1.83 TOTAL SAVINGS TOTAL TO PAY MASTERCARD SALE A0000000041010 x*xxk*6845 03 081366 ............ 1860071 START : 08/12 EXPIRY : 09/15 Cardholder PIN Verified CHANGE DUE 14.47 14.47 AID NUMBER PAN SEQ NO AUTH CODE MERCHANT ICC 0.00 CLUBCARD STATEMENT CLUBCARD NUMBER 634004902722928 POINTS THIS VISIT TOTAL UP TO 10/01/13 TOTAL INCLUDES : TPF BONUS POINTS GREEN CLUBCARD POINTS 14 2884 923 How did we do? Visit and tell us about your shopping trip 11/01/13 6:59 2444 077 9077 1380

Andrey Isaev posted this 31 January 2013

Which output format did you try, TXT? If you really depend on text location, we recommend using XML output instead.

Anastasia Galimova posted this 05 February 2013

Unfortunately, in the current version of ABBYY Cloud OCR SDK the formatting of bills is often not retained. We recommend getting the recognized text together with its coordinates via the XML output format and then processing the result on your side. We are sorry for the inconvenience!
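
As a rough illustration of what "process the result on your side" could look like, here is a minimal Python sketch that rebuilds receipt rows from coordinates. It assumes the XML output contains `line` elements carrying `l`, `t`, `r`, `b` pixel attributes with the recognized text nested inside them; check this against the XML you actually receive. The sample XML, its coordinates, the item/price pairing, the function names, and the `tolerance` value are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; real output will be larger and may be namespaced.
SAMPLE_XML = """
<document><page><block><text><par>
<line l="40" t="100" r="260" b="130"><formatting>DIET COKE</formatting></line>
<line l="40" t="140" r="300" b="170"><formatting>SPORTS DRINK</formatting></line>
<line l="420" t="102" r="500" b="131"><formatting>1.87</formatting></line>
<line l="420" t="143" r="500" b="172"><formatting>0.75</formatting></line>
</par></text></block></page></document>
"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}line'."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Collect the text and bounding box of every <line> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    fragments = []
    for elem in root.iter():
        if local_name(elem.tag) != "line":
            continue
        # Join all nested text and collapse runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        fragments.append({
            "text": text,
            "left": int(elem.get("l")),
            "top": int(elem.get("t")),
            "bottom": int(elem.get("b")),
        })
    return fragments


def group_into_rows(fragments, tolerance=10):
    """Put fragments whose vertical centers are within `tolerance` pixels of the
    first fragment in a row into that row, then order each row left to right."""
    def center(frag):
        return (frag["top"] + frag["bottom"]) / 2

    rows = []
    for frag in sorted(fragments, key=center):
        if rows and abs(center(frag) - rows[-1]["center"]) <= tolerance:
            rows[-1]["items"].append(frag)
        else:
            rows.append({"center": center(frag), "items": [frag]})
    return [
        [f["text"] for f in sorted(row["items"], key=lambda f: f["left"])]
        for row in rows
    ]


if __name__ == "__main__":
    for row in group_into_rows(extract_lines(SAMPLE_XML)):
        print("  ".join(row))
```

The tolerance would need tuning to the image resolution and line height, and a skewed scan would probably have to be deskewed first, since this grouping relies on items and prices sharing roughly the same vertical position.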