Why is OCR SDK processImage returning bad results for a receipt?

  • Last Post 08 July 2015
ppunzalan posted this 23 June 2015

Hi,

I'm trying to process a receipt and am getting very poor results on a particular image. I'm calling processImage with exportFormat set to txt, correctSkew set to false, and imageSource set to scanner. Below is the image I'm processing

alt text

and the results being returned are

alt text

As you can see, a lot of the item descriptions are missing, some of the amounts don't have values for cents, there are extra spaces in the return, among other issues. What can I do to get better results?

Oksana Serdyuk posted this 23 June 2015

Hi,

Please try the textExtraction profile for your scenario. This profile is suitable for extracting all text from the input image.
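As a rough sketch of how the request from the question changes once the suggested profile is added, the snippet below builds both query strings. The parameter names and values (exportFormat, correctSkew, imageSource, profile) are the ones mentioned in this thread; the base URL and the helper name build_process_image_url are assumptions made for illustration, so check them against your own account settings.

```python
from urllib.parse import urlencode

# Assumed service address; not stated in this thread.
BASE_URL = "http://cloud.ocrsdk.com"


def build_process_image_url(profile=None):
    """Build a processImage URL using the settings from the original question.

    Hypothetical helper: it only assembles the query string and sends nothing.
    """
    params = {
        "exportFormat": "txt",
        "correctSkew": "false",
        "imageSource": "scanner",
    }
    if profile is not None:
        # The profile suggested in the answer above.
        params["profile"] = profile
    return f"{BASE_URL}/processImage?{urlencode(params)}"


original_url = build_process_image_url()
suggested_url = build_process_image_url(profile="textExtraction")
print(original_url)
print(suggested_url)
```

The image itself would still have to be sent as the body of an authenticated POST request to that URL, which is not shown here.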

Note that the red oval prevents ABBYY Cloud OCR SDK from accurately recognizing the text above and below the line "Age Confirmed - 12/12/1912". This is expected behavior of the program.

ppunzalan posted this 23 June 2015

Hi Oksana,

Please see my answer below, as I cannot include images when commenting on your answer (limitation of the forum).

Thanks.

ppunzalan posted this 23 June 2015

Hi Oksana,

I tried adding profile=textExtraction and this particular receipt is getting better results. Here is what was returned:

alt text

However, other receipts are getting bad results with profile=textExtraction. For example, when I submit this image

alt text

I was getting these results (without profile=textExtraction)

alt text

but now I'm getting these results (with profile=textExtraction)

alt text

As you can see, I'm losing the Subtotal, line item amounts (and those that are read are still read incorrectly), Total amount, etc. Is there one call to read a receipt that will work on all receipts?

Oksana Serdyuk posted this 24 June 2015

We have tested your images and sent our results and recommendations to you by e-mail.

ppunzalan posted this 30 June 2015

Hi Oksana,

As suggested in your email response (which I've attached below), I have already tried setting profile=textExtraction, with mixed results. You also state "try to find more optimal recognition settings for your kind of images", but that's what I'm asking your advice on. What would those settings be?

You also suggest using better image quality, but I'm trying to process receipts that clients will photograph with their mobile phones and then email to a server for processing. I believe your ABBYY FineReader 12 is a desktop application, which isn't an option since all processing is online. Is there a parameter that can be passed to ABBYY Cloud OCR SDK to make the SDK increase the image quality?

Are there any other suggestions you might have to make ABBYY Cloud OCR SDK work for me?

Thanks.


Hi Pamela,

Thank you for your interest in our product.

We are writing to you regarding your question on the ABBYY Cloud OCR SDK forum. To achieve better recognition results, we advise you to take care of the quality of the source images and to try to find more suitable recognition settings for your kind of images. Below you can find our recommendations, which you can use as a starting point.

First, note that your images have quite a low resolution for recognition. Image resolution has a real impact on the OCR quality that can be achieved. We changed the resolution of your image to more suitable values using ABBYY FineReader 12 (Image Editor -> the Resolution tool). Please review the OCR - Optimal Image Resolution article to learn more about the recommended resolution values for OCR purposes.
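To make the resolution point concrete, here is a small sketch that estimates the effective resolution of a receipt photo from its pixel width and the physical width of the paper. The 300 dpi target is an assumption (a commonly cited value for ordinary text; the article referenced above gives the recommended values), and the 600-pixel, 80 mm receipt is a made-up example.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, millimetres: float) -> float:
    """Dots per inch when `pixels` span a physical length of `millimetres`."""
    inches = millimetres / MM_PER_INCH
    return pixels / inches


def upscale_factor(current_dpi: float, target_dpi: float = 300.0) -> float:
    """Scale factor needed to reach target_dpi (assumed target); never below 1."""
    return max(1.0, target_dpi / current_dpi)


# Hypothetical photo: an 80 mm wide receipt that covers 600 pixels of width.
dpi = effective_dpi(pixels=600, millimetres=80.0)
factor = upscale_factor(dpi)
print(f"effective resolution: {dpi:.1f} dpi, scale by {factor:.2f}x to reach 300 dpi")
```

For this example the effective resolution is 190.5 dpi, so the image would need to be roughly 1.6 times larger in each dimension to reach the assumed target.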

Also, as we have already written on the forum, the textExtraction profile is usually recommended for your usage scenario. It is the better choice for receipt processing because it gives better results in both recognition quality and processing speed. Moreover, it is suitable for extracting all text from the input image, including small text areas of low quality.

We have tested your images and managed to achieve quite good recognition results using our above recommendations. Please find our results in the attachment:

The Images folder contains your original image and our images after FineReader 12 image preprocessing.
The Results folder contains two subfolders, textExtraction and documentConversion, with the OCR results we obtained using the processImage method with the corresponding profiles.

Hope the information is useful.

If you have any technical issues, please visit our Developer Forum to get fast help from ABBYY Cloud OCR SDK developers’ community. Follow us on Twitter to get the latest news.

Kind regards,
Oksana Serdyuk
Technical Support Engineer

Oksana Serdyuk posted this 01 July 2015

We have the ABBYY Mobile Imaging SDK, which you can use for image preprocessing on mobile devices.

ppunzalan posted this 01 July 2015

That will not work for us since we need to streamline the process for our end users. Do you have plans to support imaging in the API in the future? Is there someone I can contact directly at ABBYY to speak to about this issue?

Oksana Serdyuk posted this 02 July 2015

So far there are no plans to support image preprocessing in ABBYY Cloud OCR SDK. Anyway, I've forwarded your contact info to my colleagues in our office located in your region. They will contact you soon to discuss the issue.

rainerp posted this 08 July 2015

Hi Pamela,

We have a method in beta that extracts the data from receipts and returns it in an XML structure.

Cheers,

Rainer

ppunzalan posted this 08 July 2015

Thanks Rainer,

I was recently told about this module and after performing some testing, I've found it does a much better job than the processImage option.

rainerp posted this 06 September 2016

The method for receipt capture is now officially released for the USA. For other countries it is still in beta. Please see more information here: http://ocrsdk.com/documentation/apireference/processReceipt/

and here

https://www.abbyy.com/receipt-capture-ocr/
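For comparison with a processImage call, a request to the receipt method might be assembled as in the sketch below. Only the method name processReceipt comes from the link above; the base URL, the country parameter and its value Usa, and the helper name are assumptions that should be verified against the API reference before use.

```python
from urllib.parse import urlencode

# Assumed service address; not stated in this thread.
BASE_URL = "http://cloud.ocrsdk.com"


def build_process_receipt_url(country="Usa"):
    """Build a processReceipt URL (hypothetical helper, sends nothing).

    The `country` parameter and its value are assumptions, not confirmed here.
    """
    return f"{BASE_URL}/processReceipt?{urlencode({'country': country})}"


receipt_url = build_process_receipt_url()
print(receipt_url)
```

As noted earlier in the thread, the result of this method is an XML structure rather than plain text, so the client would parse fields out of the XML instead of reading a txt export.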
