Mixed recognition results with Arabic

vgiannadakis posted this 04 June 2015

We are evaluating the ABBYY Cloud OCR service and getting mixed results with Arabic text. We are trying to recognize numerals in scans of identification documents, and the results vary with several factors:

  • Language used: specifying "Arabic" alone and specifying "Arabic,English" give different results, and the latter is generally better.
  • Sending the whole image or only the part we're interested in: sending the whole image generally gives better recognition.
  • The images themselves: images that look very similar can produce radically different results.

Our main use case is to cut out the part of the image that we want to recognize and send it to ABBYY, using the processImage method.

Our question is: is there a way to improve the accuracy of the recognition?

Here's a link to a document with some representative images and their recognition results: Arabic OCR Images

Oksana Serdyuk posted this 04 June 2015

We recommend testing our field-level recognition mode for your scenario:

  • To extract the value of one text field on an image, you can use the processTextField method.

  • If you want to recognize many small text fields on a page, we recommend using the processFields method. It lets you specify the coordinates of each field in an XML file.
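For the single-field case, a request to processTextField might be assembled as in the sketch below. Only the method name comes from this thread. The host name, the `region` and `language` parameter names, and the left,top,right,bottom coordinate order are assumptions that should be checked against the Cloud OCR SDK documentation, and `text_field_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed service host; verify against your account's documentation.
BASE_URL = "https://cloud.ocrsdk.com"


def text_field_url(left, top, right, bottom, languages=("Arabic", "English")):
    """Build a processTextField URL for one rectangular field (hypothetical helper).

    Coordinates are assumed to be pixels measured from the top-left
    corner of the full image, in left,top,right,bottom order.
    """
    if not (0 <= left < right and 0 <= top < bottom):
        raise ValueError("region must satisfy 0 <= left < right and 0 <= top < bottom")
    params = {
        "region": f"{left},{top},{right},{bottom}",  # assumed parameter name
        "language": ",".join(languages),             # assumed parameter name
    }
    return f"{BASE_URL}/processTextField?{urlencode(params)}"


# Example: a field located at (120, 340)-(560, 400) on the full scan.
url = text_field_url(120, 340, 560, 400)
```

With this approach the full scan is uploaded as the request body and the field is selected by coordinates, so the image does not need to be cropped beforehand.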

In your case it is useful to restrict the set of allowed characters with the letterSet parameter, for example "٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹".
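The example letterSet value combines two Unicode digit blocks: Arabic-Indic digits (U+0660 to U+0669) and Extended Arabic-Indic digits (U+06F0 to U+06F9). A minimal sketch of building that string from code points, rather than pasting it, and of checking a recognition result against it follows. `unexpected_chars` is a hypothetical helper, not part of any ABBYY SDK.

```python
import unicodedata

# Arabic-Indic digits, U+0660..U+0669
ARABIC_INDIC = "".join(chr(cp) for cp in range(0x0660, 0x066A))
# Extended Arabic-Indic digits, U+06F0..U+06F9
EXTENDED_ARABIC_INDIC = "".join(chr(cp) for cp in range(0x06F0, 0x06FA))

# Candidate value for the letterSet request parameter.
LETTER_SET = ARABIC_INDIC + EXTENDED_ARABIC_INDIC


def unexpected_chars(recognized_text):
    """Return the characters in a result that fall outside LETTER_SET."""
    return [ch for ch in recognized_text if ch not in LETTER_SET]


# Each block should map to the decimal values 0 through 9, in order.
digit_values = [unicodedata.digit(ch) for ch in LETTER_SET]
```

Running `unexpected_chars` over returned text is a quick way to compare recognition runs made with and without the restriction.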

Please check your e-mail; we have sent you our recognition results.

vgiannadakis posted this 04 June 2015

Thank you for your response. We will test and evaluate this and let you know.