We are evaluating the ABBYY Cloud OCR service and getting mixed results with Arabic text. We are trying to recognize numerals in scans of identification documents, and the results vary depending on several factors:

  • Language: results differ between specifying "Arabic" alone and "Arabic,English", with the latter generally better.
  • Whole image vs. cropped region: sending the whole image generally gives better recognition than sending only the part we're interested in.
  • The images themselves: some images that look very similar produce radically different results.

Our main use case is to cut out the part of the image that we want to recognize and send it to ABBYY, using the processImage method.
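
For concreteness, here is a minimal Python sketch of the kind of processImage request described above, using only the standard library. It builds the request without sending it. The base URL, the exportFormat value, and HTTP Basic authentication with the application ID and password reflect our understanding of the service and are assumptions; the helper names build_request and process_image_request are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Assumed service endpoint; check the URL given for your application.
BASE_URL = "https://cloud.ocrsdk.com"


def build_request(method, params, image_bytes, app_id, password):
    """Build (but do not send) a POST request to a Cloud OCR SDK method.

    The image goes in the request body; options go in the query string.
    """
    url = f"{BASE_URL}/{method}?{urllib.parse.urlencode(params)}"
    # Assumed: HTTP Basic authentication with application ID and password.
    token = base64.b64encode(f"{app_id}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=image_bytes, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


def process_image_request(image_bytes, app_id, password):
    # "Arabic,English" generally gave better results than "Arabic" alone.
    params = {"language": "Arabic,English", "exportFormat": "txt"}
    return build_request("processImage", params, image_bytes, app_id, password)
```

Sending it would then be a matter of urllib.request.urlopen(req) and parsing the returned task status.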

Our question is: is there a way to improve the accuracy of the recognition?

Here's a link to a document with some representative images and their recognition results: Arabic OCR Images

asked 04 Jun '15, 12:18

vgiannadakis

edited 04 Jun '15, 12:25


We recommend that you test our field-level recognition mode for your scenario:

  • To extract the value of one text field on an image, you can use the processTextField method.

  • If you want to recognize many small text fields on a page, we recommend using the processFields method. It lets you specify the coordinates of each field in an XML file.
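
As an illustration, a minimal sketch of generating such an XML settings file with Python's xml.etree.ElementTree. The namespace URI, the element names (document, page, text, language, letters) and the applyTo attribute follow our recollection of the ABBYY task-description schema and are assumptions to check against the processFields reference; fields_settings is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the task-description schema; verify before use.
NS = "http://ocrsdk.com/schema/taskDescription-1.0.xsd"


def fields_settings(fields, letters, language="Arabic"):
    """Build a processFields settings document (hypothetical helper).

    fields: list of (field_id, (left, top, right, bottom)) in pixels
    letters: characters allowed in every text field
    """
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    doc = ET.Element(f"{{{NS}}}document")
    # Assumed: applyTo="0" applies the field layout to the first page.
    page = ET.SubElement(doc, f"{{{NS}}}page", applyTo="0")
    for field_id, (left, top, right, bottom) in fields:
        text = ET.SubElement(
            page, f"{{{NS}}}text", id=field_id,
            left=str(left), top=str(top), right=str(right), bottom=str(bottom),
        )
        ET.SubElement(text, f"{{{NS}}}language").text = language
        # Assumed element name for the per-field character restriction.
        ET.SubElement(text, f"{{{NS}}}letters").text = letters
    return ET.tostring(doc, encoding="unicode")
```

One settings file can then describe every numeric field on the ID card layout, instead of cropping and uploading each field separately.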

In your case, it is useful to restrict the allowed characters with the letterSet parameter, for example "٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹" (the Arabic-Indic and Extended Arabic-Indic digits).
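
A minimal sketch of a processTextField query restricted to those digits, building only the query string. We assume region takes the field's box as "left,top,right,bottom" in pixels on the uploaded image, and the default language "Arabic" is our choice; text_field_query is a hypothetical helper name.

```python
import urllib.parse

# Arabic-Indic digits (U+0660..U+0669) followed by Extended Arabic-Indic
# digits (U+06F0..U+06F9): the same 20 characters as the letterSet above.
LETTER_SET = "".join(chr(c) for c in range(0x0660, 0x066A)) + "".join(
    chr(c) for c in range(0x06F0, 0x06FA)
)


def text_field_query(left, top, right, bottom, language="Arabic"):
    """Build the query string for a processTextField call (hypothetical helper).

    Assumed: region is the field's bounding box in pixels on the full image,
    so the whole scan can be uploaded instead of a cropped snippet.
    """
    params = {
        "region": f"{left},{top},{right},{bottom}",
        "language": language,
        "letterSet": LETTER_SET,  # only these characters may be recognized
    }
    return "processTextField?" + urllib.parse.urlencode(params)
```

Because the field is addressed by region rather than by cropping, this may also help with the observation that whole images recognize better than cut-out parts.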

Please check your e-mail; we have sent you our recognition results.

answered 04 Jun '15, 18:33

Oksana Serdyuk ♦♦

Thank you for your response; we will test and evaluate it and let you know.

(04 Jun '15, 19:31) vgiannadakis
© 2016 ABBYY. All rights reserved. www.ABBYY.com