We are evaluating the ABBYY Cloud OCR service and getting mixed results with Arabic text. We are trying to recognize numerals in scans of identification documents, and the results vary depending on several factors:
- Language setting: specifying "Arabic" alone versus "Arabic,English" produces different results, generally better with the latter.
- Image extent: sending the whole image generally yields better recognition than sending only the region we're interested in.
- The images themselves: images that look very similar can produce radically different results.
Our main use case is to crop out the part of the image that we want to recognize and send it to ABBYY via the processImage method.
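For context, here's a minimal sketch of how we build the processImage request. The endpoint and the `language`/`exportFormat` parameter names follow the Cloud OCR SDK REST API; the helper function, its defaults, and the credentials handling are just illustrative, not our production code:

```python
from urllib.parse import urlencode

def build_process_image_url(language="Arabic,English",
                            export_format="txt",
                            base="https://cloud.ocrsdk.com/processImage"):
    """Build the processImage URL for the given recognition settings.

    The helper name and defaults are illustrative; the parameter names
    come from the Cloud OCR SDK REST API.
    """
    params = {"language": language, "exportFormat": export_format}
    return base + "?" + urlencode(params)

url = build_process_image_url()
# The cropped image bytes would then be POSTed to this URL with HTTP Basic
# auth (application ID / password), and the resulting task polled until done.
print(url)
```

Switching `language` between "Arabic" and "Arabic,English" here is the only change between the two test runs described above.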
Our question is: is there a way to improve the recognition accuracy in this scenario?
Here's a link to a document with some representative images and their recognition results: Arabic OCR Images