Difference in extracted result with documents in Arabic language

  05 March 2018
sharathshan143@gmail.com posted this 13 February 2018

We are trying to extract the contents of the attached input image "original format.pdf". The OCR engine does not extract all of the words in it, for example "بيانات الكفيل".


But if we change the orientation of the same image so that it is tilted slightly to the left, the engine is able to extract that particular word, "بيانات الكفيل". See the attachment "adjusted format.pdf".


We have to make this kind of orientation adjustment to extract the required words, but we have not been able to find a single orientation that extracts all of the required words from the input document.


Can you suggest a solution for this?

Kseniya Leontyeva posted this 05 March 2018


As we discussed by email, Cloud OCR SDK has limited functionality, so if an image does not meet the Source Image Recommendations (for example, because of low contrast), the result might not be satisfactory.

In this case, you can do the initial preprocessing yourself. It may help to increase the brightness, exposure, and contrast of the source image so that the text is clearly visible to the human eye while other elements, such as prints or a background image, look overexposed and almost disappear.

This preprocessing can be done in almost any standard photo editor.
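
If you would rather script this step than use a photo editor, the idea can be sketched roughly as below. This is only an illustration of a linear contrast and brightness adjustment on 8-bit grayscale pixel values, not part of Cloud OCR SDK. The function names `adjust_pixel` and `preprocess`, the default values `contrast=2.0` and `brightness=40`, and the sample pixel values are all assumptions made for the example; suitable values would have to be found by experimenting with your own scans.

```python
def adjust_pixel(value, contrast=2.0, brightness=40):
    """Stretch a gray value around mid-gray (128), add a brightness
    offset, and clamp the result to the 0..255 range.

    The default contrast and brightness are illustrative assumptions.
    """
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, round(adjusted)))


def preprocess(gray_rows, contrast=2.0, brightness=40):
    """Apply adjust_pixel to every pixel of a grayscale image given as
    a list of rows, where each row is a list of 0..255 integers."""
    return [
        [adjust_pixel(v, contrast, brightness) for v in row]
        for row in gray_rows
    ]


# Hypothetical sample row: dark text (30), a faint background mark (170),
# and light paper (220).
sample = [[30, 170, 220]]
cleaned = preprocess(sample)
print(cleaned)
```

With these assumed settings, the dark text pixel is pushed to black while the faint mark and the paper both end up close to white, which is the "overexposed" effect described above. Image libraries generally offer equivalent brightness and contrast controls, so in practice you would apply the same adjustment to the real image file before sending it to the OCR service.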