Deskewed pdfA output does not match corrected XML

  • Last Post 04 March 2019
Vishnu posted this 04 December 2018


I'm trying to get XML (xmlForCorrectedImage format) and PDF/A (pdfA, the corrected output format) for a source image. Even though the page width and height match between the pages of the pdfA output and the corrected XML, the coordinates do not match exactly.

Am I doing something wrong?

These are the parameters I'm using:

          language, "textExtraction","auto","txt,xmlForCorrectedImage,pdfA","true");




Helen Osetrova posted this 26 December 2018



A possible reason for the differences in text coordinates is automatic skew correction. To disable it, kindly set the correctSkew parameter of the processImage method to "false". Please note that if the image is actually skewed, the recognition quality might be unsatisfactory.


In addition, setting the imageSource parameter to the "scanner" value might be helpful. In this mode, Cloud OCR SDK does not correct possible image distortions and the coordinates remain the same.
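As a rough illustration of the two settings above, the sketch below only builds a processImage request URL with correctSkew and imageSource set as described. The endpoint URL, the "English" language value, and the helper name build_process_image_url are assumptions for the example, not taken from this thread. Authentication and the image upload itself are left out.

```python
from urllib.parse import urlencode

# Assumed endpoint; the actual host may depend on your application's
# processing location.
BASE_URL = "https://cloud.ocrsdk.com/processImage"


def build_process_image_url(language="English", keep_geometry=True):
    """Build a processImage URL (hypothetical helper).

    When keep_geometry is True, skew correction is switched off and
    imageSource is set to "scanner", as suggested in the reply above.
    """
    params = {
        "language": language,
        "profile": "textExtraction",
        "exportFormat": "txt,xmlForCorrectedImage,pdfA",
    }
    if keep_geometry:
        params["correctSkew"] = "false"
        params["imageSource"] = "scanner"
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_process_image_url())
```

With keep_geometry=False the two extra parameters are simply omitted, so the service defaults apply.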


For more specific recommendations, kindly provide us with the source image.


Vishnu posted this 21 January 2019

Hi Helen,

Is there any way to get a deskewed image as the output PDF/image together with its exact coordinates in correctedXml?

What I mean is: correctSkew should stay at its default (true) and imageSource at auto. If I then process a skewed sample, I should get the deskewed PDF/image and the corresponding correctedXml as output, and the coordinates in the XML should match the deskewed PDF/image exactly.




Vishnu posted this 12 February 2019

Can someone please follow up on this thread?

Sasha Zendrikova posted this 14 February 2019


Could you kindly specify how exactly you compare coordinates between correctedXml and pdfA files?
Also, if you could provide us with the source image, it would be easier to find out the problem.

Vishnu posted this 14 February 2019

Hi Sasha,

I compared them by importing the pdfA file into an image editing tool (one that shows coordinates as the pointer moves), using exactly the width and height given for the page in correctedXml. I will send a sample source image privately.
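For a comparison that does not depend on how an image editor rasterizes the PDF, one option is to convert the XML pixel coordinates into PDF units directly. The sketch below assumes the XML rectangles are given in pixels with the origin at the top-left corner, that the page resolution in DPI is known, and that the PDF page uses the default coordinate system (points of 1/72 inch, origin at the bottom-left). The function name and arguments are hypothetical.

```python
def px_box_to_pdf_points(left, top, right, bottom, page_height_px, dpi):
    """Convert a pixel rectangle (top-left origin) into PDF points
    (bottom-left origin). Returns (x0, y0, x1, y1) in points.
    """
    def to_pt(value_px):
        # 1 inch = 72 points = dpi pixels
        return value_px * 72.0 / dpi

    x0 = to_pt(left)
    x1 = to_pt(right)
    # Flip the vertical axis: PDF y grows upwards from the page bottom.
    y0 = to_pt(page_height_px - bottom)
    y1 = to_pt(page_height_px - top)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A 300 dpi page that is 3300 px high (11 inches)
    print(px_box_to_pdf_points(300, 300, 600, 450, 3300, 300))
```

If the converted rectangle and the text position in the PDF still differ, the mismatch is more likely to come from the processing itself than from the comparison method.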



Sasha Zendrikova posted this 27 February 2019


Sorry for the long silence.

Your issue is not trivial and I have asked for some advice from our development team. I will tell you when they find out something. 

Until then, you can try using images with a higher resolution (tests showed that the problem occurs only on low-resolution images).

Also, you can use our FineReader Engine solution. For some reason, there is no problem with coordinates on your image when processing with FineReader Engine.  

Unfortunately, that is all I can suggest for now.
Hope this will be useful.

Vishnu posted this 04 March 2019

"Your issue is not trivial and I have asked for some advice from our development team. I will tell you when they find out something."

Ok thanks.