Deskewed PDF/A output does not match the corrected XML

Vishnu posted this 04 December 2018

Hi,

I'm trying to get the XML (xmlForCorrectedImage format) and the PDF/A (corrected output format) for a source image. Even though the page width and height match between the PDF/A pages and the corrected XML, the coordinates do not match exactly.

Am I doing something wrong?

These are the parameters I'm using:

  "language=%s&profile=%s&imageSource=%s&exportFormat=%s&xml:writeFormatting=%s",
          language, "textExtraction","auto","txt,xmlForCorrectedImage,pdfA","true");

 

Thanks,

Vishnu

Helen Osetrova posted this 26 December 2018

Hi,

 

A possible reason for the differences in text coordinates is automatic skew correction. To disable it, set the correctSkew parameter of the processImage method to "false". Please note that if the image is actually skewed, the recognition quality might be unsatisfactory.

 

In addition, setting the imageSource parameter to "scanner" might help. In this mode, Cloud OCR SDK does not correct possible image distortions, so the coordinates remain the same.

 

For more specific recommendations, kindly provide us with the source image.
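
The two suggestions above can be sketched as a small query builder. This is only an illustration in Python, not the SDK's own client code: the parameter names come from this thread, while the helper name build_process_image_query and the language value "English" are placeholders chosen for the example.

```python
from urllib.parse import urlencode


def build_process_image_query(language, **overrides):
    """Build a processImage query string (hypothetical helper).

    The defaults mirror the parameters quoted in the original post;
    keyword arguments override or extend them.
    """
    params = {
        "language": language,
        "profile": "textExtraction",
        "imageSource": "auto",
        "exportFormat": "txt,xmlForCorrectedImage,pdfA",
        "xml:writeFormatting": "true",
    }
    params.update(overrides)
    # Keep ':' and ',' unescaped so the result stays readable.
    return urlencode(params, safe=":,")


# Query as used in the original post ("English" is a placeholder).
original_query = build_process_image_query("English")

# Variant with skew correction disabled and the scanner image source.
no_correction_query = build_process_image_query(
    "English", correctSkew="false", imageSource="scanner"
)

print(original_query)
print(no_correction_query)
```

With the second query the output image is not deskewed, so this only helps when coordinates relative to the original image are acceptable.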

 

Vishnu posted this 21 January 2019

Hi Helen,

Is there any way I can get a deskewed image as output (PDF or image) together with its exact coordinates in the corrected XML?

What I mean is: the correctSkew parameter should keep its default (true) and imageSource should be auto. If I then process a skewed sample, I should get its deskewed PDF/image and the corresponding corrected XML as output, and the coordinates should match the deskewed PDF/image exactly.

 

Thanks,

Vishnu
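
To check this kind of match, it helps to first read the page size and the character rectangles out of the XML. The sketch below is a minimal example and rests on assumptions: that the XML has page elements with width, height and resolution attributes and charParams elements with l, t, r, b attributes. The sample document is made up for illustration and is not real service output.

```python
import xml.etree.ElementTree as ET

# Made-up sample; real output carries a namespace and many more elements.
SAMPLE_XML = """<document>
  <page width="2480" height="3508" resolution="300">
    <charParams l="300" t="600" r="330" b="660">A</charParams>
    <charParams l="335" t="600" r="365" b="660">B</charParams>
  </page>
</document>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_pages(xml_text):
    """Return page dimensions and character boxes for each page element."""
    root = ET.fromstring(xml_text)
    pages = []
    for elem in root.iter():
        if local_name(elem.tag) != "page":
            continue
        boxes = [
            (c.text, int(c.get("l")), int(c.get("t")),
             int(c.get("r")), int(c.get("b")))
            for c in elem.iter()
            if local_name(c.tag) == "charParams"
        ]
        pages.append({
            "width": int(elem.get("width")),
            "height": int(elem.get("height")),
            "resolution": int(elem.get("resolution")),
            "boxes": boxes,
        })
    return pages


pages = read_pages(SAMPLE_XML)
for page in pages:
    print(page["width"], page["height"], page["resolution"], len(page["boxes"]))
```

The namespace is stripped rather than hard-coded, so the same function should work whether or not the real file declares one.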

Vishnu posted this 2 weeks ago

Can someone please follow up on this thread?

Sasha Zendrikova posted this 2 weeks ago

 Hi,

Could you kindly specify how exactly you compare the coordinates between the correctedXml and pdfA files?
Also, if you could provide us with the source image, it would be easier to identify the problem.

Vishnu posted this 2 weeks ago

Hi Sasha,

I compared them by importing the PDF/A file into an image editing tool (which shows coordinates as the pointer moves), using exactly the width and height given for the page in the corrected XML. I will send a sample source image privately.

Thanks,

Vishnu
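
One thing worth ruling out in a comparison like the one described above is a unit or origin mismatch. PDF coordinates are in points (1/72 inch) with the origin at the bottom-left, while the sketch below assumes the XML rectangles are in pixels at the page resolution with the origin at the top-left. The function name and sample numbers are illustrative only.

```python
def pixel_box_to_pdf_points(left, top, right, bottom,
                            page_height_px, resolution_dpi):
    """Convert a top-left-origin pixel box to bottom-left-origin PDF points.

    Assumes the box is given in pixels at resolution_dpi.
    Returns (x0, y0, x1, y1) with y0 < y1.
    """
    x0 = left * 72.0 / resolution_dpi
    x1 = right * 72.0 / resolution_dpi
    # Flip the vertical axis: PDF y grows upward from the bottom edge.
    y0 = (page_height_px - bottom) * 72.0 / resolution_dpi
    y1 = (page_height_px - top) * 72.0 / resolution_dpi
    return (x0, y0, x1, y1)


# Illustrative values: an A4 page at 300 dpi is about 2480 x 3508 pixels.
box_pt = pixel_box_to_pdf_points(300, 600, 900, 660,
                                 page_height_px=3508, resolution_dpi=300)
print(box_pt)
```

For these sample values the box comes out at roughly (72, 683.5, 216, 697.9) points. If the PDF is rasterized in an image editor at a resolution other than the one recorded in the XML, the pixel coordinates need to be scaled by the ratio of the two resolutions before comparing.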
