Parse images from PDF

rolenweb posted this 02 May 2017

Hello. I convert PDF to XML using the processImage method, and that works fine. However, the PDF contains images. How can I get these images?

Oksana Serdyuk posted this 11 May 2017

In the XML export format, the OCR result is presented as a hierarchy: document > page > block > region, and so on. The block tag has a blockType attribute that denotes the type of the block: Text, Table, Picture, Barcode, Separator, or SeparatorsBox.

For blocks of the Picture type, only the coordinates of the regions are included; the pictures themselves are not saved. The structure of the XML document is defined by the following XML schema.
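
Since only coordinates are exported, one possible workaround is to read the Picture blocks from the XML and crop those rectangles from a page image that you render yourself. The sketch below covers only the first step. It assumes that each block element carries its bounding box in the attributes l, t, r and b; please check this against the schema before relying on it. The function name picture_blocks and the sample XML are made up for illustration and are not part of the SDK.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}block' down to 'block'."""
    return tag.rsplit("}", 1)[-1]


def picture_blocks(xml_text):
    """Return (page_index, (l, t, r, b)) for every block with blockType="Picture".

    Assumption: the bounding box is stored in the block attributes
    l, t, r, b. A missing attribute makes int() raise a TypeError.
    """
    root = ET.fromstring(xml_text)
    results = []
    page_index = 0
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        for block in page.iter():
            if local_name(block.tag) == "block" and block.get("blockType") == "Picture":
                box = tuple(int(block.get(key)) for key in ("l", "t", "r", "b"))
                results.append((page_index, box))
        page_index += 1
    return results


# Hypothetical, heavily simplified export used only to exercise the function.
SAMPLE_XML = """
<document>
  <page width="1000" height="1400">
    <block blockType="Text" l="50" t="60" r="900" b="150"/>
    <block blockType="Picture" l="100" t="200" r="400" b="500"/>
  </page>
  <page width="1000" height="1400">
    <block blockType="Picture" l="10" t="20" r="30" b="40"/>
  </page>
</document>
"""

if __name__ == "__main__":
    for page_index, box in picture_blocks(SAMPLE_XML):
        print(page_index, box)
```

The returned rectangles could then be passed to an image library to cut the pictures out of a rendered page. Rendering the PDF page and matching its resolution to the coordinates in the XML are outside this sketch.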