Get output XML with an identifier attribute (to uniquely identify each block)

  • Last Post 10 February 2016
pprashant9490 posted this 07 February 2016

Hi,

I am using the ABBYY CLI. I would like to know how to get exported XML (PDF to XML) with an identifier attribute (<text id={value}>). I have seen an example on your site at the link below:

https://abbyy.technology/en:features:ocr:xml

There, the DemoImage1_ABBYY_Basic.xml output sample has the format I am looking for. However, I cannot work out which output format and which parameters I should use to get the same format as in the example. I have tried different parameters with the XML and ALTO output formats, but none of them seems to work.

Oksana Serdyuk posted this 09 February 2016

If you mean ABBYY FineReader Engine 11 and its CommandLineInterface sample, you can invoke the CLI utility in one of the following ways:

CommandLineInterface [options] -if <image file> -f format [options] -of <export file>
CommandLineInterface [options] -if <image file> -f format1 [options] -of <export file 1> -f format2 [options] -of <export file 2>
CommandLineInterface [options] -if <image file 1> [options] -if <image file 2> -f format [options] -of <export file>
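If you call the utility from a script, the three forms above can be assembled from the same pieces. The following is only a minimal sketch in Python: the helper name build_cli_args and its parameters are hypothetical (they are not part of the sample), and it uses only the -if, -f and -of switches shown above, plus whatever format options you pass in yourself.

```python
from typing import List, Sequence, Tuple

# One export = (format name, format options, export file path).
Export = Tuple[str, Sequence[str], str]


def build_cli_args(
    images: Sequence[str],
    exports: Sequence[Export],
    exe: str = "CommandLineInterface",  # assumed to be on PATH; adjust as needed
) -> List[str]:
    """Build an argument list: all -if inputs first, then one -f ... -of group per export."""
    args: List[str] = [exe]
    for image in images:
        args += ["-if", image]
    for fmt, options, out_file in exports:
        args += ["-f", fmt, *options, "-of", out_file]
    return args


if __name__ == "__main__":
    # Prints the assembled command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_cli_args(["Demo.pdf"], [("XML", ["-xca"], "Demo.xml")])))
```

Keeping the arguments as a list (rather than one string) avoids quoting problems when file paths contain spaces.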

For example, to recognize Demo.pdf and export it to an XML file of the XCA_Basic output type, use the following command:

CommandLineInterface -if Demo.pdf -f XML -xca -of Demo.xml
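After the export, you may want to check whether the resulting file actually contains identifier attributes. Below is a rough sketch in Python for that check. It assumes only that identifiers appear as an attribute named id, as in the <text id={value}> form from the question; the function name count_id_attributes and the sample XML are made up for illustration and do not reflect the real output schema.

```python
import xml.etree.ElementTree as ET


def count_id_attributes(xml_text: str) -> int:
    """Count elements in the document that carry an unqualified 'id' attribute."""
    root = ET.fromstring(xml_text)
    return sum(1 for element in root.iter() if "id" in element.attrib)


if __name__ == "__main__":
    # Hypothetical sample, NOT the real export schema.
    sample = (
        "<document><page>"
        '<text id="1">first block</text>'
        '<text id="2">second block</text>'
        "<text>block without id</text>"
        "</page></document>"
    )
    print(count_id_attributes(sample))
```

For a real export, read Demo.xml from disk and pass its contents to the function; a result of zero would suggest that the chosen output type does not write identifiers.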

You can find more information about using the sample in the Code Samples Library (by default it is installed at “%ProgramData%\ABBYY\SDK\11\FineReader Engine\Samples\SamplesBrowsingTools\pages\sample_CommandLineInterface.html”).

pprashant9490 posted this 10 February 2016

OK, thanks, Oksana. I will try it out and let you know.
