How to pass an image PDF to the Java sample code provided by ABBYY to extract text from it

  • Last Post 09 June 2016
Mudit Yadav posted this 02 June 2016

I have downloaded the sample code for ABBYY OCR in Java, but I don't understand how to pass an image PDF to it to extract the text, or how to choose a mode. Can you guide me on how to use it?

Ilya Sukhorukov posted this 09 June 2016

Our Java sample is a console application that receives its arguments through the command line. For example, to process a PDF and convert it to txt, use the following command (assuming you’re in the folder that contains both the TestApp.class file and testDocument.pdf; otherwise, specify the full path to testDocument.pdf):

java TestApp recognize testDocument.pdf result.txt

It can process images and documents in several modes, each designed for a particular type of information. The mode is passed to the application as the first argument.
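To make the argument order concrete, here is a minimal sketch of how a command line of this shape could be split into mode, input files, output file and options. This is not the actual TestApp source; the class ArgsSketch and its parse method are hypothetical names used only for illustration, and the sketch assumes that the last file argument is the output and that anything starting with "--" is an option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the code of the ABBYY sample.
public class ArgsSketch {

    /** Parsed form of a command line such as: recognize page1.jpg page2.jpg result.pdf --lang=French */
    static final class Parsed {
        final String mode;
        final List<String> inputs;
        final String output;
        final List<String> options;

        Parsed(String mode, List<String> inputs, String output, List<String> options) {
            this.mode = mode;
            this.inputs = inputs;
            this.output = output;
            this.options = options;
        }
    }

    static Parsed parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("usage: <mode> <input>... <output> [--option=value]");
        }
        String mode = args[0]; // the mode is always the first argument
        List<String> files = new ArrayList<>();
        List<String> options = new ArrayList<>();
        for (String arg : Arrays.asList(args).subList(1, args.length)) {
            if (arg.startsWith("--")) {
                options.add(arg); // assumption: options look like --lang=French,Spanish
            } else {
                files.add(arg);
            }
        }
        if (files.size() < 2) {
            throw new IllegalArgumentException("need at least one input file and one output file");
        }
        // Assumption: the last file argument is the output, the rest are inputs.
        String output = files.remove(files.size() - 1);
        return new Parsed(mode, files, output, options);
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"recognize", "testDocument.pdf", "result.txt"});
        System.out.println(p.mode + " " + p.inputs + " -> " + p.output);
        // prints: recognize [testDocument.pdf] -> result.txt
    }
}
```

Read this way, the command above means: mode "recognize", one input file (testDocument.pdf) and one output file (result.txt).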

After you have specified your application ID and password in src\ and built the project, you can run it from the command line. Change the current directory to the folder where the TestApp.class file is, so that Java can find the main class. Now you can run the Java sample using one of the supported modes:

1) To process single- and multipage documents and convert them to txt, xml, pdf and other formats, use:
java TestApp recognize testImage.jpg result.xml
java TestApp recognize page1.jpg page2.jpg page3.jpg result.pdf --lang=French,Spanish

2) To process business cards to vCard, xml and csv, use: java TestApp busCard image.jpg result.xml

3) To process printed and handprinted text snippets, use: java TestApp textField image.jpg result.xml

4) To recognize barcodes, use: java TestApp barcode image.jpg result.xml

5) To process many different snippets on a document, use: java TestApp processFields image1.jpg image2.jpg image3.tif settings.xml result.xml

6) To process MRZ (machine-readable zone) data, use: java TestApp MRZ image.jpg result.xml
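If you would rather start the sample from your own Java program than type the commands by hand, one possible approach is to launch it as a separate process with the standard ProcessBuilder class. The sketch below is only an illustration under a few assumptions: the class RunSampleSketch and its buildCommand method are hypothetical names, the mode names are copied from the list above, and the folder containing TestApp.class is passed as the single argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper; it only assembles and launches the commands shown above.
public class RunSampleSketch {

    // Mode names exactly as they appear in the list above.
    static final List<String> MODES = Arrays.asList(
            "recognize", "busCard", "textField", "barcode", "processFields", "MRZ");

    /** Builds the argument list for: java TestApp <mode> <inputs...> <output> */
    static List<String> buildCommand(String mode, List<String> inputs, String output) {
        if (!MODES.contains(mode)) {
            throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        List<String> command = new ArrayList<>(Arrays.asList("java", "TestApp", mode));
        command.addAll(inputs);
        command.add(output);
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> command = buildCommand("recognize",
                Arrays.asList("testDocument.pdf"), "result.txt");
        System.out.println(String.join(" ", command));

        // Launch the sample only when the folder containing TestApp.class is given.
        if (args.length == 1) {
            Process process = new ProcessBuilder(command)
                    .directory(new File(args[0])) // working directory, so Java can find TestApp
                    .inheritIO()                  // show the sample's console output
                    .start();
            System.exit(process.waitFor());
        }
    }
}
```

Because the working directory is set to the TestApp folder, relative file names such as testDocument.pdf are resolved against that folder, just as they are when you run the command from there yourself.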