[Cloud] How to retain the document structure using the textExtraction profile

  • Last Post 23 March 2015
jsack posted this 28 January 2015

Is it possible to achieve the text quality and coverage of the textExtraction profile (i.e., including smaller text and low-quality areas of the image) while retaining the document structure information of the documentConversion profile? If this is not possible through the cloud API, would it be possible by using FineReader (or a similar product) directly?

Oksana Serdyuk posted this 23 March 2015

Sorry for the delay in response.

The ABBYY Cloud OCR SDK team has recently improved the TXT export technology used in our service. The text export format now simulates the original layout of the source document by inserting spaces and empty lines. The new TXT export is used by default when your application sets the exportFormat=txt option.

The old text export format is still available. To use it, set the exportFormat option to txtUnstructured. With this value, OCR results are saved to the resulting text file in the order in which they were recognized, i.e. block by block.
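As a rough illustration of the two options, here is a minimal Python sketch that only builds the request URL and sends nothing. The base URL, the processImage method name, and the helper build_process_image_url are assumptions made for this example; only the profile names and the exportFormat values txt and txtUnstructured come from this thread, so please check the current documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration only; verify it against the service documentation.
BASE_URL = "https://cloud.ocrsdk.com/processImage"


def build_process_image_url(profile="textExtraction", structured=True):
    """Hypothetical helper: build a request URL for the chosen TXT export variant.

    structured=True  -> exportFormat=txt (layout simulated with spaces and empty lines)
    structured=False -> exportFormat=txtUnstructured (text saved block by block)
    """
    export_format = "txt" if structured else "txtUnstructured"
    query = urlencode({"profile": profile, "exportFormat": export_format})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # New layout-preserving export combined with the textExtraction profile.
    print(build_process_image_url())
    # Old block-by-block export.
    print(build_process_image_url(structured=False))
```

The resulting URL would still need authentication and the image data in the request body, which are left out here.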

The ABBYY Cloud OCR SDK documentation will be updated accordingly.