Hello,

I just noticed something strange.

If I do this:

IDocumentProcessingParams docProcessingParams = engine.CreateDocumentProcessingParams();
docProcessingParams.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);

engine.LoadPredefinedProfile("TextExtraction_Accuracy");

document.Process(docProcessingParams);

it gives a totally different result than if I do this:

engine.LoadPredefinedProfile("TextExtraction_Accuracy");

IDocumentProcessingParams docProcessingParams = engine.CreateDocumentProcessingParams();
docProcessingParams.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);

document.Process(docProcessingParams);

The first example does preserve the layout of the document. The second example does not preserve the layout of the document.

This is quite noticeable when there are tables.

Is this a bug, or is the order important?

asked 20 Apr '15, 14:00

maol


Well, the TextExtraction_Accuracy profile contains some settings, such as EnableTextExtractionMode=true, which significantly improve text recognition quality but also affect layout preservation.

You could investigate the settings in the TextExtraction_Accuracy profile and choose the ones that have a positive influence on recognition quality. All of the profile's settings are listed in the above-mentioned article.

In addition, I recommend taking a look at the "Improving Recognition Quality" article in the Developer's Help. I hope it will also be useful.

answered 26 Apr '15, 11:18

Natalia Kara...

Yes, the order is important. As stated in Developer's Help->Specifications->Predefined Profiles Specification: "All objects created after the profile is loaded will have these properties set to the specified values".

So when you load the TextExtraction_Accuracy profile before creating IDocumentProcessingParams, the behavior you see is correct: all the settings from this profile are used for processing, and as a result the layout is not preserved.
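
To make the ordering rule concrete, here is a minimal toy model in plain Java. It does not use the ABBYY API at all: ToyEngine, ToyParams, loadProfile and createParams are hypothetical stand-ins, and the only assumption it encodes is the quoted rule that objects created after a profile is loaded pick up the profile's values, while objects created earlier keep the old defaults.

```java
import java.util.HashMap;
import java.util.Map;

public class ProfileOrderDemo {
    // Hypothetical stand-in for an engine; this is NOT the ABBYY API.
    static class ToyEngine {
        private final Map<String, Boolean> defaults = new HashMap<>();

        ToyEngine() {
            // Assumed default before any profile is loaded.
            defaults.put("EnableTextExtractionMode", false);
        }

        // Loading a profile only changes the defaults for objects created later.
        void loadProfile(Map<String, Boolean> profile) {
            defaults.putAll(profile);
        }

        // A new params object copies the defaults in force at creation time.
        ToyParams createParams() {
            return new ToyParams(new HashMap<>(defaults));
        }
    }

    // Hypothetical stand-in for a processing-params object.
    static class ToyParams {
        private final Map<String, Boolean> settings;

        ToyParams(Map<String, Boolean> settings) {
            this.settings = settings;
        }

        boolean textExtractionMode() {
            return settings.get("EnableTextExtractionMode");
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> profile = Map.of("EnableTextExtractionMode", true);

        // Case 1: params created BEFORE the profile is loaded.
        ToyEngine first = new ToyEngine();
        ToyParams createdBefore = first.createParams();
        first.loadProfile(profile);

        // Case 2: params created AFTER the profile is loaded.
        ToyEngine second = new ToyEngine();
        second.loadProfile(profile);
        ToyParams createdAfter = second.createParams();

        System.out.println("created before profile: " + createdBefore.textExtractionMode());
        System.out.println("created after profile:  " + createdAfter.textExtractionMode());
    }
}
```

In this model the first object never sees the profile, which matches the first snippet in the question keeping the layout, while the second object carries the profile's text-extraction setting.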

answered 20 Apr '15, 19:33

Natalia Kara...

Ok I see ;)

Still, is it possible to get the benefits of the TextExtraction_Accuracy profile AND preserve the layout?

answered 22 Apr '15, 14:15

maol

Asked: 20 Apr '15, 14:00

Seen: 1,405 times

Last updated: 26 Apr '15, 11:18

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal