We are performing text extraction on jpg images and on pdfs (each pdf contains a single image).

Some images have an initial rotation of 90° (they are ID cards scanned in portrait mode instead of landscape mode).

Text extraction works well for these kinds of images in jpg format, but it returns only garbage text for the same images in pdf format.

I did a little test:

1) I saved one of the jpg images that has no rotation to pdf (through IrfanView).

2) I performed text extraction on this pdf -> The extracted text is OK.

3) I saved one of the jpg images that has 90° rotation to pdf.

4) I performed text extraction on this pdf -> Text extracted is not OK at all.

So it seems something is going wrong in orientation detection when the input is a pdf file.

asked 31 Mar '15, 13:02

maol


edited 31 Mar '15, 13:03

Yes, the CorrectOrientation property is FALSE by default. In order to process rotated images you should set it to true.


answered 31 Mar '15, 17:17

Natalia Karaseva

Answering myself:

I think I resolved the problem by calling setCorrectionOrientation on PagePreprocessingParams:
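As a rough illustration of the idea only: the sketch below uses a hypothetical stand-in class, not the real SDK type, to show the behaviour described in this thread (orientation correction is off by default and has to be switched on explicitly). The class body, the `isCorrectOrientation` getter, and the `paramsForRotatedScans` helper are assumptions made for this example. The setter is named after the CorrectOrientation property mentioned in the other answer, so the exact method name in the SDK may differ.

```java
// Hypothetical stand-in for the engine's PagePreprocessingParams object.
// It only models the single flag discussed in this thread.
final class PagePreprocessingParams {
    // Mirrors the default described in the thread: correction is off.
    private boolean correctOrientation = false;

    void setCorrectOrientation(boolean value) {
        correctOrientation = value;
    }

    boolean isCorrectOrientation() {
        return correctOrientation;
    }
}

final class OrientationSketch {
    // Hypothetical helper: builds parameters for scans that may be rotated,
    // such as ID cards scanned in portrait mode.
    static PagePreprocessingParams paramsForRotatedScans() {
        PagePreprocessingParams params = new PagePreprocessingParams();
        params.setCorrectOrientation(true); // opt in explicitly; the default is false
        return params;
    }

    public static void main(String[] args) {
        PagePreprocessingParams defaults = new PagePreprocessingParams();
        System.out.println("default: " + defaults.isCorrectOrientation());
        System.out.println("rotated scans: " + paramsForRotatedScans().isCorrectOrientation());
    }
}
```

With the real engine, the same flag would be set on the preprocessing parameters that are passed to the recognition call, for jpg and pdf input alike.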


answered 31 Mar '15, 13:16

maol



Asked: 31 Mar '15, 13:02

Seen: 1,306 times

Last updated: 31 Mar '15, 17:17

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal