Hello,

We are performing text extraction on JPG images and on PDFs (each PDF contains a single image).

Some images have an initial rotation of 90° (they are ID cards scanned in portrait mode instead of landscape mode).

Text extraction works well for these kinds of images in JPG format, but it returns only garbage text for the same images in PDF format.

I did a little test:

1) I saved one of the jpg images that has no rotation to pdf (through IrfanView).

2) I performed text extraction on this pdf -> The extracted text is OK.

3) I saved one of the jpg images that has 90° rotation to pdf.

4) I performed text extraction on this pdf -> Text extracted is not OK at all.

So it seems something is going wrong in orientation detection when the input is a PDF file.

asked 31 Mar '15, 13:02

maol

edited 31 Mar '15, 13:03


Yes, the CorrectOrientation property is FALSE by default. In order to process rotated images you should set it to true.

answered 31 Mar '15, 17:17

Natalia Kara...

Answering myself:

I think I resolved the problem by calling setCorrectOrientation on PagePreprocessingParams:

docProcessingParams.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
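
For anyone who wants to see the getter chain in isolation, here is a minimal sketch. It does not use the real SDK: the three classes below are hypothetical stand-ins that only mirror the names in the line above, and the isCorrectOrientation getter is an assumption added for the demo. In a real project, docProcessingParams would come from the engine, and only the setCorrectOrientation(true) call is taken from this thread.

```java
// Hypothetical stand-ins for the engine's parameter objects, written only so
// that this sketch compiles on its own. They are NOT the real SDK classes.
class PagePreprocessingParams {
    // Off by default, as stated in the answer in this thread.
    private boolean correctOrientation = false;

    // Assumed getter, added for the demo only.
    boolean isCorrectOrientation() { return correctOrientation; }

    void setCorrectOrientation(boolean value) { correctOrientation = value; }
}

class PageProcessingParams {
    private final PagePreprocessingParams preprocessing = new PagePreprocessingParams();

    PagePreprocessingParams getPagePreprocessingParams() { return preprocessing; }
}

class DocumentProcessingParams {
    private final PageProcessingParams page = new PageProcessingParams();

    PageProcessingParams getPageProcessingParams() { return page; }
}

class OrientationDemo {
    // Builds processing params with orientation correction switched on,
    // using the same getter chain as the one-liner above.
    static DocumentProcessingParams withOrientationCorrection() {
        DocumentProcessingParams docProcessingParams = new DocumentProcessingParams();
        docProcessingParams.getPageProcessingParams()
                .getPagePreprocessingParams()
                .setCorrectOrientation(true);
        return docProcessingParams;
    }

    public static void main(String[] args) {
        DocumentProcessingParams defaults = new DocumentProcessingParams();
        System.out.println("default: " + defaults.getPageProcessingParams()
                .getPagePreprocessingParams().isCorrectOrientation());

        DocumentProcessingParams corrected = withOrientationCorrection();
        System.out.println("after setter: " + corrected.getPageProcessingParams()
                .getPagePreprocessingParams().isCorrectOrientation());
    }
}
```

The point is that the flag lives on the page preprocessing params, so it has to be set on the same params object that is later passed to the processing call for the PDF.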

answered 31 Mar '15, 13:16

maol

Asked: 31 Mar '15, 13:02

Seen: 1,247 times

Last updated: 31 Mar '15, 17:17

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal