We are performing text extraction on JPG images and on PDFs (each PDF contains a single image).
Some images are rotated by 90° (they are ID cards scanned in portrait mode instead of landscape mode).
Text extraction works well for these kinds of images in JPG format, but it returns only garbage text for images in PDF format.
I did a little test:
1) I saved one of the JPG images that has no rotation as a PDF (through IrfanView).
2) I performed text extraction on this PDF -> The extracted text is OK.
3) I saved one of the JPG images that has a 90° rotation as a PDF.
4) I performed text extraction on this PDF -> The extracted text is not OK at all.
So it seems something is going wrong in orientation detection when the input is a PDF file.
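For anyone who wants to reproduce the rotated input from step 3 without a scanner, here is a minimal Java sketch that rotates an image by 90° clockwise using only the standard `java.awt.image` classes. The class name `RotateForOcr` and the method `rotate90Clockwise` are hypothetical names chosen for this illustration; they are not part of the OCR SDK, and the sketch makes no claim about how the SDK detects or corrects orientation internally.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for building a rotated test image; not part of any OCR SDK.
public class RotateForOcr {

    /** Returns a new image that is a copy of src rotated 90 degrees clockwise. */
    public static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at (h - 1 - y, x) in the rotated image.
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

The result can be written out with `javax.imageio.ImageIO.write` and then converted to PDF in the same way as in the test above, which gives a controlled pair of inputs (rotated and not rotated) from a single source image.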
Yes, the CorrectOrientation property is FALSE by default. To process rotated images, you should set it to TRUE.
answered 31 Mar '15, 17:17
I think I resolved the problem by calling setCorrectionOrientation on PagePreprocessingParams:
answered 31 Mar '15, 13:16