I want to extract text directly from the text layer of a PDF without applying OCR to it. From the following lines of the FREngine10UserGuide (page 386), I think it is possible:

IsFromSourceContent - Specifies whether the character has been extracted from the text content of the input file without recognition. For example, it can be extracted from a PDF file with a text layer.

Can anyone please tell me how to do this using FineReader?

asked 08 Aug '13, 11:21

Sham


Hello Sham!

If you set IObjectsExtractionParams::SourceContentReuseMode to CRM_ContentOnly, only the text layer of the source PDF file is used and the image layer is ignored. Note, however, that if a text line contains characters that are not in the alphabet of the selected recognition languages, that text cannot be written to the result, and the line would have to be re-recognized.
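As a rough illustration of the setting described above, here is a minimal Python sketch. It is not taken from the user guide. The helper names `use_text_layer_only` and `count_reused_characters` are hypothetical, the `CRM_CONTENT_ONLY` string is only a placeholder for the real enumeration constant, and the objects are plain stand-ins so that the sketch runs without the engine. How you obtain the actual IObjectsExtractionParams object and the per-character parameters depends on your engine version and language binding, so please check the user guide for that part.

```python
from types import SimpleNamespace

# Placeholder only (assumption): with a real engine, pass the CRM_ContentOnly
# constant exposed by the engine's type library instead of this string.
CRM_CONTENT_ONLY = "CRM_ContentOnly"


def use_text_layer_only(objects_extraction_params, content_only=CRM_CONTENT_ONLY):
    """Set SourceContentReuseMode so that only the PDF text layer is used."""
    objects_extraction_params.SourceContentReuseMode = content_only
    return objects_extraction_params


def count_reused_characters(char_params_list):
    """Return (reused, total); reused counts items with IsFromSourceContent set."""
    flags = [bool(cp.IsFromSourceContent) for cp in char_params_list]
    return sum(flags), len(flags)


# Stand-in objects (hypothetical) so the sketch runs without the engine installed.
params = use_text_layer_only(SimpleNamespace(SourceContentReuseMode=None))
chars = [SimpleNamespace(IsFromSourceContent=flag) for flag in (True, True, False)]
reused, total = count_reused_characters(chars)
print(params.SourceContentReuseMode, reused, total)
```

With a real document, a count where `reused` is lower than `total` would suggest that some characters did not come from the text layer, for example because a line had to be re-recognized as described above.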

For details, please see page 754 of the FREngine10UserGuide.

Best regards, Natalia.

answered 20 Aug '13, 17:14

SDK_support ♦♦


© 2016 ABBYY. All rights reserved. www.ABBYY.com