PDF-Text Extraction from Text Layer

  • Last Post 20 August 2013
Sham posted this 08 August 2013

I want to extract text directly from the text layer of a PDF without applying OCR to it. The following passage from the FREngine10UserGuide (page 386) makes me think this is possible:

IsFromSourceContent - Specifies whether the character has been extracted from the text content of the input file without recognition. For example, it can be extracted from a PDF file with a text layer.

Can anyone please tell me how to do this using FineReader?

SDK_support posted this 20 August 2013

Hello Sham!

If you set IObjectsExtractionParams::SourceContentReuseMode to CRM_ContentOnly, only the text layer of the source PDF file is used; the image layer is not used. Note, however, that if a text line contains characters that are not in the alphabet of the selected recognition languages, that text cannot be written to the result, and the line would have to be re-recognized.

For details, please see page 754 of the FREngine10UserGuide.
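
To make the alphabet caveat above concrete, here is a minimal Python sketch of the rule as described in this reply. It does not call the FineReader Engine API. The names `ENGLISH_ALPHABET`, `LineDecision`, `classify_line` and `split_lines` are hypothetical, and the alphabet is a simplified stand-in for whatever the engine derives from the selected recognition languages.

```python
from dataclasses import dataclass

# Hypothetical, simplified alphabet for illustration only. The real engine
# builds its alphabet from the selected recognition languages.
ENGLISH_ALPHABET = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " .,;:!?'\"()-"
)


@dataclass
class LineDecision:
    text: str
    reuse_text_layer: bool  # True: take the line from the text layer as-is
    unsupported: str        # characters outside the alphabet, first-seen order


def classify_line(text, alphabet=ENGLISH_ALPHABET):
    """Decide whether a text-layer line can be reused without recognition."""
    unsupported = []
    for ch in text:
        if ch not in alphabet and ch not in unsupported:
            unsupported.append(ch)
    # A line is reusable only if every character is in the alphabet.
    return LineDecision(text, not unsupported, "".join(unsupported))


def split_lines(lines, alphabet=ENGLISH_ALPHABET):
    """Partition lines into (reusable, needs_recognition) lists."""
    reusable, needs_recognition = [], []
    for line in lines:
        decision = classify_line(line, alphabet)
        if decision.reuse_text_layer:
            reusable.append(line)
        else:
            needs_recognition.append(line)
    return reusable, needs_recognition


if __name__ == "__main__":
    sample = ["Invoice total: 42", "Цена: 42"]
    for line in sample:
        print(classify_line(line))
```

With an English-only alphabet, a line containing Cyrillic characters ends up in the second list, which mirrors the case where the text layer alone is not enough and the recognition languages would need to be extended to cover the document.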

Best regards, Natalia.
