PDF Text Extraction from Text Layer via OCR SDK

  • Last Post 28 August 2013
ScraperDragon posted this 23 August 2013

Having looked at PDF-Text Extraction from Text Layer, I can see it's possible to get the underlying text of a PDF document from FineReader Engine 10. Is this possible via ABBYY Cloud OCR SDK at all?

Anastasia Galimova posted this 27 August 2013

Unfortunately, this feature is not implemented in ABBYY Cloud OCR SDK.

samuelcossette posted this 27 August 2013

Hi,

If you want to extract the text layer, you can use a PDF lib like Poppler or PDFMiner.
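For instance, here is a minimal sketch of the Poppler route. It assumes Poppler's pdftotext command-line tool is installed and on your PATH; the helper names build_pdftotext_command and extract_text_layer are made up for illustration and are not part of any library.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for Poppler's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout as far as it can.
        cmd.append("-layout")
    # A trailing "-" sends the extracted text to stdout instead of a file.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text_layer(pdf_path, keep_layout=False):
    """Return the embedded text layer of a PDF; no OCR is performed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits with an error
    )
    return result.stdout.decode("utf-8")
```

You would call it as extract_text_layer("document.pdf"), where document.pdf is a placeholder path. This only reads text that is already embedded in the file, so a scanned PDF without a text layer will give you little or nothing back.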

All the best,

Sam

ScraperDragon posted this 28 August 2013

Sadly, I'm really interested in automatic layout detection, which is scarce.

samuelcossette posted this 28 August 2013

What do you mean by "automatic layout detection"? Could you give an example or more detail?
