samuelcossette posted this 30 August 2013


I'm using the XML format to detect and parse tables (blockType="Table") in PDFs that have had no OCR done on them, and it's very reliable. I also have a bunch of PDFs that already have a text layer, and there is no OCR required on them.

Is there a way to process my document and specify to only do the cell detection?
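
For reference, the kind of table parsing described above could be sketched roughly like this in Python. This is only an illustration under assumptions: the sample XML is made up, and the `row`, `cell` and `text` element names are assumed for the example; only `blockType="Table"` is taken from the actual output mentioned in this post. Namespaces are ignored so the sketch does not depend on a particular schema version.

```python
import xml.etree.ElementTree as ET

# Made-up sample input. Only blockType="Table" comes from the post;
# the row/cell/text element names are assumptions for illustration.
SAMPLE_XML = """<document>
  <page>
    <block blockType="Text"><text>Intro paragraph</text></block>
    <block blockType="Table">
      <row><cell><text>A1</text></cell><cell><text>B1</text></cell></row>
      <row><cell><text>A2</text></cell><cell><text>B2</text></cell></row>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_tables(xml_text):
    """Return every table as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for block in root.iter():
        # Keep only <block blockType="Table"> elements.
        if local_name(block.tag) != "block" or block.get("blockType") != "Table":
            continue
        rows = []
        for row in block:
            if local_name(row.tag) != "row":
                continue
            # Join all text found inside each cell into one string.
            cells = [
                "".join(cell.itertext()).strip()
                for cell in row
                if local_name(cell.tag) == "cell"
            ]
            rows.append(cells)
        tables.append(rows)
    return tables


tables = extract_tables(SAMPLE_XML)
```

Blocks with any other blockType are skipped, so only the table cells end up in `tables`.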

Anastasia Galimova posted this 02 September 2013

Unfortunately, I do not fully understand the question. Could you please explain in other words what you mean by "to only do the cell detection"?

Does it mean that you want to recognize only the tables in your PDF files and skip the other text?