How to efficiently check if a document has text?

  • Last Post 05 April 2012
rblasco posted this 04 April 2012

Hello,

Which way do you recommend to efficiently check whether a document has text? We want to retrieve the OCR results in XML format, so our first attempt is to look for blocks whose blockType attribute is "Text". Is there a flag somewhere? Is our method reliable?

thanks!

Vasily Panferov posted this 05 April 2012

Correct. To check whether the document contains text, you need to recognize it to XML and look for a <block blockType="Text"> element. If there are no such elements in the document, no text was recognized in it. But there can still be image blocks, barcodes, etc.
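
For illustration, here is a minimal Python sketch of that check. It is not taken from the product documentation: the function names are hypothetical, and it assumes only what is described above, namely that the XML result contains block elements with a blockType attribute. It streams the file and stops at the first match, so a large result does not have to be parsed in full, and it compares only the local tag name so that it works whether or not the export declares an XML namespace.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def has_text_block(xml_file):
    """Return True if the XML stream contains <block blockType="Text">.

    xml_file is a binary file-like object holding the OCR result in XML.
    Parsing stops as soon as the first matching block is seen.
    """
    # "start" events fire once the opening tag has been read, so the
    # element's attributes are already available at that point.
    for _event, elem in ET.iterparse(xml_file, events=("start",)):
        if local_name(elem.tag) == "block" and elem.get("blockType") == "Text":
            return True
    return False


def has_text_block_in_bytes(xml_bytes):
    """Convenience wrapper (hypothetical) for XML already held in memory."""
    return has_text_block(io.BytesIO(xml_bytes))
```

With a result saved to disk (the file name is only an example), this could be called as `with open("result.xml", "rb") as f: print(has_text_block(f))`. Blocks of any other type are ignored by this sketch, so a False result only means that no Text block was found.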
