How to detect the start and end of a table? (PDF to .txt, Python)

  • Last Post 06 February 2019
Jeen Dzul posted this 29 January 2019

How can I detect the start and end of a table and add tags to the .txt output to mark where a table is?

For example, I need this:

 

-#tag(start)

PROCESS TIME TEMP IN DEG F COOLING METHOD

HARDEN 6 HOURS 1575 QUENCHED TO QUENCH TEMPERATURES

POLYMER QUENCHEDA: 101-106 B: 104-110 C: 97-103

TEMPER 9 HOURS A-B: 1056 C: 1066 WATER QUENCHED

-#tag(end)

TENSILEYIELD .2% OFF

YIELD .6% EUL

ELONG IN 2" REDUCTION

148,200

 

Or is it possible to add ' | ' to delimit the cells in the .txt?

Jeen Dzul posted this 01 February 2019

????

Vishnu posted this 04 February 2019

Hi Jeen Dzul,

As far as I know, ABBYY does not retain table start and end information in .txt output. You can try the XML output from ABBYY using the documentConversion profile (which retains document structure) instead of textExtraction. Refer here for more on profiles.

In the XML response, look for a block tag with the attribute blockType="Table", like below:

<block blockType="Table">

   <row>

      <cell></cell>

      <cell></cell>

   </row>

</block>

Refer here to learn more about the XML tags.
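
A minimal Python sketch of how the XML could be turned into the tagged .txt from the question. It assumes only the block/row/cell structure shown above; the function names, the SAMPLE document and the "Text" block type are made up for illustration. It also assumes the text inside a cell can be collected by joining its nested text nodes, so XML exported with per-character elements may need different handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the snippet above; real ABBYY output has
# more nesting and usually an XML namespace on every tag.
SAMPLE = """<document>
  <page>
    <block blockType="Text"><text>HEAT TREATMENT</text></block>
    <block blockType="Table">
      <row><cell><text>PROCESS</text></cell><cell><text>TIME</text></cell></row>
      <row><cell><text>HARDEN</text></cell><cell><text>6 HOURS</text></cell></row>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so tags match with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def element_text(element):
    """Join all nested text of an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def xml_to_tagged_text(xml_string):
    """Return plain text with table blocks wrapped in start/end tags."""
    root = ET.fromstring(xml_string)
    lines = []
    for block in root.iter():
        if local_name(block.tag) != "block":
            continue
        if block.get("blockType") == "Table":
            lines.append("-#tag(start)")
            for row in block.iter():
                if local_name(row.tag) != "row":
                    continue
                # One output line per row, cells separated by ' | '.
                cells = [element_text(c) for c in row if local_name(c.tag) == "cell"]
                lines.append(" | ".join(cells))
            lines.append("-#tag(end)")
        else:
            text = element_text(block)
            if text:
                lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_tagged_text(SAMPLE))
```

The ' | ' separator and the -#tag(start) / -#tag(end) markers are taken from the question, so both can be changed to whatever the downstream parser expects.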

 

Thanks,

Vishnu

Jeen Dzul posted this 06 February 2019

But is this possible if the file I need to scan is a PDF?
