How to detect the start and end of a table? (PDF to .txt, Python)

Jeen Dzul posted this 4 weeks ago

How can I detect the start and end of a table and add tags to the .txt output to mark where a table is?

For example, I need this:

 

-#tag(start)

PROCESS TIME TEMP IN DEG F COOLING METHOD

HARDEN 6 HOURS 1575 QUENCHED TO QUENCH TEMPERATURES

POLYMER QUENCHEDA: 101-106 B: 104-110 C: 97-103

TEMPER 9 HOURS A-B: 1056 C: 1066 WATER QUENCHED

-#tag(end)

TENSILEYIELD .2% OFF

YIELD .6% EUL

ELONG IN 2" REDUCTION

148,200

 

Or is it possible to add ' | ' to delimit the cells in the .txt?

Jeen Dzul posted this 4 weeks ago

????

Vishnu posted this 3 weeks ago

Hi Jeen Dzul,

As far as I know, ABBYY does not retain table start and end information in .txt output. You can try the XML output from ABBYY using the documentConversion profile (which retains document structure) instead of textExtraction. Refer here for more on profiles.

In the XML response, look for a block tag with the attribute blockType="Table", like below:

<block blockType="Table">

   <row>

      <cell></cell>

      <cell></cell>

   </row>

</block>

Refer here to learn more about the XML tags.
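For what it's worth, here is a minimal Python sketch of how the XML could be turned back into tagged, pipe-delimited text. It assumes the simplified block/row/cell shape shown above; real ABBYY output declares an XML namespace and nests more elements inside each cell, so the text extraction may need adjusting. The function names and the SAMPLE string are made up for illustration and are not part of any ABBYY API.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop a "{namespace}" prefix if the XML declares one.
    return tag.rsplit("}", 1)[-1]


def cell_text(cell):
    # Assumption: the visible text lives in the leaf elements under the cell
    # (or in the cell itself when it has no children).
    parts = [el.text or "" for el in cell.iter() if len(el) == 0]
    return " ".join("".join(parts).split())


def table_to_lines(block):
    # One output line per row, cells joined with ' | ',
    # wrapped in the start/end tags from the question.
    lines = ["-#tag(start)"]
    for row in block:
        if local_name(row.tag) != "row":
            continue
        cells = [cell_text(c) for c in row if local_name(c.tag) == "cell"]
        lines.append(" | ".join(cells))
    lines.append("-#tag(end)")
    return lines


def tables_from_xml(xml_string):
    # Collect every block whose blockType attribute is "Table".
    root = ET.fromstring(xml_string)
    out = []
    for el in root.iter():
        if local_name(el.tag) == "block" and el.get("blockType") == "Table":
            out.extend(table_to_lines(el))
    return out


# Hypothetical sample in the simplified shape from this post.
SAMPLE = """<page>
<block blockType="Text"><text>Some paragraph</text></block>
<block blockType="Table">
  <row><cell>PROCESS</cell><cell>TIME</cell></row>
  <row><cell>HARDEN</cell><cell>6 HOURS</cell></row>
</block>
</page>"""

if __name__ == "__main__":
    print("\n".join(tables_from_xml(SAMPLE)))
```

This only prints the tables. To keep the surrounding text as well, you would walk all blocks in document order and write the non-table blocks out unchanged.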

 

Thanks,

Vishnu

Jeen Dzul posted this 3 weeks ago

But is this possible if the file I am scanning is a PDF?
