How to convert semi-structured and unstructured information from OCR into structured information

  • Last Post 18 October 2016
Anand Rao posted this 18 October 2016

We tried to process a document that has a table-like layout but is not an actual table. The extracted data treats the first column as one text block and the second column as a separate text block. We need to store the first-column values as keys and the second-column values as values.

First Field Value
Price 50
Stock 20
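If each column really does come back as a single text block with exactly one entry per line, the simplest workaround is to split both blocks into lines and zip them together. This is only a sketch under that assumption: the function name `pair_columns` and the header-skipping behaviour are illustrative, not part of any ABBYY API. It breaks as soon as one cell is empty or wraps onto two lines, which is why pairing on coordinates is usually more robust.

```python
from typing import Dict


def pair_columns(key_block: str, value_block: str, skip_header: bool = True) -> Dict[str, str]:
    """Zip two single-column text blocks into a key -> value dict, line by line."""
    # Drop blank lines and surrounding whitespace from each OCR block.
    keys = [s.strip() for s in key_block.splitlines() if s.strip()]
    values = [s.strip() for s in value_block.splitlines() if s.strip()]
    # A count mismatch means a cell was missed or wrapped; refuse to guess.
    if len(keys) != len(values):
        raise ValueError(f"column length mismatch: {len(keys)} keys vs {len(values)} values")
    # Optionally discard the header row (e.g. "Field" / "Value").
    if skip_header:
        keys, values = keys[1:], values[1:]
    return dict(zip(keys, values))


if __name__ == "__main__":
    print(pair_columns("Field\nPrice\nStock", "Value\n50\n20"))  # {'Price': '50', 'Stock': '20'}
```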

The coordinates coming from Cloud OCR do not match. The field comes back like this:

<line baseline="583" l="289" t="559" r="695" b="583" xmlns="">
<formatting lang="EnglishUnitedStates">
  <charParams l="289" t="559" r="306" b="582">P</charParams>
  <charParams l="307" t="559" r="319" b="583">R</charParams>
  <charParams l="320" t="559" r="339" b="583">I</charParams>
  <charParams l="341" t="559" r="361" b="582">C</charParams>
  <charParams l="364" t="560" r="381" b="583">E</charParams>
</formatting>
</line>

The values come back like this:

<line baseline="2808" l="1801" t="2774" r="1871" b="2808" xmlns="">
<formatting lang="EnglishUnitedStates">
  <charParams l="1801" t="2775" r="1819" b="2806">7</charParams>
  <charParams l="1850" t="2774" r="1871" b="2807">0</charParams>
</formatting>
</line>
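To work with the coordinates at all, the `<line>` elements first have to be pulled out of the XML together with their text. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the export looks like the snippets above (a `<line>` with `l`/`t`/`r`/`b` attributes containing one `<charParams>` per character). The `OcrLine`, `local_name` and `parse_lines` names are hypothetical. Full exports typically declare a default namespace on the root element, so tags are compared by local name only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class OcrLine:
    text: str
    l: int
    t: int
    r: int
    b: int


def local_name(tag: str) -> str:
    # "{namespace-uri}line" -> "line"; an un-namespaced "line" is returned unchanged.
    return tag.rsplit("}", 1)[-1]


def parse_lines(xml_text: str) -> List[OcrLine]:
    """Return every <line> element as its concatenated text plus bounding box."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():  # iter() also visits the root element itself
        if local_name(elem.tag) != "line":
            continue
        # Characters are stored one per <charParams>, in reading order.
        text = "".join(c.text or "" for c in elem.iter() if local_name(c.tag) == "charParams")
        lines.append(OcrLine(text, int(elem.get("l")), int(elem.get("t")),
                             int(elem.get("r")), int(elem.get("b"))))
    return lines


if __name__ == "__main__":
    sample = """<page>
<line baseline="2808" l="1801" t="2774" r="1871" b="2808" xmlns="">
<formatting lang="EnglishUnitedStates">
  <charParams l="1801" t="2775" r="1819" b="2806">7</charParams>
  <charParams l="1850" t="2774" r="1871" b="2807">0</charParams>
</formatting>
</line>
</page>"""
    for line in parse_lines(sample):
        print(line)  # OcrLine(text='70', l=1801, t=2774, r=1871, b=2808)
```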

None of the coordinates match between the field and the value. How can we tackle this so that each value is mapped to the correct field?
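The coordinates of a field and its value will almost never be identical, so one common approach is to pair lines by row: treat a value as belonging to a key when their vertical extents overlap by a reasonable fraction and the value starts to the right of the key, then keep the nearest such candidate. This sketch assumes the lines have already been split into a key column and a value column (for example by comparing each line's `l` against an x boundary). `Box`, `pair_by_row`, the 0.5 overlap threshold and the example coordinates are all illustrative, not ABBYY settings. Note that in the two snippets quoted above the field line spans t=559..b=583 while the value line spans t=2774..b=2808, so they do not share a row; if those two really belong together, a row-based rule cannot connect them and the source layout is worth checking.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Box:
    text: str
    l: int
    t: int
    r: int
    b: int


def vertical_overlap(a: Box, c: Box) -> int:
    # Height of the shared vertical band, 0 if the boxes do not overlap.
    return max(0, min(a.b, c.b) - max(a.t, c.t))


def pair_by_row(keys: List[Box], values: List[Box], min_overlap: float = 0.5) -> Dict[str, Optional[str]]:
    """Map each key's text to the nearest value on the same row, or None."""
    result: Dict[str, Optional[str]] = {}
    for k in keys:
        best: Optional[Box] = None
        for v in values:
            if v.l < k.r:  # the value must start to the right of the key
                continue
            shorter = min(k.b - k.t, v.b - v.t)
            # Require the boxes to share at least min_overlap of the shorter height.
            if shorter <= 0 or vertical_overlap(k, v) < min_overlap * shorter:
                continue
            if best is None or v.l < best.l:
                best = v  # keep the candidate closest to the key horizontally
        result[k.text] = best.text if best is not None else None
    return result


if __name__ == "__main__":
    keys = [Box("Price", 289, 559, 381, 583), Box("Stock", 289, 600, 390, 624)]
    values = [Box("50", 900, 561, 940, 585), Box("20", 900, 602, 940, 626)]
    print(pair_by_row(keys, values))  # {'Price': '50', 'Stock': '20'}
```

Using a relative overlap instead of an exact coordinate match tolerates the few pixels of jitter between separately recognised blocks; keys with no candidate on their row map to None so that gaps stay visible instead of being silently filled.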

Oksana Serdyuk posted this 18 October 2016

Hi, please send your source image to for our tests.