Scan not recognized as text

  • Last Post 12 January 2018
  • Topic Is Solved
b-t-o posted this 05 January 2018

Hi!

What can I do if a single page of a PDF invoice is not fully recognized as text?

<block blockType="Text" blockName="" l="278" t="2872" r="360" b="2926"><region><rect l="353" t="2872" r="360" b="2878"/><rect l="278" t="2878" r="360" b="2919"/><rect l="353" t="2919" r="360" b="2926"/></region>
<text>
<par lineSpacing="-1"></par>
</text>
</block>
<block blockType="Text" blockName="" l="539" t="198" r="567" b="233"><region><rect l="560" t="198" r="567" b="199"/><rect l="539" t="199" r="567" b="232"/><rect l="539" t="232" r="560" b="233"/></region>
<text>
<par lineSpacing="940">
<line baseline="226" l="546" t="206" r="560" b="226"><formatting lang="EnglishUnitedStates">
<charParams l="546" t="206" r="560" b="226" suspicious="1">1</charParams></formatting></line></par>
</text>
</block>
<block blockType="Picture" blockName="" l="395" t="235" r="2867" b="1492"><region><rect l="1614" t="235" r="1851" b="236"/><rect l="559" t="236" r="2867" b="237"/><rect l="395" t="237" r="2867" b="1014"/><rect l="395" t="1014" r="2866" b="1487"/><rect l="395" t="1487" r="2867" b="1489"/><rect l="395" t="1489" r="2867" b="1490"/><rect l="826" t="1490" r="2507" b="1491"/><rect l="2252" t="1491" r="2408" b="1492"/></region>
</block>
<block blockType="Text" blockName="" l="394" t="1609" r="2270" b="1675"><region><rect l="1016" t="1609" r="1916" b="1610"/><rect l="394" t="1610" r="2119" b="1611"/><rect l="394" t="1611" r="2260" b="1612"/><rect l="394" t="1612" r="2270" b="1673"/><rect l="394" t="1673" r="2270" b="1674"/><rect l="822" t="1674" r="2270" b="1675"/></region>

The region exported as a Picture block actually contains a table with text (the invoice items), not an unrecognizable image.
How can I fix this going forward? I am currently calling processImage?language=german,latin,english&exportFormat=xml.
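In the export above, the table ends up in the block with `blockType="Picture"`, which carries no `<text>` content. A minimal sketch for spotting such regions in an XML export, assuming blocks appear as `block` elements with `blockType` and `l`/`t`/`r`/`b` attributes as in the excerpt. The helper names `local_name` and `picture_blocks` are hypothetical, and the namespace in the sample is a placeholder; the helper ignores whatever namespace the real export uses.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix so matching works with or without a namespace.
    return tag.rsplit("}", 1)[-1]


def picture_blocks(xml_text: str) -> List[Tuple[int, int, int, int]]:
    """Return (l, t, r, b) for every block whose blockType is "Picture"."""
    root = ET.fromstring(xml_text)
    boxes = []
    for elem in root.iter():
        if local_name(elem.tag) == "block" and elem.get("blockType") == "Picture":
            boxes.append(tuple(int(elem.get(k)) for k in ("l", "t", "r", "b")))
    return boxes


sample = (
    '<document xmlns="urn:example"><page>'
    '<block blockType="Text" l="539" t="198" r="567" b="233"/>'
    '<block blockType="Picture" l="395" t="235" r="2867" b="1492"/>'
    "</page></document>"
)
print(picture_blocks(sample))  # [(395, 235, 2867, 1492)]
```

A large Picture block on a page that should be mostly text, like the one above, is a hint that the recognition settings need adjusting.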

Best wishes

Marc

Oksana Serdyuk posted this 10 January 2018

Could you please share your source PDF file or send it to CloudOCRSDK@abbyy.com so that we can find more appropriate recognition settings for your scenario?

Oksana Serdyuk posted this 12 January 2018

Hi Marc,

Thank you for the provided document. Please try the following recognition settings:

processImage?language=German,English&profile=textExtraction
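A minimal sketch of building the suggested request URL, for reference. Assumptions: the base URL `https://cloud.ocrsdk.com/processImage` (check which host your application uses), and the helper name `process_image_url`, which is hypothetical. The actual call also needs the image in the request body and authentication, which are not shown. Adding `exportFormat=xml` (as in the original request) alongside the profile is optional here; the reply above did not mention it.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed endpoint; not confirmed in this thread.
BASE_URL = "https://cloud.ocrsdk.com/processImage"


def process_image_url(
    languages: List[str], profile: str, export_format: Optional[str] = None
) -> str:
    """Build a processImage URL with the given languages, profile and format."""
    params = {"language": ",".join(languages), "profile": profile}
    if export_format is not None:
        params["exportFormat"] = export_format
    # safe="," keeps the comma-separated language list readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(process_image_url(["German", "English"], "textExtraction"))
# https://cloud.ocrsdk.com/processImage?language=German,English&profile=textExtraction
```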

b-t-o posted this 12 January 2018

Hi Oksana,

Thank you for the new parameter.
It works well.

Thank you.

Best wishes

Marc
