scan not recognized as text

  • Topic Is Solved
b-t-o posted this 3 weeks ago

Hi!

What can I do if a single page of a PDF invoice is not fully recognized as text? The XML output for that page looks like this:

<block blockType="Text" blockName="" l="278" t="2872" r="360" b="2926"><region><rect l="353" t="2872" r="360" b="2878"/><rect l="278" t="2878" r="360" b="2919"/><rect l="353" t="2919" r="360" b="2926"/></region>
<text>
<par lineSpacing="-1"></par>
</text>
</block>
<block blockType="Text" blockName="" l="539" t="198" r="567" b="233"><region><rect l="560" t="198" r="567" b="199"/><rect l="539" t="199" r="567" b="232"/><rect l="539" t="232" r="560" b="233"/></region>
<text>
<par lineSpacing="940">
<line baseline="226" l="546" t="206" r="560" b="226"><formatting lang="EnglishUnitedStates">
<charParams l="546" t="206" r="560" b="226" suspicious="1">1</charParams></formatting></line></par>
</text>
</block>
<block blockType="Picture" blockName="" l="395" t="235" r="2867" b="1492"><region><rect l="1614" t="235" r="1851" b="236"/><rect l="559" t="236" r="2867" b="237"/><rect l="395" t="237" r="2867" b="1014"/><rect l="395" t="1014" r="2866" b="1487"/><rect l="395" t="1487" r="2867" b="1489"/><rect l="395" t="1489" r="2867" b="1490"/><rect l="826" t="1490" r="2507" b="1491"/><rect l="2252" t="1491" r="2408" b="1492"/></region>
</block>
<block blockType="Text" blockName="" l="394" t="1609" r="2270" b="1675"><region><rect l="1016" t="1609" r="1916" b="1610"/><rect l="394" t="1610" r="2119" b="1611"/><rect l="394" t="1611" r="2260" b="1612"/><rect l="394" t="1612" r="2270" b="1673"/><rect l="394" t="1673" r="2270" b="1674"/><rect l="822" t="1674" r="2270" b="1675"/></region>

But the area reported as a Picture block contains a table with text (invoice elements), not an unrecognizable image.
What can I do to fix this in the future? I am currently using processImage?language=german,latin,english&exportFormat=xml.
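
For anyone hitting the same symptom, a quick way to spot affected pages is to list the blocks in the XML result and look for large Picture blocks where text was expected. The following is only a minimal sketch: the function names `list_blocks` and `picture_blocks` are made up for illustration, and it assumes the result is a complete, well-formed XML document whose `<block>` elements carry `blockType`, `l`, `t`, `r` and `b` attributes as in the excerpt above.

```python
import xml.etree.ElementTree as ET


def list_blocks(xml_text):
    """Return (blockType, l, t, r, b) for every <block> element in the XML."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter():
        # Strip an XML namespace prefix such as "{http://...}block" if present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "block":
            box = tuple(int(elem.get(key, 0)) for key in ("l", "t", "r", "b"))
            blocks.append((elem.get("blockType"),) + box)
    return blocks


def picture_blocks(xml_text, min_area=0):
    """Return the Picture blocks whose bounding-box area is at least min_area."""
    result = []
    for block_type, left, top, right, bottom in list_blocks(xml_text):
        area = (right - left) * (bottom - top)
        if block_type == "Picture" and area >= min_area:
            result.append((block_type, left, top, right, bottom))
    return result
```

A large Picture block covering the area where the invoice table sits, as in the excerpt above, suggests that the table was treated as an image rather than as text.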

Best wishes

Marc

Oksana Serdyuk posted this 2 weeks ago

Could you please share your source PDF file or send it to CloudOCRSDK@abbyy.com, so that we can find more appropriate recognition settings for your scenario?

Oksana Serdyuk posted this 2 weeks ago

Hi Marc,

Thank you for the provided document. Please try the following recognition settings:

processImage?language=German,English&profile=textExtraction
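
As an illustration, the request URL with these settings could be assembled as in the sketch below. This is not official client code: `build_process_image_url` is a hypothetical helper, the base URL is an assumption (use the one shown for your own application), and the `export_format` argument is optional, included only because the original request in this thread used exportFormat=xml.

```python
from urllib.parse import urlencode

# Assumption: service base URL; replace with the one for your application.
BASE_URL = "https://cloud.ocrsdk.com"


def build_process_image_url(languages, profile=None, export_format=None,
                            base_url=BASE_URL):
    """Build a processImage URL from a list of languages and optional settings."""
    params = {"language": ",".join(languages)}
    if profile:
        params["profile"] = profile
    if export_format:
        params["exportFormat"] = export_format
    # Keep commas literal so the language list reads as it does in the thread.
    return f"{base_url}/processImage?{urlencode(params, safe=',')}"


url = build_process_image_url(["German", "English"], profile="textExtraction")
print(url)
```

The image itself still has to be sent with your application's credentials; that part is left out here.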

b-t-o posted this 2 weeks ago

Hi Oksana,

Thank you for the new parameter.
It works well.

Thank you.

Best wishes

Marc
