XML returned is all separators

  • Last Post 01 November 2013
  • Topic Is Solved
kristianpicon posted this 31 October 2013

I'm trying to process a document that consists of small amounts of text separated by dotted lines. The returned XML contains only blocks like the following:

<block blockType="Separator" l="5418" t="64" r="5496" b="66">
<region>
<rect l="5418" t="64" r="5496" b="66"/>
</region>
<separator type="Dotted" thickness="1">
<start x="5418" y="65"/>
<end x="5496" y="65"/>
</separator>
</block>

What can I do to get it to recognize the text instead of the separators?
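One way to confirm that a response really contains nothing but separators is to tally the blockType attribute of every block element. The sketch below is only an illustration: the `<page>` wrapper around the sample is a hypothetical stand-in for the real document root, and the helper names are made up here. Tags are matched by local name so that an XML namespace on the real output would not matter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample block copied from the post; the <page> wrapper is a hypothetical
# stand-in for whatever root element the real response uses.
SAMPLE = """<page>
<block blockType="Separator" l="5418" t="64" r="5496" b="66">
<region>
<rect l="5418" t="64" r="5496" b="66"/>
</region>
<separator type="Dotted" thickness="1">
<start x="5418" y="65"/>
<end x="5496" y="65"/>
</separator>
</block>
</page>"""


def local_name(tag):
    """Drop a leading '{namespace}' prefix from an element tag, if any."""
    return tag.rsplit("}", 1)[-1]


def count_block_types(xml_text):
    """Count block elements in the XML, grouped by their blockType attribute."""
    root = ET.fromstring(xml_text)
    return Counter(
        element.get("blockType", "unknown")
        for element in root.iter()
        if local_name(element.tag) == "block"
    )


counts = count_block_types(SAMPLE)
print(dict(counts))
```

If the tally shows only Separator and no Text blocks, the text regions were not detected at all, as opposed to being detected and then recognized poorly.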

Anastasia Galimova posted this 01 November 2013

You can try the fieldLevelRecognition profile, which is suited to recognizing short text fragments. If that does not help, you can specify the text coordinates manually using the processTextField or processFields method. You can also send your image to CloudOCRSDK@abbyy.com so that we can investigate the issue.
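As a rough sketch of the two options above, the request URLs might be built as follows. The service root, the `profile`, `exportFormat`, `language` and `region` parameter names, and the "left,top,right,bottom" region format are assumptions here and should be checked against the API reference; the helper names are hypothetical. The code only builds URLs; authentication and uploading the image in the request body are left out.

```python
from urllib.parse import urlencode

# Assumed service root; verify it against the API reference.
BASE_URL = "http://cloud.ocrsdk.com"


def process_image_url(language="English"):
    """Build a processImage URL that requests the fieldLevelRecognition profile."""
    params = {
        "language": language,
        "profile": "fieldLevelRecognition",  # profile suggested in the answer
        "exportFormat": "xml",
    }
    return f"{BASE_URL}/processImage?{urlencode(params)}"


def process_text_field_url(left, top, right, bottom, language="English"):
    """Build a processTextField URL for one manually specified region.

    The region is assumed to be given in image pixels as
    "left,top,right,bottom".
    """
    params = {
        "region": f"{left},{top},{right},{bottom}",
        "language": language,
    }
    return f"{BASE_URL}/processTextField?{urlencode(params)}"


print(process_image_url())
print(process_text_field_url(10, 20, 300, 60))
```

Note that urlencode percent-encodes the commas in the region value, which the server should decode back to the plain coordinate list.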

Updated:

Additional recommendations are the following:

1) If possible, improve the source image quality as described in the article Source Image Recommendations (note especially that the recommended resolution is 300 dpi for the common font sizes of 10-14 pt). Text that is difficult even for a person to read is unlikely to be recognized.

2) Make sure that you have set the most suitable settings for the method you use.

Here is the full list of settings for the processImage method: http://ocrsdk.com/documentation/apireference/processImage/. Here is the full list of settings for a text field with specified coordinates: http://ocrsdk.com/documentation/apireference/processTextField/. You can use these settings in the processTextField method or, if you need to recognize more than 5 fields per page, in the processFields method.

3) If you recognize text fields on an inhomogeneous background, the results will be better if you recognize text on different backgrounds as separate fields.

4) The images you are processing will probably be recognized better by the next version of our recognition technology, which should be released within the next few months. We can tell you whether this is the case if you send your image to CloudOCRSDK@abbyy.com.

5) You can also send us your images to help us improve the recognition quality in the future.

kristianpicon posted this 01 November 2013

The fieldLevelRecognition profile and the processFields method give strange and inaccurate results compared with processing each field separately using the document-level recognition profile. Is there anything I can do to fix this?
