I'm trying to process a document that consists of short pieces of text separated by dotted lines. The returned XML contains only blocks like the following:

<block blockType="Separator" l="5418" t="64" r="5496" b="66">
<region>
<rect l="5418" t="64" r="5496" b="66"/>
</region>
<separator type="Dotted" thickness="1">
<start x="5418" y="65"/>
<end x="5496" y="65"/>
</separator>
</block>

What can I do to get it to recognize the text instead of the separators?
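
To confirm that the result really contains nothing but separator blocks, a minimal sketch like the following tallies the blocks by their `blockType` attribute. It assumes the result XML is available as a string; the `<page>` wrapper element and the function name `count_block_types` are illustrative assumptions, not part of the SDK.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_block_types(xml_text):
    """Return a Counter of blockType values found on <block> elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        # Drop a namespace prefix such as "{uri}block" if one is present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "block":
            counts[elem.get("blockType", "unknown")] += 1
    return counts


# The block from the question, wrapped in a hypothetical <page> root element.
SAMPLE = """<page>
<block blockType="Separator" l="5418" t="64" r="5496" b="66">
<region>
<rect l="5418" t="64" r="5496" b="66"/>
</region>
<separator type="Dotted" thickness="1">
<start x="5418" y="65"/>
<end x="5496" y="65"/>
</separator>
</block>
</page>"""

print(count_block_types(SAMPLE))
```

If the tally shows only `Separator` entries, no text blocks were detected at all, which suggests a layout analysis problem and not a character recognition problem.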

asked 31 Oct '13, 19:51

kristianpicon

You can try the fieldLevelRecognition profile, which is suited to recognizing short text fragments. If that does not help, you can specify the text coordinates manually with the processTextField or processFields method. You can also send your image to CloudOCRSDK@abbyy.com so that we can investigate the issue.
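
As a rough illustration of switching profiles, the sketch below only builds the request URL for a processImage call with `profile=fieldLevelRecognition`. The base URL, the `exportFormat=xml` choice, and the helper name `process_image_url` are assumptions made for the example; uploading the image and authenticating are left out, so check the processImage reference for the exact request details.

```python
from urllib.parse import urlencode

# Assumption: service base URL; use the one given for your application.
BASE_URL = "http://cloud.ocrsdk.com"


def process_image_url(language="English",
                      profile="fieldLevelRecognition",
                      export_format="xml"):
    """Build a processImage URL that selects the given recognition profile."""
    query = urlencode({
        "language": language,
        "profile": profile,
        "exportFormat": export_format,
    })
    return f"{BASE_URL}/processImage?{query}"


print(process_image_url())
```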

updated:

Additional recommendations are the following:

1) If possible, improve the source image quality as described in the article Source Image Recommendations (note especially that the recommended resolution is 300 dpi for the common font sizes of 10-14 pt). Keep in mind that text that is difficult even for a person to read is unlikely to be recognized.

2) Make sure that you have set the most suitable settings for the method you use.

Here is the full list of settings for the processImage method: http://ocrsdk.com/documentation/apireference/processImage/. Here is the full list of settings for a text field with specified coordinates: http://ocrsdk.com/documentation/apireference/processTextField/. You can use these settings in the processTextField method or, if you need to recognize more than 5 fields per page, in the processFields method.

3) If you recognize text fields on an inhomogeneous background, the result will be better if you recognize text on different backgrounds as separate fields.

4) The images you are recognizing will probably be recognized better in the next version of the technologies, which should be implemented within the next several months. We can provide you with this information if you send your image to CloudOCRSDK@abbyy.com.

5) You can also send us your images to help us improve recognition quality in the future.
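
To illustrate recognizing each fragment as a separate field, here is a minimal sketch that builds one processTextField URL per rectangle. The base URL, the helper name `text_field_url`, and the rectangles in `FIELDS` are hypothetical; the `region` value is given as left, top, right, bottom in pixels, and `oneTextLine` is set on the assumption that each fragment is a single line. Sending the image itself is not shown.

```python
from urllib.parse import urlencode

# Assumption: service base URL; use the one given for your application.
BASE_URL = "http://cloud.ocrsdk.com"


def text_field_url(region, language="English", one_text_line=True):
    """Build a processTextField URL for one (left, top, right, bottom) region."""
    left, top, right, bottom = region
    if not (0 <= left < right and 0 <= top < bottom):
        raise ValueError(f"invalid region: {region}")
    query = urlencode({
        "region": f"{left},{top},{right},{bottom}",
        "language": language,
        "oneTextLine": "true" if one_text_line else "false",
    })
    return f"{BASE_URL}/processTextField?{query}"


# Hypothetical rectangles, one per text fragment between the dotted lines.
FIELDS = [(100, 40, 900, 110), (100, 140, 900, 210)]
urls = [text_field_url(region) for region in FIELDS]
for url in urls:
    print(url)
```

With more than 5 fields per page, the same rectangles would go into a single processFields task instead of separate requests.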

answered 01 Nov '13, 15:48

Anastasia Ga... ♦♦
edited 05 Nov '13, 20:19

Katia Sirotina ♦♦

The fieldLevelRecognition profile and the processFields method give strange and inaccurate results compared with processing each field separately using the document-level recognition profile. Is there anything I can do to fix this?

(01 Nov '13, 16:49) kristianpicon

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal