I'm trying to process a document that consists of short pieces of text separated by dotted lines. The returned XML contains only blocks like the following:
What can I do to get it to recognize the text instead of the separators?
asked 31 Oct '13, 19:51
You can try the fieldLevelRecognition profile, which is suitable for recognizing short text fragments. If that does not help, you can specify the text coordinates manually using the processTextField or processFields method. You can also send your image to CloudOCRSDK@abbyy.com so that we can investigate the issue.
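For illustration, here is a minimal sketch of how the two request URLs might be built. It only constructs the URLs and sends nothing. The base URL, the `exportFormat` and `region` parameter names, and the sample coordinates are assumptions to check against the API reference; `build_request_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed service address; verify it against your account settings.
BASE_URL = "http://cloud.ocrsdk.com"


def build_request_url(method, **params):
    """Return the URL for an API method with the given query parameters."""
    query = urlencode(params)
    if query:
        return f"{BASE_URL}/{method}?{query}"
    return f"{BASE_URL}/{method}"


# Whole-image recognition using the profile for short text fragments.
image_url = build_request_url(
    "processImage",
    profile="fieldLevelRecognition",
    exportFormat="xml",  # assumed parameter name for the output format
)

# A single text field with manually specified coordinates.
# The region is assumed to be "left,top,right,bottom" in pixels;
# the numbers here are made up for the example.
field_url = build_request_url("processTextField", region="120,80,900,140")

print(image_url)
print(field_url)
```

The image itself would then be sent in the body of a POST request to the chosen URL, with your application credentials supplied as the service requires.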
Additional recommendations are the following:
1) If possible, improve the source image quality according to the article Source Image Recommendations (note especially that the recommended resolution is 300 dpi for the common font sizes of 10-14 pt). Keep in mind that text which is difficult even for a person to read is unlikely to be recognized.
2) Make sure that you have set the most suitable settings for the method you use.
Here is the full list of settings for the processImage method: http://ocrsdk.com/documentation/apireference/processImage/. Here is the full list of settings for a text field with specified coordinates: http://ocrsdk.com/documentation/apireference/processTextField/. You can use these settings in the processTextField method or, if you need to recognize more than 5 fields per page, in the processFields method.
3) If you recognize text fields on a non-uniform background, the result will be better if you recognize text on different backgrounds as separate fields.
4) Your images may be recognized better by the next version of the recognition technology, which should be deployed within the next several months. We can provide this information if you send your image to CloudOCRSDK@abbyy.com.
5) You can also send us your images to help us improve the recognition quality in the future.
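As a quick check for recommendation 1, the effective resolution of a scan can be estimated from its pixel size and the physical size of the page. This is only a sketch: the function names are hypothetical, and the 8.5-inch page width is an example value.

```python
def effective_dpi(pixels, inches):
    """Return the resolution in dots per inch along one dimension."""
    return pixels / inches


def meets_recommendation(width_px, width_in, minimum_dpi=300):
    """Return True if the scan reaches the recommended resolution."""
    return effective_dpi(width_px, width_in) >= minimum_dpi


# A Letter-size page is 8.5 inches wide.
print(effective_dpi(2550, 8.5))         # 2550 px across 8.5 in
print(meets_recommendation(1275, 8.5))  # half that width in pixels
```

If the check fails, rescanning at a higher resolution is likely to help more than upscaling the existing image.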
This answer is marked "community wiki".