I would like to extract the ingredients from the labels of beauty products, which are very often cylinder-shaped. It seems the "barrel effect" is giving the OCR engine some trouble. Here is an example of one such photo:
Currently I get the following result using the English language and the
(I removed some special characters that were breaking the formatting.)
Is there any way to use the ABBYY OCR SDK to extract the ingredients more accurately?
First of all, the image you present has very low resolution: the beginning of the word "oxidized" in "oxidized polyethylene" is barely readable even when the original image is zoomed in, so that information is simply lost and cannot be recovered. To address this, you should capture photos with several times higher resolution. That is the single most effective thing you can do to improve OCR quality.
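To check whether a photo is sharp enough before sending it to OCR, you can estimate its effective resolution from the physical size of the label text. Below is a minimal sketch in Python; the helper names are hypothetical, and the 300 dpi target is only a commonly cited rule of thumb for ordinary text (small print such as ingredient lists often benefits from more).

```python
import math

MM_PER_INCH = 25.4


def effective_dpi(pixels, physical_mm):
    """Dots per inch achieved when `pixels` span `physical_mm` of the label."""
    return pixels / (physical_mm / MM_PER_INCH)


def pixels_needed(physical_mm, target_dpi=300):
    """Minimum pixel count needed to cover `physical_mm` at `target_dpi`.

    The 300 dpi default is an assumed rule of thumb, not a fixed requirement.
    """
    return math.ceil(physical_mm / MM_PER_INCH * target_dpi)
```

For example, if a 50 mm wide block of text spans 400 pixels in the photo, `effective_dpi(400, 50)` gives about 203 dpi, and `pixels_needed(50)` says roughly 591 pixels would be needed to reach 300 dpi. On a curved label the effective resolution drops further toward the edges, where the surface turns away from the camera.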
Second, our OCR engine does include preprocessing that addresses this kind of distortion, but it is designed for double-page spreads, which typically have less distortion. In your scenario, where the container has a small radius of curvature, the distortion may be too strong for that preprocessing technique to correct.
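If you want to try flattening the label yourself before OCR, a generic cylindrical unwrapping step might look like the minimal Python sketch below. It is not ABBYY's preprocessing; it assumes an orthographic view with the cylinder axis vertical and parallel to the image plane, and the function name `unwrap_cylinder`, its parameters, and the list-of-rows grayscale image format are all hypothetical. Under that assumption, a surface point at angle theta from the line of sight lands at column `center_x + radius * sin(theta)`, so each output column (measured in arc length along the label) is resampled from that source column.

```python
import math


def unwrap_cylinder(image, center_x, radius, max_angle=math.radians(70)):
    """Flatten a label photographed on a vertical cylinder (hypothetical helper).

    Assumes orthographic projection: a surface point at angle theta from the
    line of sight (0 = facing the camera) appears at column
    center_x + radius * sin(theta). `image` is a list of rows of grayscale
    values; the returned image is indexed by arc length instead of by column.
    """
    width = len(image[0])
    # Output width equals the arc length spanned by [-max_angle, max_angle].
    out_width = int(2 * radius * max_angle)
    result = []
    for row in image:
        out_row = []
        for j in range(out_width):
            arc = j - out_width / 2          # arc length from the front column
            theta = arc / radius             # surface angle for that arc length
            x = center_x + radius * math.sin(theta)
            # Clamp to the image and interpolate linearly between neighbours.
            x = min(max(x, 0.0), width - 1.0)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, width - 1)
            frac = x - x0
            out_row.append((1 - frac) * row[x0] + frac * row[x1])
        result.append(out_row)
    return result
```

In practice `center_x` and `radius` could be estimated from the container's left and right silhouette edges (`center_x = (left + right) / 2`, `radius = (right - left) / 2`). The default `max_angle` of 70 degrees skips the extreme edges, where very few source pixels would be stretched across many output columns. This only redistributes the detail that was captured, so it cannot replace higher-resolution input, and it ignores the vertical bending of text lines that perspective from a close camera introduces.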
answered 15 Aug '12, 15:55
Dmitry Me