I would like to extract ingredients from the labels of beauty products, which are very often cylinder-shaped. The "barrel effect" seems to be giving the OCR engine some trouble. Here is an example of such a photo:

[Image: bottle with text]

Currently I get the following result using English language and the textExtraction profile:

WhoW200ml:
qua. axarnidopropytamine oxide, cNortwxidine
Quorate, isopropyl alcohol, oxidized potypropyte
polyethylene, hydroxy propyl mettiyloeW
jrm

(Removed some special characters which messed with the formatting)

Is there any way to use ABBYY OCR SDK for extracting the ingredients more precisely?

asked 15 Aug '12, 14:02

cuvius

edited 01 Oct '12, 09:33

Vasily Panferov ♦♦

Are you working in Java?

(18 Aug '12, 15:43) sam

@sam: It shouldn't make any difference what language the OP uses - OCR SDK Service has a REST interface that is language and platform agnostic.

(20 Aug '12, 11:47) Dmitry Me ♦♦

First of all, the image you posted has very low resolution - the beginning of the word "oxidized" in "oxidized polyethylene" is barely readable even when the image is zoomed in, so that information is lost and cannot be recovered. To address this you should take photos with several times higher resolution. That's the number one thing you can do to improve OCR quality.

Second, our OCR engine has preprocessing to address this kind of distortion, but it is targeted at double-page spreads and typically deals with less distortion. The scenario you're dealing with - where the container has a small radius of curvature - can be too much for that preprocessing technique.
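
One possible client-side workaround, which is not an OCR SDK feature, is to flatten the label before uploading the image. The sketch below is only an illustration under strong assumptions: the bottle axis is vertical, the camera is far enough away that the view is roughly orthographic, and the bottle fills the frame horizontally unless a radius is given. The function names `cylinder_source_columns` and `unwrap_row` are hypothetical helpers, not part of any library.

```python
import math

def cylinder_source_columns(image_width, radius=None, max_angle=math.radians(75)):
    """Map each column of the flattened label to a (fractional) photo column.

    Assumes an orthographic view of a vertical cylinder: a surface point at
    angle theta from the center line appears at center + radius * sin(theta).
    """
    if radius is None:
        # Assumption: the bottle spans the whole image width.
        radius = image_width / 2.0
    center = (image_width - 1) / 2.0
    # The flattened label is as wide as the visible arc length.
    out_width = int(round(2 * max_angle * radius))
    columns = []
    for u in range(out_width):
        # Arc length from the center line, converted to an angle.
        theta = (u - (out_width - 1) / 2.0) / radius
        columns.append(center + radius * math.sin(theta))
    return columns

def unwrap_row(row, source_columns):
    """Resample one row of grey values using linear interpolation."""
    last = len(row) - 1
    out = []
    for x in source_columns:
        x = min(max(x, 0.0), float(last))  # clamp to the image
        left = int(math.floor(x))
        right = min(left + 1, last)
        frac = x - left
        out.append((1 - frac) * row[left] + frac * row[right])
    return out
```

Applying `unwrap_row` to every row of a greyscale image stretches the compressed text near the edges of the bottle back to roughly uniform width. Whether this helps recognition depends on the photo, and it cannot restore detail that the camera did not capture.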

answered 15 Aug '12, 15:55

Dmitry Me ♦♦

The original image has a much higher resolution (2592x1936); I just scaled it down to fit the question. Unfortunately, the full-resolution image doesn't improve the results. Thanks for the swift reply, though. I realize this is a special use case.

(15 Aug '12, 16:12) cuvius
© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal