SDK Support in Classic Chinese OCR

  • Last Post 22 January 2015
niyayeye posted this 13 January 2015

Hi Everybody!

I am doing a research project and need to process printed Classic Chinese, rather than the modern Simplified/Traditional Chinese that ABBYY FineReader supports. I find FineReader particularly helpful, but I need to customize the OCR parameters extensively to improve the results. I visited the Cloud SDK website today and found most of its links inaccessible to me; they say "We're sorry...The page you have requested may have been removed or repositioned and cannot be found." As a result, I have only a very limited idea of how much the SDK can do. I have submitted a request to download the SDK trial, but I suppose I will have to wait a while before I get it. Can anybody tell me more about the SDK, for example whether I can set a specific font size/type, or add/delete special symbols? If not, what other kind of support can I get from the FineReader SDK?

I am posting an example below to give you a rough idea of what we need.

[image: example page]

-Please note that there are straight and curved lines to the left of some characters, and FineReader often misrecognizes them in a variety of ways. Since this is a vertical layout, these lines are basically just underlines of the text. Is there a way to tell FineReader to recognize them as underlines when the text is in a vertical layout?

-You can see that at least two fonts are used in the text, as well as different font sizes. FineReader often fails to recognize text set in different fonts and font sizes, and consistently gives incorrect results. Is there a way for me to set those parameters? Or can FineReader handle text with mixed font sizes and font types, especially when more than one language is involved?

-About punctuation. Chinese punctuation marks have their own Unicode code points, and they look different in printed text depending on the font. Since I cannot set the font in FineReader, this always creates recognition problems. I assume it could be solved by setting a CJK font, but I am not sure, since I haven't tried the SDK yet.

-There are a great many books we need to process, so it is impossible for me to pre-adjust the images before OCR. Given the volume, it will also be practically impossible to do a high-quality recheck after OCR. So if you have any suggestions for improving the OCR results, please help.

Thank you all very much!!!


Anastasiya Medvedeva posted this 22 January 2015

Apologies for the delay; let me answer your questions :)

First of all, you mentioned problems accessing some links on our site. Indeed, there were some issues with cn.ocrsdk.com, but everything should be fine at the moment. Please let me know if any problems persist on your side.

To work with ABBYY Cloud OCR SDK there is no need to download anything. All you need is a simple registration via the "Start free trial now" or "免费试用" button on the ocrsdk.com or cn.ocrsdk.com sites. If you have any trouble with it, please forward your issue to cloudocrsdk@abbyy.com.

ABBYY Cloud OCR SDK does support the Chinese Traditional language; please have a look at the full list of supported recognition languages on this page: http://ocrsdk.com/documentation/specifications/recognition-languages/.

Regarding the question about fonts: in general, ABBYY Cloud OCR SDK detects the font and size of the text automatically, and the API does not support setting these parameters. However, this option is supported in another of our SDK products, ABBYY FineReader Engine 11. For more information about this product, please refer to this site: http://www.abbyy.cn/ocr_sdk_windows/.

I'm not sure about automatic vertical CJK detection in ABBYY Cloud OCR SDK (please note that ABBYY FineReader supports this feature as well), but you can try processing some examples with our Demo tool on this page: http://cloud.ocrsdk.com/Demo/.

Also, please note one more important thing: the processing profile. This is a set of internal settings designed for the main usage scenarios. You can find useful information on this page: http://ocrsdk.com/documentation/specifications/processing-profiles/.
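
To make the language and profile choices concrete, here is a minimal sketch of how a recognition request URL might be put together. It only builds the URL and sends nothing. The endpoint name (processImage), the parameter names (language, profile, exportFormat) and the default values (ChineseTaiwan, textExtraction, txt) are assumptions that should be checked against the documentation pages linked above; the host is taken from the Demo link. A real request would also need the image data and your application credentials, which are not shown here.

```python
from urllib.parse import urlencode

# Host taken from the Demo link above; treat it as an assumption.
BASE_URL = "http://cloud.ocrsdk.com"


def build_process_image_url(language="ChineseTaiwan",
                            profile="textExtraction",
                            export_format="txt"):
    """Build a recognition request URL (hypothetical helper, no network I/O).

    The endpoint and parameter names are assumptions; verify them against
    the recognition-languages and processing-profiles documentation.
    """
    query = urlencode({
        "language": language,          # recognition language name
        "profile": profile,            # processing profile name
        "exportFormat": export_format  # desired output format
    })
    return f"{BASE_URL}/processImage?{query}"


if __name__ == "__main__":
    # Print the URL so it can be inspected before any upload is attempted.
    print(build_process_image_url())
```

Keeping the URL construction separate from the upload makes it easy to try different language and profile combinations on a few sample pages before committing to a large batch.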

I really hope that my answers were useful to you!
