Hello,

We currently use the Process Text Fields method of the Cloud OCR API to recognize our application form. I defined some templates and regions for OCR, but the results returned by the API do not seem to match the regular expressions in my templates. My settings are below; please take a look and help us.

Thanks in advance for your help!

Ex:

1. Settings:

<text id="phone">
  <language>English</language>
  <letterSet>0123456789</letterSet>
  <regExp>([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])</regExp>
  <textType>handprinted</textType>
  <placeholdersCount>11</placeholdersCount>
  <markingType>partitionedFrame</markingType>
  <oneTextLine>true</oneTextLine>
  <oneWordPerTextLine>true</oneWordPerTextLine>
</text>

2. Results:

Phone: 12832537427212 (Expected: 8325342122)
Date: 4217212056/ (Expected: 12/21/2016)

asked 23 Dec '16, 07:50

Dang Vinh

Please share your image, the used processing settings and your Application ID. Kindly send this info to CloudOCRSDK@abbyy.com.

(23 Dec '16, 17:16) Oksana Serdyuk ♦♦

Hi Oksana Serdyuk, I have already sent you the email. Thank you!

(24 Dec '16, 06:53) Dang Vinh

Hi, I have received your message. Your settings are fine, I have reproduced the issue and now I am consulting with the developers. I will let you know as soon as I get their answer.

(27 Dec '16, 09:27) Oksana Serdyuk ♦♦

Could you please explain how critical this issue is for you?

Also please specify what volumes you plan to process using ABBYY Cloud OCR SDK?

What is your usage scenario?

(27 Dec '16, 13:57) Oksana Serdyuk ♦♦

Hi Oksana Serdyuk, sorry for the late reply. We developed a system for our client, so this is our live product. Please help us get this resolved as soon as possible.

Here is our purchase history: "Volume Pack L (5000 pages) for Application TLS-Enrollment 14 Nov 2016 42686-00003 $199.99"

Thanks in advance!

(13 Jan, 16:46) Dang Vinh

Hi, I am consulting with the developers regarding this issue now. I will let you know about the progress.

(yesterday) Oksana Serdyuk ♦♦

Sorry for the delay. Our team has investigated the issue and concluded that there is no bug; this behavior is due to the peculiarities of our recognition technology.

Note that the regular expression and the placeholdersCount parameter do not strictly limit the characters in the output, i.e. the recognized value may contain characters that are not covered by the regular expression, and it may contain more or fewer characters than you specified in placeholdersCount. These parameters only help to detect and recognize the text field more accurately.
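Since the regular expression only guides recognition, it may be worth checking the returned value on the client side as well. Below is a minimal sketch in Python, assuming the field value has already been extracted from the response as a string. The helper name `is_valid_phone` is hypothetical, and the pattern is simply a shorter equivalent of the 11-digit expression from the question.

```python
import re

# Shorter equivalent of the 11-digit pattern used in the question's template.
PHONE_PATTERN = re.compile(r"[0-9]{11}")


def is_valid_phone(value: str) -> bool:
    """Return True only if the whole recognized value is exactly 11 digits."""
    return PHONE_PATTERN.fullmatch(value) is not None


# The value reported in the question has 14 digits, so it is rejected
# and could be routed to manual review instead of being stored.
print(is_valid_phone("12832537427212"))  # False
# A made-up 11-digit value, used only to show the accepting case.
print(is_valid_phone("83253421221"))     # True
```

A value that fails this check is a candidate for manual verification or for reprocessing with a brighter scan or a tighter region.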

In this particular case, the issue is that the field markup is damaged during binarization and is therefore not detected properly. As a result, the recognized value contains extra characters, most of which are "1" (a markup border is recognized as "1" if it was not properly removed).

The image after binarization is the following:

[image: the field after binarization]

Our developers recommend trying to increase the brightness during scanning so that the image is brighter.

It is also recommended to set the field region as tightly as possible around the field. For example, if we process the "credit_card_number" text field with the following settings:

...
  <fieldTemplates>
    <text id="credit_card_number" bottom="0" left="0" right="0" top="0">
      <language>Digits</language>
      <letterSet>0123456789</letterSet>
      <textType>handprinted</textType>
      <oneTextLine>true</oneTextLine>
      <oneWordPerTextLine>true</oneWordPerTextLine>
      <markingType>partitionedFrame</markingType>
      <placeholdersCount>16</placeholdersCount>
    </text>
  </fieldTemplates>
  <page applyTo="0">
    <!--Credit Card-->
    <text id="credit_card_number" bottom="562" right="1361" top="493" left="72" template="credit_card_number"/>
    <!--End Credit Card-->
  </page>
</document>

[image]

it is recognized accurately:

<text bottom="562" right="1361" top="493" left="72" id="credit_card_number">
    <value>4373740000796405</value>
    <line bottom="551" right="1344" top="494" left="86">
        <char bottom="551" right="141" top="499" left="86">4</char>
        <char bottom="551" right="198" top="497" left="173">3</char>
        <char bottom="546" right="295" top="494" left="243" suspicious="true">7</char>
        <char bottom="551" right="374" top="501" left="336">3</char>
        <char bottom="545" right="476" top="502" left="415" suspicious="true">7</char>
        <char bottom="551" right="535" top="503" left="499">4</char>
        <char bottom="539" right="604" top="503" left="577">0</char>
        <char bottom="540" right="685" top="499" left="657">0</char>
        <char bottom="541" right="771" top="508" left="745">0</char>
        <char bottom="540" right="849" top="511" left="818">0</char>
        <char bottom="551" right="944" top="505" left="889" suspicious="true">7</char>
        <char bottom="550" right="1017" top="501" left="978">9</char>
        <char bottom="551" right="1092" top="506" left="1062">6</char>
        <char bottom="551" right="1190" top="511" left="1135">4</char>
        <char bottom="551" right="1254" top="509" left="1225">0</char>
        <char bottom="551" right="1344" top="507" left="1299">5</char>
    </line>
</text>
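The `suspicious="true"` attributes in this output mark characters that may need a second look. Here is a rough sketch of how such a fragment could be inspected with Python's standard `xml.etree.ElementTree`. It assumes the fragment has exactly the shape shown above, without an XML namespace (a full response may differ), and `summarize_field` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the result fragment above (first three characters only).
RESULT = """
<text bottom="562" right="1361" top="493" left="72" id="credit_card_number">
    <value>4373740000796405</value>
    <line bottom="551" right="1344" top="494" left="86">
        <char bottom="551" right="141" top="499" left="86">4</char>
        <char bottom="551" right="198" top="497" left="173">3</char>
        <char bottom="546" right="295" top="494" left="243" suspicious="true">7</char>
    </line>
</text>
"""


def summarize_field(xml_text: str, expected_length: int) -> dict:
    """Collect the field value, a length check and the suspicious characters."""
    field = ET.fromstring(xml_text.strip())
    value = field.findtext("value", default="")
    # Positions (0-based, in reading order) of characters flagged as suspicious.
    suspicious = [
        (index, char.text)
        for index, char in enumerate(field.iter("char"))
        if char.get("suspicious") == "true"
    ]
    return {
        "id": field.get("id"),
        "value": value,
        "length_ok": len(value) == expected_length,
        "suspicious": suspicious,
    }


summary = summarize_field(RESULT, expected_length=16)
print(summary["value"], summary["length_ok"], summary["suspicious"])
```

A field whose length check fails, or that has many suspicious characters, could be flagged for manual review.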

answered yesterday

Oksana Serdyuk ♦♦

edited yesterday

Hi Oksana Serdyuk. Thanks for your help! I will work with my team to improve the image quality and the field settings.

(6 hours ago) Dang Vinh