ProcessTextField API is not working properly

  • Last Post 23 April 2014
shyam_blr posted this 18 April 2014

I am trying to extract the value of a specific field (say, Discount) from a sample TIFF image (Picture_010.tif, which is part of the Picture_samples archive provided by ABBYY: http://ocrsdk.com/help/picture_samples.zip).

This is the URL sent to the server: http://cloud.ocrsdk.com/processTextField?language=English&textType=normal,handprinted&oneTextLine=true&regExp=Discount

The output is shown below; it does not contain the text I expected. I went through all the previous posts on this tag but did not find much help there:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
  <field left="0" top="0" right="2500" bottom="3521" type="text">
    <value encoding="utf-16">ts^</value>
    <line left="216" top="376" right="2104" bottom="1560">
      <char left="216" top="744" right="584" bottom="1544" confidence="12" suspicious="true">t</char>
      <char left="584" top="728" right="1016" bottom="1560" confidence="13" suspicious="true">s</char>
      <char left="1016" top="376" right="2104" bottom="1560" confidence="-1" suspicious="true">?</char>
    </line>
  </field>
</document>

I am running the Java sample for processTextField as follows:

java TestApp textField --options="regExp=Discount" "C:\\Picture_samples\\English\\Scanned_documents\\Picture_010.tif" "C:\\Picture_010_1.xml"

I look forward to your help in resolving this issue.

Anastasia Galimova posted this 19 April 2014

regExp=Discount is the wrong parameter to use here. Regular expressions are not suitable for this task; you can find the details about regExp here.

You can recognize the Discount field using a URL like this:

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom

where left, top, right, bottom are the coordinates of the text region. They are measured in pixels relative to the top-left corner of the image.

If you don't want to specify the text coordinates directly, you can get all text with its coordinates (using the processImage method, export to XML). Then you can extract the necessary information on your side, as you know that the numbers you need have almost the same vertical coordinates as the "Discount" word.
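The "extract it on your side" step could look roughly like the Python sketch below. It assumes the words and their pixel boxes have already been collected from the processImage XML export; the `Word` class, the `values_right_of` helper, the 50% overlap threshold, and all box values and texts are illustrative assumptions, not part of the ABBYY API or taken from the sample image.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its pixel box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def vertical_overlap(a: Word, b: Word) -> float:
    """Fraction of the shorter box's height that the two boxes share."""
    shared = min(a.bottom, b.bottom) - max(a.top, b.top)
    shorter = min(a.bottom - a.top, b.bottom - b.top)
    return max(shared, 0) / shorter if shorter > 0 else 0.0


def values_right_of(label: str, words: list[Word], min_overlap: float = 0.5) -> list[Word]:
    """Return words on the same text row as `label` and to its right, left to right."""
    anchors = [w for w in words if w.text.strip(":").lower() == label.lower()]
    if not anchors:
        return []
    anchor = anchors[0]
    row = [
        w for w in words
        if w is not anchor
        and w.left >= anchor.right
        and vertical_overlap(anchor, w) >= min_overlap
    ]
    return sorted(row, key=lambda w: w.left)


# Made-up word boxes standing in for data read from a processImage XML export.
words = [
    Word("Subtotal", 300, 1930, 520, 1968),
    Word("410.00", 1470, 1931, 1630, 1967),
    Word("Discount", 300, 1992, 530, 2029),
    Word("41.00", 1473, 1992, 1628, 2029),
]

matches = values_right_of("Discount", words)
print([w.text for w in matches])  # prints ['41.00']
```

Matching on vertical overlap rather than on exact top/bottom values tolerates the small offsets that scanned rows usually have.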

shyam_blr posted this 20 April 2014

I tried the URL you suggested, but I get HTTP 500 (Internal Server Error) without any further information. I tried the following two URLs (with and without regExp) for the Discount field.

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom&regExp=D[a-z]+

shyam_blr posted this 20 April 2014

If I want to get the data for a particular field (say, Discount) in the document with the region given as (left,top,right,bottom), can I also provide the field name (like Discount) in some other parameter of your API, or is regExp the only way?

Anastasia Galimova posted this 21 April 2014

RegExp is not suitable for this task in general. It is meant for cases where, for example, the text should read "olo 123" but for some reason is recognized as "010 123".
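To illustrate the idea only: a pattern describes which strings are acceptable as the field's text, so it can separate "olo 123" from "010 123", but it cannot point at a place on the page. The sketch below uses Python's `re` syntax, which is not necessarily the syntax the regExp parameter accepts, and the pattern itself is a hypothetical field format.

```python
import re

# Hypothetical field format: three lowercase letters, a space, three digits.
FIELD_PATTERN = re.compile(r"[a-z]{3} [0-9]{3}")

candidates = ["010 123", "olo 123"]
accepted = [c for c in candidates if FIELD_PATTERN.fullmatch(c)]
print(accepted)  # prints ['olo 123']

# A pattern of just "Discount" accepts only that literal word as the field text,
# so it says nothing about the number printed next to the label.
label_only = re.fullmatch(r"Discount", "41.00")
print(label_only is None)  # prints True
```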

Unfortunately, we do not have a parameter that lets you recognize a field by a specified word (like "Discount") near it. You can only do this on your side: use the processImage method, export the result to XML, and extract the necessary information using the text and its coordinates.

The URL should look like this:

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=0,0,200,200

(where 0,0,200,200 are the text coordinates). Could you please clarify if the error occurs with this URL?

shyam_blr posted this 22 April 2014

Here is the XML output for this URL: http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=0,0,200,200

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
    <field left="0" top="0" right="200" bottom="200" type="text">
        <value encoding="utf-16" />
        <line left="0" top="0" right="0" bottom="0" />
    </field>
</document>
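A reply like this can also be checked programmatically. The following is a minimal Python sketch, assuming the reply layout seen in this thread (`field`, `value`, `line`, `char`); the `summarize` helper is hypothetical, the namespace value "@link" is copied as it appears here rather than the real schema URL, and the `{*}` wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Replies copied from this thread, without the XML declaration and xsi attributes.
FIRST_REPLY = """<document xmlns="@link" version="1.0">
  <field left="0" top="0" right="2500" bottom="3521" type="text">
    <value encoding="utf-16">ts^</value>
    <line left="216" top="376" right="2104" bottom="1560">
      <char left="216" top="744" right="584" bottom="1544" confidence="12" suspicious="true">t</char>
      <char left="584" top="728" right="1016" bottom="1560" confidence="13" suspicious="true">s</char>
      <char left="1016" top="376" right="2104" bottom="1560" confidence="-1" suspicious="true">?</char>
    </line>
  </field>
</document>"""

EMPTY_REPLY = """<document xmlns="@link" version="1.0">
  <field left="0" top="0" right="200" bottom="200" type="text">
    <value encoding="utf-16" />
    <line left="0" top="0" right="0" bottom="0" />
  </field>
</document>"""


def summarize(reply_xml: str) -> tuple[str, int, int]:
    """Return (recognized text, character count, suspicious character count)."""
    root = ET.fromstring(reply_xml)
    value = root.find(".//{*}value")  # {*} matches any namespace
    text = (value.text or "") if value is not None else ""
    chars = root.findall(".//{*}char")
    suspicious = sum(1 for c in chars if c.get("suspicious") == "true")
    return text, len(chars), suspicious


print(summarize(FIRST_REPLY))  # prints ('ts^', 3, 3)
print(summarize(EMPTY_REPLY))  # prints ('', 0, 0)
```

An empty value with no `char` elements suggests the region contained no recognizable text, while a value made up entirely of suspicious characters suggests the region did not match the field.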

Anastasia Galimova posted this 23 April 2014

So, the error does not occur. Note that you should specify your own text coordinates (0,0,200,200 is just an example). For the Discount field on Picture_010.tif you can specify 1473,1992,1628,2029.
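For reference, here is a small Python sketch that builds such a URL from numeric coordinates. The `text_field_url` helper and its checks are assumptions made for illustration, not part of the ABBYY samples; the checks simply reject values that are not usable pixel coordinates before anything is sent.

```python
from urllib.parse import urlencode

BASE_URL = "http://cloud.ocrsdk.com/processTextField"


def text_field_url(left: int, top: int, right: int, bottom: int,
                   language: str = "English", text_type: str = "normal") -> str:
    """Build a processTextField URL for a pixel region (hypothetical helper)."""
    for name, value in (("left", left), ("top", top), ("right", right), ("bottom", bottom)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    if left >= right or top >= bottom:
        raise ValueError("region must satisfy left < right and top < bottom")
    query = urlencode(
        {
            "language": language,
            "textType": text_type,
            "region": f"{left},{top},{right},{bottom}",
        },
        safe=",",  # keep the commas in the region value readable
    )
    return f"{BASE_URL}?{query}"


url = text_field_url(1473, 1992, 1628, 2029)
print(url)
# prints http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=1473,1992,1628,2029
```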
