I am trying to get the value of a specific field (say, Discount) from a sample TIFF image (Picture_010.tif, which is part of the Picture_samples archive provided by ABBYY: http://ocrsdk.com/help/picture_samples.zip).

This is the URL sent to the server: http://cloud.ocrsdk.com/processTextField?language=English&textType=normal,handprinted&oneTextLine=true&regExp=Discount

The output is shown below, and it doesn't contain the intended value. I went through all the previous posts under this tag, but they didn't help much:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
  <field left="0" top="0" right="2500" bottom="3521" type="text">
    <value encoding="utf-16">ts^</value>
    <line left="216" top="376" right="2104" bottom="1560">
      <char left="216" top="744" right="584" bottom="1544" confidence="12" suspicious="true">t</char>
      <char left="584" top="728" right="1016" bottom="1560" confidence="13" suspicious="true">s</char>
      <char left="1016" top="376" right="2104" bottom="1560" confidence="-1" suspicious="true">?</char>
    </line>
  </field>
</document>

I am running the Java sample for processTextField as follows:

java TestApp textField --options="regExp=Discount" "C:\\Picture_samples\\English\\Scanned_documents\\Picture_010.tif" "C:\\Picture_010_1.xml"

Looking forward to your help in resolving this issue.

asked 18 Apr '14, 16:57 by shyam_blr

edited 19 Apr '14, 19:37 by Anastasia Galimova ♦♦

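
regExp=Discount is the wrong parameter. The regExp parameter only constrains what the recognized text is allowed to look like; it does not tell the server where to find a field. Regular expressions are not suitable for this task; you can find the details about regExp here.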

You can recognize the Discount field using a URL like this:

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom

where left, top, right and bottom are the coordinates of the text region, measured in pixels from the top-left corner of the image.
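A minimal Java sketch of how such a URL could be assembled, in the same language as the Java sample used in the question. The class and method names (`TextFieldUrl`, `buildUrl`) are hypothetical; only the query string is built here, and uploading the image with your application credentials is left to the existing sample client. The check simply rejects boxes whose right or bottom edge is not past the left or top edge.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: builds a processTextField URL for one rectangular region.
 * Sending the request (image upload, authentication) is not handled here.
 */
public class TextFieldUrl {

    private static final String BASE = "http://cloud.ocrsdk.com/processTextField";

    /** Coordinates are pixels measured from the top-left corner of the image. */
    public static String buildUrl(String language, String textType,
                                  int left, int top, int right, int bottom) {
        // Reject empty or inverted boxes before they reach the server.
        if (left < 0 || top < 0 || right <= left || bottom <= top) {
            throw new IllegalArgumentException(
                "region must satisfy 0 <= left < right and 0 <= top < bottom");
        }
        return BASE
            + "?language=" + URLEncoder.encode(language, StandardCharsets.UTF_8)
            + "&textType=" + URLEncoder.encode(textType, StandardCharsets.UTF_8)
            + "&region=" + left + "," + top + "," + right + "," + bottom;
    }

    public static void main(String[] args) {
        // Region suggested later in this thread for the Discount field of Picture_010.tif.
        System.out.println(buildUrl("English", "normal", 1473, 1992, 1628, 2029));
    }
}
```

Running `main` prints `http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=1473,1992,1628,2029`, i.e. the URL above with real numbers in place of the placeholder names.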

If you don't want to specify the text coordinates directly, you can get all the text with its coordinates (using the processImage method with export to XML). Then you can extract the necessary information on your side: the numbers you need have almost the same vertical coordinates as the word "Discount".
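A minimal sketch of that matching step, assuming the words and their pixel boxes have already been read from the processImage XML export (that parsing is not shown, and the `Word` class and `valuesRightOf` method are hypothetical stand-ins). A word counts as being on the label's row if its vertical centre falls between the label's top and bottom edges, and it must start to the right of the label.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: finds the words printed on the same row as a label such as "Discount". */
public class LabelValueFinder {

    /** Hypothetical word record: text plus its pixel box (top-left origin). */
    public static final class Word {
        final String text;
        final int left, top, right, bottom;

        public Word(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** Returns the words right of the label on its row, ordered left to right. */
    public static List<String> valuesRightOf(List<Word> words, String label) {
        Word anchor = null;
        for (Word w : words) {
            if (w.text.equalsIgnoreCase(label)) {
                anchor = w;
                break;
            }
        }
        List<String> result = new ArrayList<>();
        if (anchor == null) {
            return result; // label not found on the page
        }
        List<Word> sameRow = new ArrayList<>();
        for (Word w : words) {
            int centerY = (w.top + w.bottom) / 2;
            boolean onRow = centerY >= anchor.top && centerY <= anchor.bottom;
            if (w != anchor && onRow && w.left >= anchor.right) {
                sameRow.add(w);
            }
        }
        sameRow.sort(Comparator.comparingInt(w -> w.left));
        for (Word w : sameRow) {
            result.add(w.text);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up coordinates for illustration only; not measured on Picture_010.tif.
        List<Word> words = List.of(
            new Word("Discount", 1200, 2000, 1400, 2030),
            new Word("$12.00", 1480, 1995, 1620, 2028),
            new Word("Total", 1200, 2100, 1330, 2130),
            new Word("$90.00", 1480, 2098, 1620, 2132));
        System.out.println(valuesRightOf(words, "Discount"));
    }
}
```

With the made-up coordinates in `main`, this prints `[$12.00]`. For skewed scans, widening the vertical test by a few pixels of tolerance would be a natural refinement.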


answered 19 Apr '14, 19:53 by Anastasia Galimova ♦♦

I tried your suggestion, but I get HTTP 500 (Internal Server Error) with no further information. I tried the following two URLs (with and without regExp) for the Discount field:

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=left,top,right,bottom&regExp=D[a-z]+

(20 Apr '14, 07:06) shyam_blr

If I want to get the data for a particular field (say, Discount) in the document with the region given as (left,top,right,bottom), can I also pass the field name (like Discount) in some other parameter of your API, or is regExp the only way?

(20 Apr '14, 07:09) shyam_blr

regExp is not suitable for this task in general. It is useful when, for example, you want to recognize the text "olo 123" and for some reason it is recognized as "010 123".

Unfortunately, there is no parameter that lets you recognize a field identified by a nearby word (like "Discount"). You can only do this on your side: use the processImage method, export to XML, and extract the necessary information using the text and its coordinates.

The URL should look like this:

http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=0,0,200,200

(where 0,0,200,200 are the text coordinates). Could you please check whether the error occurs with this URL?

(21 Apr '14, 18:05) Anastasia Ga... ♦♦

Here is the XML output for this URL: http://cloud.ocrsdk.com/processTextField?language=English&textType=normal&region=0,0,200,200

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
    <field left="0" top="0" right="200" bottom="200" type="text">
        <value encoding="utf-16" />
        <line left="0" top="0" right="0" bottom="0" />
    </field>
</document>
(22 Apr '14, 05:21) shyam_blr

So the error does not occur. Note that you should specify the coordinates of your own text (0,0,200,200 is just an example). For the Discount field on Picture_010.tif you can specify 1473,1992,1628,2029.

(23 Apr '14, 14:24) Anastasia Ga... ♦♦
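Once a request with a real region returns, the recognized text can be read from the response. A minimal sketch using the JDK's built-in DOM parser, assuming the response has the shape of the XML shown in this thread (a `field` element containing `value` and `line`/`char` elements with a `suspicious` attribute); the class name `FieldResultReader` is hypothetical. An empty `value` element, as in the 0,0,200,200 result, comes back as an empty string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch: reads the value and the suspicious-character count from a processTextField result. */
public class FieldResultReader {

    // The default factory is not namespace-aware, so the xmlns on <document>
    // does not affect the plain tag-name lookups below.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse response: " + e.getMessage(), e);
        }
    }

    /** Text of the first <value> element, or "" when it is missing or empty. */
    public static String readValue(String xml) {
        NodeList values = parse(xml).getElementsByTagName("value");
        return values.getLength() == 0 ? "" : values.item(0).getTextContent();
    }

    /** Number of <char> elements marked suspicious="true". */
    public static int countSuspicious(String xml) {
        NodeList chars = parse(xml).getElementsByTagName("char");
        int count = 0;
        for (int i = 0; i < chars.getLength(); i++) {
            Element c = (Element) chars.item(i);
            if ("true".equals(c.getAttribute("suspicious"))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a saved response, e.g. the XML file written by the Java sample.
        String xml = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        System.out.println("value: \"" + readValue(xml) + "\"");
        System.out.println("suspicious characters: " + countSuspicious(xml));
    }
}
```

For example, `java FieldResultReader C:\Picture_010_1.xml` would report the value and how many of its characters the engine flagged as suspicious; a high count is a hint that the region or text type needs adjusting.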

Last updated: 23 Apr '14, 14:25

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal