processTextField ignoring letterset and regexp params

  • Last Post 18 March 2014
HankLloydRight posted this 11 March 2014


I'm just getting started with OCR SDK, and I'm using PHP to send a POST request with an image of a serial number. I'm testing out letterSet, and despite setting it as follows:*

I still get letters returned in the XML result set. I've also been testing the RegExp parameter, and that also seems to be ignored (returning letters where only numbers are specified in the RegExp). In this case, I am expanding letterSet to include all letters and numbers and adding this regexp parameter:


What I am trying to do is have OCR recognize a serial number in the format: A?ANNNN (A=letter, N=number) where only digits can appear in positions 3-6, and only a one or two letter prefix (A-Z).
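For readers following along, the format described above can be written as an ordinary regular expression. This is only a sketch in Python's regex syntax, not the syntax of the OCR SDK's regExp parameter (which follows its own manual); the names `SERIAL_RE` and `parse_serial` are made up for illustration.

```python
import re

# One or two uppercase letters followed by exactly four digits (A?ANNNN).
# Python regex syntax; the OCR SDK's regExp parameter uses its own syntax.
SERIAL_RE = re.compile(r"(?P<prefix>[A-Z]{1,2})(?P<digits>[0-9]{4})")


def parse_serial(text):
    """Return (prefix, digits) if text is a well-formed serial, else None."""
    match = SERIAL_RE.fullmatch(text.strip())
    if match is None:
        return None
    return match.group("prefix"), match.group("digits")
```

Calling `parse_serial` on a recognized string gives a quick yes/no check on whether the OCR result fits the expected shape.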

I assume that the parameters for processTextField are sent in the URL string (GET) as opposed to sending with the POST along with the image?

I did see the post about using the "Digits" language, but my requirements are more than what is contained in that language.


Anastasia Galimova posted this 11 March 2014

So that we can test it, could you please share the image you are recognizing, or send it to CloudOCRSDK@abbyy.com?

HankLloydRight posted this 12 March 2014

I sent a detailed message to that email address. Thanks.

Anastasia Galimova posted this 18 March 2014

Thank you. We have received your letter and will reply tomorrow.

Anastasia Galimova posted this 18 March 2014

The issue occurs because OCR technologies are not trained well for this font. It should be fixed in the future.

We found out that both of your images can be completely recognized with the following URL:

Thank you for your patience!

HankLloydRight posted this 18 March 2014

Thanks for your reply.

I had tried "handprinted" as well as all the other textType types during testing, but handprinted failed on many more of the other images I tested.

I found that using "textType=normal,typewriter" generated the smallest number of OCR errors for my images. Really, the only image that failed with "textType=normal,typewriter" was the one I emailed you.

Can you explain how the RegExp parameter works? ABBYY still returns values that would not pass the RegExp I'm using.

In the meantime, I'll just write some code on my end to detect the misreads that violate the RegExp values, and try to correct them before passing them to my application.
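A client-side check of this kind might look roughly like the sketch below. It is written in Python rather than PHP, and everything in it is an assumption made for illustration: the confusion tables (for example "O" read for "0") are hypothetical guesses at typical OCR mix-ups, not data from the SDK, and `correct_serial` is an invented helper name.

```python
import re

# Expected shape: one or two letters, then four digits (A?ANNNN).
SERIAL_RE = re.compile(r"[A-Z]{1,2}[0-9]{4}")

# Hypothetical look-alike tables; tune them to the misreads you actually see.
LETTER_TO_DIGIT = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
                   "Z": "2", "S": "5", "G": "6", "B": "8"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}


def correct_serial(raw):
    """Return a serial matching SERIAL_RE, or None if it cannot be repaired."""
    text = "".join(raw.split()).upper()  # drop whitespace, normalize case
    if SERIAL_RE.fullmatch(text):
        return text                      # already valid, nothing to fix
    if len(text) not in (5, 6):
        return None                      # wrong length, do not guess
    prefix, tail = text[:-4], text[-4:]
    # Letters are expected in the prefix and digits in the last four positions.
    fixed = ("".join(DIGIT_TO_LETTER.get(c, c) for c in prefix)
             + "".join(LETTER_TO_DIGIT.get(c, c) for c in tail))
    return fixed if SERIAL_RE.fullmatch(fixed) else None
```

A result of `None` means the string could not be repaired with simple substitutions and probably needs manual review.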

Thanks again.

Anastasia Galimova posted this 18 March 2014

We have found two bugs that should be fixed in the near future and can be worked around for now:

  1. The regular expression does not work when the language is specified directly. We recommend not specifying the language in the URL (letterset and regExp are enough).

  2. Something is wrong with the asterisk in the letterset: when it is used with the handprinted text type, an error occurs. If all of your expressions contain an asterisk at the end, you can probably recognize only the text before it.
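As an illustration of the first workaround, a request URL that leaves out the language could be assembled as sketched below. The parameter names letterSet, regExp and textType are the ones used in this thread; the host in `BASE_URL` and the helper name `build_text_field_url` are assumptions for the example, and the sketch is in Python rather than PHP.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the address your account actually uses.
BASE_URL = "https://cloud.ocrsdk.com/processTextField"


def build_text_field_url(letter_set, reg_exp=None, text_type=None):
    """Build a processTextField URL with no language parameter at all."""
    params = {"letterSet": letter_set}
    if reg_exp:
        params["regExp"] = reg_exp      # value must follow the SDK's own syntax
    if text_type:
        params["textType"] = text_type
    # urlencode percent-escapes reserved characters such as "," in the values.
    return BASE_URL + "?" + urlencode(params)


url = build_text_field_url("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
                           text_type="normal,typewriter")
```

The image itself would still be sent in the body of the POST request to this URL.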

Anastasia Galimova posted this 18 March 2014

Also, the syntax you use is slightly different from the one described in the manual.

For this text



  • A=letters A thru M
  • F=letters A thru L
  • B=letters A thru Z (excluding letters "O" and "Z")
  • N=digits 0-9

you can use, for example, this regExp: